Development and Validation of a Novel Nomogram for Preoperative Prediction of In-Hospital Mortality After Coronary Artery Bypass Grafting Surgery in Heart Failure With Reduced Ejection Fraction

Background and Aims: Patients with heart failure with reduced ejection fraction (HFrEF) are among the most challenging patients undergoing coronary artery bypass grafting surgery (CABG). Several surgical risk scores are commonly used to predict risk in patients undergoing CABG; however, these scores do not specifically target HFrEF patients. We aimed to develop and validate a new nomogram to predict the risk of in-hospital mortality among HFrEF patients after CABG. Methods: This retrospective study enrolled 489 patients with HFrEF who underwent CABG. The outcome was postoperative in-hospital death. About 70% (n = 342) of the patients were randomly assigned to a training cohort and the remainder (n = 147) to a validation cohort. A multivariable logistic regression model was derived from the training cohort and presented as a nomogram to predict postoperative mortality in patients with HFrEF. Model performance was assessed in terms of discrimination and calibration and compared with that of EuroSCORE-2. Results: Postoperative death occurred in 26 (7.6%) of 342 patients in the training cohort and in 10 (6.8%) of 147 patients in the validation cohort. Eight preoperative factors were associated with postoperative death: age, critical state, recent myocardial infarction, stroke, left ventricular ejection fraction (LVEF) ≤35%, LV dilatation, increased serum creatinine, and combined surgery. The nomogram achieved good discrimination, with C-indexes of 0.889 (95% CI, 0.839–0.938) and 0.899 (95% CI, 0.835–0.963) for predicting mortality after CABG in the training and validation cohorts, respectively, and showed well-fitted calibration curves in patients whose predicted mortality probabilities were below 40%. Compared with EuroSCORE-2, the nomogram had significantly higher C-indexes in both the training cohort (0.889 vs. 0.762, p = 0.005) and the validation cohort (0.899 vs. 0.816, p = 0.039). The nomogram also had better calibration and reclassification than EuroSCORE-2 in both cohorts. EuroSCORE-2 underestimated postoperative mortality risk, especially in high-risk patients. Conclusions: The nomogram provides an accurate preoperative estimate of mortality risk after CABG in patients with HFrEF and may help identify HFrEF patients at high risk of in-hospital mortality.


INTRODUCTION
The most common cause of heart failure (HF) is coronary artery disease (CAD), which accounts for about 60% of cases of HF with reduced ejection fraction (HFrEF) (1,2). For patients with HF, severe left ventricular (LV) systolic dysfunction, and CAD suitable for myocardial revascularization, coronary artery bypass grafting (CABG) is recommended as the first-choice revascularization strategy (3). Despite recent advances in cardiovascular surgery, CABG in HFrEF patients is still associated with a higher risk of morbidity and mortality than in other patients. Risk assessment before surgery is therefore necessary in patients with HFrEF undergoing CABG. Several risk scores have been developed to help clinicians and patients make informed decisions about surgical risk, including the Society of Thoracic Surgeons (STS) score (4,5), the EuroSCORE (6), the EuroSCORE-2 (7), and the SinoSCORE (8). Although helpful, these scores were derived from general cardiac surgery populations rather than from patients with HFrEF. In addition, apart from being based on data collected more than 10 years ago, the STS and EuroSCORE models were developed in Western patients and may therefore be less generalizable to Chinese patients.
Given the lack of a specific and practical risk score for HFrEF patients, a predictive model built on preoperative variables associated with mortality is needed. This study therefore aimed to develop and validate a nomogram to predict the risk of in-hospital mortality among HFrEF patients undergoing CABG and to compare its predictive value with that of EuroSCORE-2.

METHODS

Study Population
We retrospectively enrolled consecutive patients with HFrEF who underwent CABG between January 2013 and July 2019 at Beijing Anzhen Hospital, Capital Medical University. HFrEF is commonly defined as a reduction in LVEF to ≤40% with symptoms and/or signs of heart failure (1,2). The inclusion criteria were: (1) LVEF ≤40% on the last preoperative echocardiogram (the one closest to surgery); (2) symptomatic HF (New York Heart Association [NYHA] functional class II–IV); and (3) elective CABG, with or without mitral valve surgery for ischemic mitral regurgitation. The exclusion criteria were: (1) emergency surgery; (2) systolic arterial blood pressure <90 mmHg when supine, sitting, or standing; (3) hemodynamically significant stenotic valvular heart disease; (4) non-ischemic mitral regurgitation caused by papillary muscle rupture, rheumatic disease, degeneration, infective endocarditis, congenital heart disease, or other organic disease; (5) concomitant aortic valve disease, primary cardiomyopathy, congenital heart disease, rheumatic heart disease, macrovascular disease, or other non-ischemic myocardial disease; and (6) cardiogenic shock. Ethical approval was obtained from the Institutional Ethics Committee of Beijing Anzhen Hospital.

Surgical Procedures
All patients underwent CABG through a midline sternotomy. The left internal mammary artery was the first-choice graft for the left anterior descending artery. Saphenous veins and radial arteries were harvested with an open technique, and sequential or separate aortocoronary grafts were placed to the remaining coronary arteries. In all patients, anastomotic quality was assessed with a transit-time flow probe after grafting. For patients with mitral regurgitation or a ventricular aneurysm, the surgical procedure was decided jointly by more than two experienced surgeons after discussion. For isolated CABG, the choice among off-pump CABG, on-pump CABG, and on-pump beating-heart CABG depended on the surgeon's preference and experience as well as on intraoperative conditions.

Data Collection
Clinical characteristics, echocardiographic findings, laboratory results, and surgical characteristics were collected with a standard data collection form by trained physicians who were blinded to the study aim. In EuroSCORE-2, critical preoperative state is an important variable that encompasses various preoperative conditions and major adverse events. Following the EuroSCORE-2 definition, critical state was defined as any of the following: a history of ventricular tachycardia, ventricular fibrillation, or aborted sudden death; preoperative cardiac massage; preoperative ventilation before arrival in the anesthetic room; preoperative inotropes; or end-organ damage. Recent myocardial infarction (MI) was defined as MI within 90 days before surgery. Increased serum creatinine was defined as a preoperative serum creatinine >1.5 mg/dl. Echocardiographic parameters, including LVEF and left ventricular internal diameter at end-diastole (LVIDd), were extracted from the last preoperative echocardiogram (the one closest to surgery). Body surface area (BSA) was calculated with Mosteller's formula (9). An LVIDd/BSA ≥3.5 cm/m² indicated moderate or severe LV dilatation according to the guidelines for cardiac chamber quantification by echocardiography (10). Combined surgery was defined as an operation combining CABG with at least one other major cardiac procedure, such as mitral valve repair or replacement or treatment of a ventricular aneurysm.
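The predictor definitions above translate directly into simple rules. The Python sketch below shows one way to derive the binary variables from raw measurements; it is an illustration, not the study's data-processing code. All function and argument names are hypothetical, LVIDd is assumed to be recorded in cm, and "within 90 days" is assumed to include day 90.

```python
import math


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 by Mosteller's formula: sqrt(height [cm] x weight [kg] / 3600)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def lv_dilatation(lvidd_cm: float, height_cm: float, weight_kg: float) -> bool:
    """Moderate or severe LV dilatation: LVIDd indexed to BSA >= 3.5 cm/m^2."""
    return lvidd_cm / bsa_mosteller(height_cm, weight_kg) >= 3.5


def critical_state(vt_vf_or_aborted_sudden_death: bool = False,
                   preop_cardiac_massage: bool = False,
                   preop_ventilation: bool = False,
                   preop_inotropes: bool = False,
                   end_organ_damage: bool = False) -> bool:
    """Critical state is present if any listed component is present (hypothetical field names)."""
    return any([vt_vf_or_aborted_sudden_death, preop_cardiac_massage,
                preop_ventilation, preop_inotropes, end_organ_damage])


def increased_creatinine(creatinine_mg_dl: float) -> bool:
    """Preoperative serum creatinine strictly above 1.5 mg/dl."""
    return creatinine_mg_dl > 1.5


def recent_mi(days_since_mi) -> bool:
    """MI within 90 days before surgery (assumed inclusive); None means no prior MI."""
    return days_since_mi is not None and days_since_mi <= 90


if __name__ == "__main__":
    # A 170 cm, 70 kg patient has a BSA of about 1.82 m^2, so an LVIDd of 6.5 cm gives ~3.58 cm/m^2.
    print(round(bsa_mosteller(170, 70), 2), lv_dilatation(6.5, 170, 70))
```

If serum creatinine is recorded in µmol/L, it can be converted to mg/dl by dividing by 88.4 before applying the 1.5 mg/dl threshold.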

Clinical Outcome
The primary end point was postoperative in-hospital mortality, defined as any death occurring after the surgical procedure during the same hospital stay.

Statistical Analysis
Continuous variables were expressed as mean ± standard deviation (SD) or median (25th, 75th percentiles) for normally and non-normally distributed data, respectively, and compared between two groups with the independent-samples t-test or the Mann-Whitney U-test, correspondingly. Categorical variables were presented as counts (percentages) and compared with Pearson's χ² test or Fisher's exact test, as appropriate.
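As a small illustration of these reporting conventions, the Python sketch below formats a continuous variable either as mean ± SD or as median (25th, 75th percentiles), and a categorical variable as count (percentage). The normality decision is passed in as a flag because the text does not name the normality test used; the function names are our own, and the percentile interpolation (`method="inclusive"`) may differ slightly from other software defaults.

```python
import statistics


def summarize_continuous(values, normal: bool) -> str:
    """Mean ± SD if the variable is treated as normal, otherwise median (25th, 75th percentiles)."""
    if normal:
        return f"{statistics.mean(values):.1f} ± {statistics.stdev(values):.1f}"
    # method="inclusive" interpolates between the observed minimum and maximum
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{median:.1f} ({q1:.1f}, {q3:.1f})"


def summarize_categorical(count: int, total: int) -> str:
    """Count (percentage), e.g. deaths out of cohort size."""
    return f"{count} ({100 * count / total:.1f}%)"


print(summarize_categorical(26, 342))
print(summarize_categorical(10, 147))
```

With the cohort counts reported in the Results, the two calls reproduce the 7.6% and 6.8% in-hospital mortality figures.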
Patients with complete data were randomly divided into a training cohort and a validation cohort at a 7:3 ratio. The training cohort was used to develop the model and the validation cohort to validate it. Univariable logistic regression analysis was used to identify candidate predictors. Variables with p < 0.15 in univariable analysis, together with those consistently reported in previous studies, were entered into multivariable logistic regression analysis to identify independent risk factors for postoperative mortality. The model was simplified by backward stepwise elimination based on the Akaike Information Criterion (AIC). LASSO regression was also applied to predictor selection to check the importance of the variables selected by stepwise regression. The final logistic regression model was built on the selected predictors and presented as a nomogram.
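The random 7:3 split and the AIC-based backward elimination can be sketched as follows. This is a minimal Python/NumPy illustration on synthetic data, not the authors' R code: the seed, the variable names, and the Newton-Raphson fitting routine are assumptions, and the univariable p < 0.15 screen and the LASSO cross-check described above are omitted. Each step drops the predictor whose removal lowers AIC = 2k - 2 log L the most, and elimination stops when no removal lowers it further.

```python
import numpy as np


def split_7_3(n: int, seed: int = 2021):
    """Randomly assign row indices to a training (70%) and a validation (30%) cohort."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = round(0.7 * n)
    return idx[:n_train], idx[n_train:]


def logistic_loglik(design: np.ndarray, y: np.ndarray, n_iter: int = 100) -> float:
    """Fit a logistic regression by Newton-Raphson and return the maximized log-likelihood."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, design.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-design @ beta)), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def aic(X: np.ndarray, y: np.ndarray, cols: list) -> float:
    """AIC = 2k - 2 log L for a model with an intercept plus the predictors in `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    return 2 * (len(cols) + 1) - 2 * logistic_loglik(design, y)


def backward_aic(X: np.ndarray, y: np.ndarray, names: list) -> list:
    """Repeatedly drop the predictor whose removal lowers AIC most; stop when no removal helps."""
    keep = list(range(X.shape[1]))
    current = aic(X, y, keep)
    while keep:
        best, drop = min((aic(X, y, [c for c in keep if c != d]), d) for d in keep)
        if best >= current:
            break
        keep.remove(drop)
        current = best
    return [names[c] for c in keep]


# Synthetic demonstration data (NOT study data): x1 is a strong predictor, x2 weak, x3 pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(489, 3))
true_lp = -2.5 + 1.5 * X[:, 0] + 0.3 * X[:, 1]
y = (rng.random(489) < 1.0 / (1.0 + np.exp(-true_lp))).astype(float)

train_idx, valid_idx = split_7_3(len(y))
selected = backward_aic(X[train_idx], y[train_idx], ["x1", "x2", "x3"])
print(len(train_idx), len(valid_idx), selected)
```

With n = 489, rounding 70% gives the 342/147 split used in the study.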
We assessed the predictive accuracy of the nomogram in terms of discrimination and calibration. Discrimination was quantified with Harrell's C-index, which for a binary outcome is equivalent to the area under the receiver operating characteristic (ROC) curve (11). Calibration was assessed with calibration curves together with the Hosmer-Lemeshow test; a significant test statistic indicates that the model does not calibrate perfectly (12). To further assess calibration, predicted probabilities of mortality were calculated for participants in the training cohort, divided into quintiles, and compared with observed mortality; the results were presented as a bar chart. To correct for overfitting (optimism) bias, the nomogram was internally validated with bootstrapping (1,000 resamples) to obtain an optimism-corrected C-index and calibration in the training cohort.
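The discrimination and calibration measures described above can be sketched in plain Python. For a binary outcome, Harrell's C reduces to the proportion of (death, survivor) pairs that the model ranks correctly. The Hosmer-Lemeshow function below splits patients into ten equal-size groups of sorted predicted risk (one of several grouping conventions; software implementations differ in how they form groups) and uses the fact that the chi-square upper tail has a closed form for even degrees of freedom. Function names are our own, and the bootstrap optimism correction is not shown.

```python
import math


def c_index(pred, outcome):
    """Harrell's C for a binary outcome: the share of (death, survivor) pairs in which the
    patient who died had the higher predicted risk; tied predictions count as 0.5."""
    events = [p for p, y in zip(pred, outcome) if y == 1]
    survivors = [p for p, y in zip(pred, outcome) if y == 0]
    score = 0.0
    for pe in events:
        for ps in survivors:
            score += 1.0 if pe > ps else 0.5 if pe == ps else 0.0
    return score / (len(events) * len(survivors))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(pred, outcome, groups=10):
    """Hosmer-Lemeshow statistic over equal-size groups of sorted predicted risk (df = groups - 2)."""
    pairs = sorted(zip(pred, outcome))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / len(chunk)
        stat += (observed - expected) ** 2 / (len(chunk) * mean_p * (1.0 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


print(c_index([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 3 of 4 pairs are concordant
print(round(chi2_sf_even_df(7.016, 8), 3))
```

With ten groups (8 degrees of freedom), `chi2_sf_even_df(7.016, 8)` returns about 0.535 and `chi2_sf_even_df(5.694, 8)` about 0.68, in line with the p-values reported for the nomogram's Hosmer-Lemeshow statistics in the Results.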
To assess the nomogram in the validation cohort, the logistic regression formula developed in the training cohort was applied to each patient in the validation cohort to calculate predicted postoperative mortality, and the C-index, calibration curve, and Hosmer-Lemeshow test were then evaluated. The EuroSCORE-2 online calculator (http://www.euroscore.org) was used to calculate each patient's predicted mortality. DeLong's test was used to compare the C-indexes of the nomogram and EuroSCORE-2 in the training and validation cohorts separately. We also calculated the categorical net reclassification improvement (NRI) and integrated discrimination improvement (IDI) to quantify the improvement in predictive power of the nomogram over EuroSCORE-2. Calibration of the two models was evaluated and compared with the Hosmer-Lemeshow χ² statistic, with P > 0.05 taken to indicate acceptable fit. In addition, observed and predicted probabilities of death were compared graphically across quintiles of predicted risk for both models.
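DeLong's test compares two C-indexes estimated on the same patients while accounting for their correlation. The pure-Python sketch below follows the standard structural-component formulation; it is illustrative only (the analysis itself was run in R), the helper names are our own, and the toy predictions are invented to exercise the code rather than taken from the study.

```python
import math


def _structural_components(pred, outcome):
    """DeLong placement values: V10 for each death, V01 for each survivor."""
    deaths = [p for p, y in zip(pred, outcome) if y == 1]
    survivors = [p for p, y in zip(pred, outcome) if y == 0]

    def psi(a, b):
        return 1.0 if a > b else 0.5 if a == b else 0.0

    v10 = [sum(psi(x, s) for s in survivors) / len(survivors) for x in deaths]
    v01 = [sum(psi(x, s) for x in deaths) / len(deaths) for s in survivors]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(pred_a, pred_b, outcome):
    """Two-sided DeLong test for the difference between two correlated C-indexes (AUCs)
    computed on the same patients. Returns (auc_a, auc_b, z, p)."""
    a10, a01 = _structural_components(pred_a, outcome)
    b10, b01 = _structural_components(pred_b, outcome)
    m, n = len(a10), len(a01)
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    var_diff = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
                + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


# Toy data (not study data): 10 deaths followed by 30 survivors.
outcome = [1] * 10 + [0] * 30
model_a = [(50 + 4 * i) / 100 for i in range(10)] + [(2 * j + 1) / 100 for j in range(30)]
model_b = [((i * 37) % 40) / 40 for i in range(40)]
print(delong_test(model_a, model_b, outcome))
```

In practice `pred_a` would hold the nomogram's predicted risks and `pred_b` the EuroSCORE-2 predictions for the same patients; the toy data give C-indexes of 0.97 and 0.60.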
This study is reported in compliance with standard guidelines for prediction models (13), and the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) checklist is presented in the Supplementary Material. Statistical analysis was conducted in R software (version 4.0.2; http://www.R-project.org). The C-index, calibration curves, nomogram, and bootstrap validation were calculated with the rms and riskRegression packages, and NRI and IDI with the PredictABEL package. A two-tailed p < 0.05 indicated statistical significance. Data were analyzed from November 7, 2020 to February 24, 2021.
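The two reclassification measures can be sketched as follows. The categorical NRI depends on the risk categories chosen; the cut-offs below (5% and 20%) are hypothetical placeholders because the categories used in the study are not stated in this section, and the function names are our own rather than the PredictABEL API. The IDI is the change in discrimination slope, that is, the mean predicted risk in patients who died minus that in survivors.

```python
from bisect import bisect_right

# HYPOTHETICAL risk cut-offs for illustration; the study's NRI categories are not given here.
CUTOFFS = [0.05, 0.20]


def risk_category(p, cutoffs=CUTOFFS):
    """0 = p < 5%, 1 = 5% <= p < 20%, 2 = p >= 20% with the default cut-offs."""
    return bisect_right(cutoffs, p)


def categorical_nri(p_old, p_new, outcome, cutoffs=CUTOFFS):
    """Categorical NRI = (up - down) / deaths + (down - up) / survivors."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for po, pn, y in zip(p_old, p_new, outcome):
        old_cat, new_cat = risk_category(po, cutoffs), risk_category(pn, cutoffs)
        total[y] += 1
        if new_cat > old_cat:
            up[y] += 1
        elif new_cat < old_cat:
            down[y] += 1
    nri_events = (up[1] - down[1]) / total[1]
    nri_nonevents = (down[0] - up[0]) / total[0]
    return nri_events + nri_nonevents, nri_events, nri_nonevents


def idi(p_old, p_new, outcome):
    """IDI = change in discrimination slope (mean risk in deaths minus mean risk in survivors)."""
    def slope(preds):
        deaths = [p for p, y in zip(preds, outcome) if y == 1]
        survivors = [p for p, y in zip(preds, outcome) if y == 0]
        return sum(deaths) / len(deaths) - sum(survivors) / len(survivors)
    return slope(p_new) - slope(p_old)


outcome = [1, 1, 0, 0]
p_euroscore = [0.04, 0.10, 0.10, 0.30]  # toy "old model" risks
p_nomogram = [0.25, 0.10, 0.03, 0.30]   # toy "new model" risks
print(categorical_nri(p_euroscore, p_nomogram, outcome), idi(p_euroscore, p_nomogram, outcome))
```

In the toy example, one death moves up a category and one survivor moves down, giving an NRI of 1.0 and an IDI of 0.14.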
RESULTS

We randomly allocated 70% (342) of patients to the training cohort and the remaining 30% (147) to the validation cohort. In total, 26 (7.6%) and 10 (6.8%) patients died after surgery in the training and validation cohorts, respectively. Baseline characteristics of all cohorts are listed in Table 1. There were no significant differences between the training and validation cohorts in preoperative baseline or surgical characteristics.

Univariable Analysis
The results of univariable logistic regression analysis of predictors of postoperative mortality in the training cohort are presented in Table 2. Univariable analysis in the training cohort showed significant associations of postoperative mortality with several predictors, including age, critical state, diabetes on insulin, stroke, recent myocardial infarction (MI) within 90 days, CCS angina class IV, lower limb arterial stenosis, left ventricular ejection fraction (LVEF) ≤35%, LV (left

Multivariable Analysis
Multivariable logistic regression analysis showed that age, critical state, recent MI within 90 days, stroke, LVEF ≤35%, left ventricular (LV) dilatation, increased serum creatinine, and combined surgery remained significant independent risk factors for postoperative mortality. The β-coefficients, odds ratios, 95% confidence intervals (CIs), and p-values for each variable in the multivariable model are displayed in Table 3. In addition, LASSO regression selected the same eight predictive variables as the stepwise regression method (Figure 1).
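The multivariable model summarized in Table 3 has the standard logistic form below. The notation is ours: x_1, ..., x_8 stand for the eight selected predictors and β_k for their coefficients, whose estimated values are given in Table 3 and are not reproduced here. The Wald-type interval is shown as one common way to obtain a 95% CI for an odds ratio; the source does not state which interval method was used.

```latex
\operatorname{logit}(\hat{p}) = \ln\frac{\hat{p}}{1-\hat{p}} = \beta_0 + \sum_{k=1}^{8} \beta_k x_k,
\qquad
\hat{p} = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{k=1}^{8} \beta_k x_k\right)\right]}

\mathrm{OR}_k = e^{\beta_k},
\qquad
95\%\ \mathrm{CI}(\mathrm{OR}_k) = \exp\!\left(\beta_k \pm 1.96\,\mathrm{SE}(\beta_k)\right)
```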

Nomogram Derived From the Training Cohort
The model integrating the selected predictive factors was developed and presented as a nomogram (Figure 2). The C-index for mortality prediction in the training cohort was 0.889 (95% CI, 0.839–0.938). For patients whose predicted mortality probabilities were below 40%, the calibration curve showed good agreement between nomogram predictions and actual observations (Figure 3A). The bootstrap-corrected calibration curve likewise showed good calibration in patients whose predicted mortality probabilities were below 40% (Supplementary Figure 1).

Frontiers in Cardiovascular Medicine | www.frontiersin.org
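A nomogram is a graphical version of the logistic formula: each predictor's contribution to the linear predictor is rescaled to points, and total points map back to a predicted probability. The Python sketch below shows this mapping under a common convention in which the predictor with the widest range of effect receives 100 points. Every coefficient, the intercept, and the 40 to 85 year age axis are hypothetical values chosen for illustration, not the estimates in Table 3, and age is assumed to enter the model linearly.

```python
import math

# HYPOTHETICAL coefficients for illustration only; they are NOT the published Table 3 values.
INTERCEPT = -9.0
AGE_BETA = 0.06                 # per year of age
AGE_MIN, AGE_MAX = 40, 85       # assumed age axis of the nomogram
BINARY_BETAS = {
    "critical_state": 2.0, "recent_mi": 1.2, "stroke": 1.1, "lvef_le_35": 1.3,
    "lv_dilatation": 1.5, "increased_creatinine": 1.4, "combined_surgery": 1.0,
}

# The predictor with the widest range of effect on the linear predictor gets 100 points.
MAX_EFFECT = max([AGE_BETA * (AGE_MAX - AGE_MIN)] + list(BINARY_BETAS.values()))
POINTS_PER_UNIT = 100.0 / MAX_EFFECT


def total_points(patient: dict) -> float:
    """Sum of nomogram points; reference levels are age 40 and absence of each binary factor."""
    lp_shift = AGE_BETA * (patient["age"] - AGE_MIN)
    lp_shift += sum(beta * patient.get(name, 0) for name, beta in BINARY_BETAS.items())
    return POINTS_PER_UNIT * lp_shift


def points_to_probability(points: float) -> float:
    """Map total points back to the linear predictor, then to predicted in-hospital mortality."""
    lp = INTERCEPT + AGE_BETA * AGE_MIN + points / POINTS_PER_UNIT
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"age": 70, "critical_state": 1}
pts = total_points(patient)
print(round(pts, 1), round(points_to_probability(pts), 3))
```

Because points are a linear rescaling of the linear predictor, converting total points back to a probability gives the same result as evaluating the logistic formula directly.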

Validation of Predictive Accuracy of the Nomogram in the Validation Cohort
In the validation cohort, the C-index of the nomogram for predicting postoperative mortality was 0.899 (95% CI, 0.835–0.963; Table 4), which did not differ significantly from that in the training cohort (0.889 vs. 0.899, p = 0.804). The Hosmer-Lemeshow test also yielded a non-significant statistic (p = 0.682), indicating acceptable goodness-of-fit. For patients with predicted mortality probabilities below 40%, the calibration curve also showed acceptable agreement between predicted and observed probabilities of mortality (Figure 3B). Calibration was further explored by comparing predicted and observed probabilities across quintiles of predicted risk, which showed acceptable agreement in both the training and validation cohorts (Figures 4A,B). Thus, the nomogram derived from the training cohort showed good discrimination and calibration for predicting postoperative mortality in both cohorts.
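The quintile-based comparison of predicted and observed mortality can be sketched as below. Patients are sorted by predicted risk and split into five equal-size groups; in each group the mean predicted risk is compared with the observed death rate, and observed rates above predicted ones indicate underestimation, the pattern described for EuroSCORE-2. The function name and toy data are our own illustrations.

```python
def calibration_by_quantile(pred, outcome, groups=5):
    """Mean predicted vs. observed mortality in equal-size groups of sorted predicted risk.
    Returns (group, n, mean_predicted, observed_rate) tuples."""
    pairs = sorted(zip(pred, outcome))
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((g + 1, len(chunk), mean_pred, observed))
    return rows


# Toy data (not study data); observed > predicted in a group indicates underestimation.
toy_pred = [i / 10 for i in range(10)]
toy_outcome = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
for group, size, mean_pred, observed in calibration_by_quantile(toy_pred, toy_outcome):
    flag = "underestimates" if observed > mean_pred else "overestimates or matches"
    print(f"Q{group}: n={size}, predicted={mean_pred:.2f}, observed={observed:.2f} ({flag})")
```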

Comparison of Predictive Accuracy Between the Nomogram and EuroSCORE-2
The C-index of the nomogram was significantly higher than that of EuroSCORE-2 in both the training cohort (0.889 vs. 0.762, p = 0.005) and the validation cohort (0.899 vs. 0.816, p = 0.039; Table 4). The nomogram had acceptable calibration in the training cohort (Hosmer-Lemeshow χ² = 7.016, p = 0.535) and the validation cohort (Hosmer-Lemeshow χ² = 5.694, p = 0.682; Table 4). For EuroSCORE-2, the Hosmer-Lemeshow test was significant in both the training cohort (χ² = 77.337, p < 0.001) and the validation cohort (χ² = 24.998, p = 0.002), indicating that EuroSCORE-2 does not calibrate perfectly. For patients with a predicted mortality below 40%, the calibration curves of the nomogram showed a good fit between predicted and observed mortality in the training and validation cohorts (Figures 3A,B). By contrast, the calibration curves of EuroSCORE-2 showed poor agreement between predicted and observed mortality in both cohorts: they lay almost entirely above the 45° diagonal, indicating that EuroSCORE-2 underestimated the probability of mortality, especially in high-risk patients (Figures 3C,D). Comparison of predicted and observed mortality across quintiles of predicted risk likewise showed that EuroSCORE-2 underestimated mortality in high-risk patients (Figures 4C,D).

FIGURE 2 | The nomogram derived from the training cohort for predicting mortality after CABG. MI, myocardial infarction; LVEF, left ventricular ejection fraction.

DISCUSSION
To our knowledge, this is the first nomogram developed to predict in-hospital mortality after CABG in patients with HFrEF. The nomogram performed well in both our training and validation cohorts, with good discrimination and with good calibration in patients whose predicted mortality probabilities were below 40%. The model incorporates only eight preoperative variables that are easily measured and readily available: age, critical state, recent MI, stroke, LVEF ≤35%, left

Risk Factors
The nomogram incorporates only eight variables yet achieved good model performance, suggesting that these eight risk factors are among the most important variables associated with mortality in patients with HFrEF undergoing CABG. Age is well established as an independent predictor of post-CABG mortality and was included in the nomogram. Unlike in commonly used risk scores, sex and BMI were not independent risk factors in the nomogram. In EuroSCORE-2, previous cardiac surgery and critical state are the two risk factors given the heaviest weights. Similarly, critical state carried the heaviest weight in the nomogram. However, previous cardiac surgery was not included in our model because only two patients in the entire cohort had a history of cardiac surgery. A growing body of literature has documented the effects of renal dysfunction on mortality and morbidity after CABG (14)(15)(16)(17)(18). Serum creatinine is often used to reflect renal function because it is simple and readily available. Patients with a baseline serum creatinine >1.5 mg/dl have been reported to have significantly higher 30-day mortality after CABG (15). Consistent with those reports, we defined increased serum creatinine as >1.5 mg/dl and found it to be independently associated with increased postoperative mortality in patients with HFrEF. Combined surgery reflects not only more severe lesions requiring additional intervention on the mitral valve or a ventricular aneurysm, but also longer anesthesia and the use of cardiopulmonary bypass (CPB). These factors increase surgical risk and might also encourage surgeons to change or simplify operative procedures to limit anesthesia time and avoid cardiopulmonary bypass.
One of the most powerful predictors of in-hospital mortality in our study was LV dilatation. Yamaguchi et al. (19) showed that a preoperative LV end-systolic volume index (LVESVI) >100 ml/m² predicted the development of congestive HF and late mortality in patients with LVEF <30% undergoing isolated CABG. Results from the Surgical Treatment for Ischemic Heart Failure (STICH) trial (20) showed that, in patients with LV dysfunction who underwent CABG, LVESVI was a stronger predictor of 30-day mortality than LVEF, and mortality risk increased linearly with increasing LVESVI. Fukunaga et al. (21) found that an LV size >5.5 cm was a significant predictor of operative mortality and major morbidity (OR 5.5 [2.0–15.7], p < 0.001) in patients undergoing isolated CABG. In our study, an LVIDd/BSA ≥3.5 cm/m² was defined as moderate or severe LV dilatation. Similarly, we found that LV dilatation was a significant risk factor for in-hospital mortality and had stronger predictive ability than LVEF. Widely accepted surgical risk scores include LVEF, but not LV size, as a powerful predictor of surgical and 30-day mortality, which may be insufficient. A variable reflecting LV size may be a more important predictor of outcome than LVEF and should be considered for incorporation into risk-adjustment models.

The Advantages of Nomogram Compared With the EuroSCORE-2
EuroSCORE-2 and the STS score are the most commonly used risk scores and have proven effective in assessing postoperative risk in general patients undergoing cardiac surgery (22)(23)(24). However, these scores were derived from data that included only a small number of patients with HFrEF and may not accurately predict surgical risk in such high-risk patients. Howell et al. (25) showed that EuroSCORE-2 performed poorly, with a low C-statistic of 0.67 and poor calibration (χ² = 16.5, p = 0.035), in high-risk patients undergoing cardiac surgery (preoperative logistic EuroSCORE ≥10). Several studies have reported that EuroSCORE-2 or the STS score underestimated the surgical mortality of CABG in specific high-risk populations (26)(27)(28). Di Dedda et al. (26) found that EuroSCORE-2 significantly underestimated the mortality risk of high-risk cardiac surgery patients (predicted mortality 6.5% vs. observed mortality 11%). In patients with LVEF ≤35% undergoing CABG, both the STS score and EuroSCORE-2 have been reported to perform moderately well (29), but with C statistics <0.75, somewhat inferior to those reported for overall cardiac surgical populations (>0.80). Moreover, both scores significantly underestimated mortality, and, compared with EuroSCORE-2, the STS score appeared to underestimate risk consistently. Consistent with these reports, EuroSCORE-2 in our study had moderate C-indexes (0.762 and 0.816 in the training and validation cohorts, respectively) and, as shown by the calibration curves, significantly underestimated the risk of mortality after CABG in patients with HFrEF, especially in the high-risk group.
Unlike Western countries, China is a developing country with different medical standards and practice characteristics; for example, off-pump CABG is more common than on-pump CABG in China. Because EuroSCORE-2 and the STS score were derived from European and US populations, they may not be well suited to Chinese patients. Moreover, the data underlying EuroSCORE-2 and the STS score were collected more than 10 years ago and may be outdated given improvements in surgical, anesthetic, and intensive care over the past decade. Consequently, a new model developed specifically for Chinese patients with HFrEF undergoing CABG is needed.
In this study, we established a nomogram that showed favorable discrimination, with C-indexes consistently above 0.8 and significantly higher than those of EuroSCORE-2 in both the training and validation cohorts. The nomogram also showed better calibration than EuroSCORE-2 in both cohorts. We believe this may be attributable to the following reasons. First, our nomogram was developed specifically for patients with HFrEF rather than for general cardiac surgery patients. Second, the nomogram was developed with recent data (January 2013 to July 2019), whereas EuroSCORE-2 was based on data obtained more than 10 years ago, which may be outdated. Third, EuroSCORE-2 has been reported to underestimate mortality in high-risk Chinese patients undergoing CABG (27); unlike EuroSCORE-2, which is based on Western populations, our nomogram may be better suited to Chinese patients. Fourth, LV dilatation may be a more important predictor of outcome than LVEF, and it was incorporated in the nomogram but not in EuroSCORE-2. Finally, our risk model was developed from single-center data and validated internally rather than externally, so its performance may be lower in external validation.
Furthermore, our nomogram has practical advantages over traditional risk scores. It requires only eight risk factors that are routinely available in medical records, so risk can be calculated at the bedside within a few minutes, which favors wide clinical use. By contrast, the STS score is complex, with more than 50 demographic and operative variables, and even EuroSCORE-2 has 18 variables. Despite using fewer variables, our nomogram showed better discrimination and calibration than EuroSCORE-2. These findings illustrate the utility and feasibility of using population-specific patient data to build models that improve the prediction of cardiac surgery mortality in specific populations and to gain additional insight into factors that modify the risk of outcomes in patients with HFrEF.

Limitation
This study has several limitations. First, it is a retrospective analysis, so selection bias remains possible, and prospective studies are required to confirm the results. Second, our risk model was developed from single-center data without external validation. Although we tried to mitigate this limitation by internal validation in the validation cohort and additional bootstrap validation, external validation in other cohorts is needed before clinical application. Third, the nomogram was developed and validated in a small cohort with only 36 outcomes. Given the relatively small sample size, the results should be interpreted with caution. The present study is a preliminary exploration of mortality risk prediction in these high-risk patients undergoing CABG, and future studies with larger samples are needed to confirm our findings. Finally, because the model was based on routine clinical data, some potentially important predictors, such as natriuretic peptide levels, were not collected. Specific markers of surgical risk might further improve the accuracy of the model.

CONCLUSION
In conclusion, this study presents an easily applied nomogram that can predict in-hospital mortality in HFrEF patients undergoing CABG. The nomogram showed improved predictive accuracy compared with EuroSCORE-2. It may help identify HFrEF patients at high risk of in-hospital mortality after CABG who might benefit from a simplified operative approach, intensive perioperative attention, and more personalized treatment.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Ethics Committee of Beijing Anzhen Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
PY, RD, TL, KZ, and JC: study concept and design. All authors: acquisition, analysis, or interpretation of data; critical revision of the manuscript for important intellectual content; and reading and approval of the final manuscript. PY: drafting of the manuscript.