Early Prediction of Left Ventricular Reverse Remodeling in First-Diagnosed Idiopathic Dilated Cardiomyopathy: A Comparison of Linear Model, Random Forest, and Extreme Gradient Boosting

Introduction: Left ventricular reverse remodeling (LVRR) is associated with decreased cardiovascular mortality and improved cardiac survival and is also crucial for choosing therapeutic options. However, there is a lack of an early prediction model of LVRR in first-diagnosed dilated cardiomyopathy (DCM). Methods: This single-center study included 104 patients with idiopathic DCM. We defined LVRR as an absolute increase in left ventricular ejection fraction (LVEF) of >10% to a final value >35% and a decrease in left ventricular end-diastolic diameter (LVDd) >10%. Analysis features included demographic characteristics, comorbidities, physical signs, biochemistry data, echocardiography, electrocardiogram, Holter monitoring, and medication. Logistic regression, random forests, and extreme gradient boosting (XGBoost) were each implemented in a 10-fold cross-validated model to discriminate LVRR from non-LVRR, with receiver operating characteristic (ROC) curves and calibration plots for performance evaluation. Results: LVRR occurred in 47 (45.2%) patients after optimal medical treatment. Cystatin C, right ventricular end-diastolic dimension, high-density lipoprotein cholesterol (HDL-C), left atrial dimension, left ventricular posterior wall dimension, systolic blood pressure, severe mitral regurgitation, eGFR, and NYHA classification were included in XGBoost, which reached a higher AU-ROC than logistic regression (AU-ROC, 0.8205 vs. 0.5909, p = 0.0119). Ablation analysis revealed that cystatin C, right ventricular end-diastolic dimension, and HDL-C made the largest contributions to the model. Conclusion: Tree-based models like XGBoost were able to differentiate LVRR from non-LVRR early in patients with first-diagnosed DCM before drug therapy, facilitating disease management and invasive therapy selection. A multicenter prospective study is necessary for further validation. Clinical Trial Registration: http://www.chictr.org.cn/usercenter.aspx (ChiCTR2000034128).


INTRODUCTION
Dilated cardiomyopathy (DCM) is the third leading cause of heart failure with decreased ejection fraction and the most important cause of heart transplantation (1,2). Its 1-year mortality rate is as high as 25-30%, and its 5-year survival rate is <50% (3). Significant improvements in left ventricular end-diastolic diameter (LVDd) and left ventricular ejection fraction (LVEF) are referred to as left ventricular reverse remodeling (LVRR) (4). Despite the use of angiotensin-converting enzyme inhibitors (ACEIs), β-blockers, and mineralocorticoid receptor antagonists, LVRR occurs in only approximately 37-52% of DCM patients (5-10). Therapy-induced LVRR has become an important prognostic tool in the management of patients with DCM (5,11). If a patient does not respond to medication, an early implantable cardioverter defibrillator may be necessary, and the timing of device therapy and of listing for transplantation become important considerations, since these differ from those for patients who respond to medication. Despite an increasing understanding of the progression of DCM, prognostic stratification of patients in the early phases of DCM remains a challenge (12). Early prediction of LVRR would therefore help to achieve precise management of patients with DCM.
Several early studies have reported associations between clinical indexes and LVRR in DCM. Kawai et al. (13) first demonstrated that higher systolic blood pressure and lower pulmonary arterial wedge pressure at diagnosis were predictors of LVRR with medical therapy. Afterward, cardiac magnetic resonance was used for the prediction of LVRR. Several studies reported that late gadolinium enhancement at baseline provides a better prediction of LVRR (10, 14-17). However, previous studies have not reached definite agreement on late gadolinium enhancement as an early predictor of LVRR (18). Genotype has also been shown to be associated with LVRR in DCM. An inverse and independent association has been reported between rare variants in structural cytoskeleton Z-disk genes and LVRR (19). Verdonschot et al. (7) also demonstrated that a model including mutation status performs better than a model with only clinical parameters (AUC = 0.760 vs. 0.742, p = 0.008). However, these measurements are difficult and expensive, which limits their clinical application. Ruiz-Zamora et al. (20) found a simple logistic model including five variables with an AUC of 0.83. However, this model included several variables obtained at the end of follow-up, so it cannot provide an early prediction of LVRR, which usually happens within 1 to 2 years in patients with DCM. Therefore, identifying LVRR at the first diagnosis of DCM from a combination of several routine clinical parameters could help to make important clinical decisions concerning the need for and timing of some therapies in patients with DCM.
Machine learning can select predictor variables more objectively and handle possible non-linear effects of variables better than traditional statistical methods. A tree-based ensemble algorithm aggregates multiple weak learners into a stronger ensemble model through one of two ensemble strategies, bagging or boosting, of which random forests and extreme gradient boosting (XGBoost) are the respective representative methods. Random forests use bootstrap sampling to avoid instability of the model, while the XGBoost algorithm was developed mainly to penalize the structure of a decision tree to avoid overfitting (21). XGBoost has been found to outperform other machine learning and deep learning methods in many competitions such as Kaggle and KDDCup (22). It has been successfully applied in numerous bioinformatics studies (23,24) and medical studies (25,26). Therefore, we conducted a retrospective real-world study and analyzed clinical data by using tree-based learning algorithms to build a predictive model and validate it.

Study Population
This study was a single-center real-world study. The clinical data were collected from consecutively admitted patients with their first diagnosis of DCM at the Sun Yat-sen Memorial Hospital of Sun Yat-sen University between January 2014 and December 2017, and each of the patients had several follow-up records. DCM was diagnosed in keeping with the Chinese guidelines for the diagnosis and treatment of DCM (27) as follows: (1) LVDd >5.0 cm (female) or LVDd >5.5 cm (male); (2) LVEF <45% and left ventricular shortening fraction <25%; and (3) exclusion of valvular heart disease, congenital heart disease, ischemic heart disease, tachycardiomyopathy, and secondary DCM caused by systemic diseases. Patients with any of the following conditions were excluded: (1) alcoholic cardiomyopathy, peripartum cardiomyopathy, and other acquired DCM; (2) a history of HF treatment including ACEIs/angiotensin receptor blockers (ARBs)/angiotensin receptor-neprilysin inhibitors (ARNIs), adrenergic beta-receptor blockers, and mineralocorticoid receptor antagonists; (3) coronary heart disease (coronary artery stenosis of 50% or more according to coronary angiography or coronary CTA), pulmonary heart disease, organic heart valvular disease, congenital heart disease, hypertensive heart disease, or pericardial disease; (4) not receiving medical therapy recommended by the Chinese Guidelines for the Diagnosis and Treatment of Heart Failure 2018 (28); (5) systemic diseases that may affect the structure and function of the heart, such as hyperthyroidism, hypothyroidism, amyloidosis, pheochromocytoma, systemic lupus erythematosus, or Behcet's disease; (6) cancer, severe infection, or severe renal dysfunction (estimated glomerular filtration rate (eGFR) <15 ml·min⁻¹·1.73 m⁻²); and (7) receiving cardiac resynchronization therapy or left ventricular assist device during follow-up.
This study was approved by the institutional review board of Sun Yat-sen Memorial Hospital and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. No informed consent was required because the data in our study were anonymized. All patients received standard medical therapy according to current guidelines (27,28).

Data Collection
All baseline and return-visit data were obtained from electronic health records, including demographic characteristics, physical signs, comorbidities, laboratory indicators, electrocardiogram, 24-h dynamic electrocardiogram, echocardiographic data, and medication. Blood samples were collected after a 12-h overnight fast. LVEF was measured using the apical biplane method, and transthoracic echocardiography was performed as recommended by the American Society of Echocardiography (29) by a senior echocardiographer at admission and during the follow-up period. The New York Heart Association (NYHA) class was evaluated within the first 8 h of admission.

Definition of Variables
According to the European Association of Cardiovascular Imaging and the American Society of Echocardiography (30), the relative wall thickness was calculated as the ratio of two times the posterior wall thickness to LVDd. Left ventricular mass (LVM) was calculated according to the formula in (1). The normalization of LVM for body surface area was regarded as the left ventricular mass index. Body surface area was estimated by the formula in (2) (31). The eGFR was calculated using the modification of diet in renal disease equation (32). The doses of ACEIs/ARBs/ARNIs and β-blockers were evaluated by the ratio of the practical dose and target dose of certain drugs within 6 months (28).
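The ratio-type variables defined above can be sketched as small helper functions. This is a minimal illustration, not the authors' code: the function names and the numeric inputs are hypothetical, and the formulas for LVM and body surface area are deliberately left out because they are not reproduced in the text, so both are taken as inputs here.

```python
def relative_wall_thickness(pwt_cm, lvdd_cm):
    """RWT = 2 x posterior wall thickness / LVDd (both in the same unit)."""
    return 2.0 * pwt_cm / lvdd_cm


def mass_index(lvm_g, bsa_m2):
    """Left ventricular mass index: LVM normalized for body surface area."""
    return lvm_g / bsa_m2


def dose_ratio(actual_mg, target_mg):
    """Fraction of the target dose that was actually prescribed."""
    return actual_mg / target_mg


# Illustrative values only, not taken from the study data.
rwt = relative_wall_thickness(pwt_cm=0.9, lvdd_cm=6.0)
lvmi = mass_index(lvm_g=216.0, bsa_m2=1.8)
ratio = dose_ratio(actual_mg=5.0, target_mg=10.0)
print(round(rwt, 2), round(lvmi, 1), ratio)
```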

Return Visits
The patients underwent return visits as required. Follow-up ended in December 2018 or at the date of death or heart transplantation, whichever came first. Transthoracic echocardiography was performed at all visits. LVRR was defined as an absolute increase in LVEF of >10% to a final value >35% accompanied by a decrease in LVDd ≥10% (10), as assessed at any one visit and sustained until the last visit (median time 24 months, IQR 15-31). Non-LVRR was defined as an absolute increase in LVEF <10% or a final value <35% or a decrease in LVDd <10% as assessed at all visits, except those at <9 months. Patients who did not meet the definition of LVRR and whose last visit was at <9 months were excluded (Figure 1).
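The outcome definition above can be written as a short labeling rule. The sketch below is an interpretation under stated assumptions: the LVEF increase is read as percentage points, the LVDd decrease is read as a relative change from baseline, and "sustained" is read as every visit from the first qualifying one onward meeting the criteria. The function names and example values are hypothetical.

```python
def is_lvrr(lvef_base, lvef_follow, lvdd_base, lvdd_follow):
    """Check one follow-up visit against the LVRR criteria.

    lvef_* are ejection fractions in percent; lvdd_* are diameters in cm.
    """
    lvef_gain = lvef_follow - lvef_base                  # absolute change, percentage points
    lvdd_drop = (lvdd_base - lvdd_follow) / lvdd_base    # relative change (assumption)
    return lvef_gain > 10 and lvef_follow > 35 and lvdd_drop >= 0.10


def sustained_lvrr(baseline, visits):
    """baseline and each visit are (lvef, lvdd) tuples; visits are in time order."""
    flags = [is_lvrr(baseline[0], v[0], baseline[1], v[1]) for v in visits]
    if True not in flags:
        return False
    first = flags.index(True)
    return all(flags[first:])  # must hold from the first qualifying visit to the last


# Illustrative patients, not study data.
responder = sustained_lvrr((28, 6.5), [(33, 6.3), (42, 5.8), (45, 5.6)])
relapser = sustained_lvrr((28, 6.5), [(42, 5.8), (34, 6.2)])
print(responder, relapser)
```

The exclusion of early visits (<9 months) for non-LVRR labeling is not modeled here and would need visit dates.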

Statistical Analysis
Normally distributed variables are presented as means ± standard deviations, while non-normally distributed variables are presented as medians with interquartile ranges. NT-proBNP, cTNT, D-dimer, and hsCRP were logarithmically transformed to approximate a normal distribution. The Levene test was used to explore the homogeneity of variance, and a p-value of <0.1 was considered to indicate heterogeneity of variance. Differences between groups were tested by the independent t-test or Mann-Whitney U-test for continuous variables and the chi-square test for categorical variables. The DeLong test was used to determine whether the difference between AUCs was statistically significant. Statistical significance was defined as a two-sided p-value of <0.05.

Data Imputation
A total of 102 features were included for analysis and are described in Supplementary Table 1. Of these, 65 variables had no missing data, 23 variables had <10% missing data, and the remaining 14 variables had >10% missing data. None of the variables had >50% missing data. All variables were standardized when selecting features and building models to mitigate the effect of the differences in dimensions between variables. The specific method is described in (3),
$$X_k = \frac{X_{k0} - X_{\min}}{X_{\max} - X_{\min}} \tag{3}$$
where $X_{k0}$ and $X_k$ are the kth values of a certain variable before and after standardization, while $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of that variable, respectively. K-nearest neighbors imputation was used for continuous and discrete variables, taking the average of the K samples nearest to the missing point as its value.
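The scaling and imputation steps described above might look as follows in plain Python. This is a sketch, not the study pipeline: the value of K, the use of Euclidean distance over jointly observed features, and the function names are assumptions, since the text does not specify them.

```python
import math


def min_max_scale(column):
    """Rescale observed values to [0, 1]; None marks a missing value."""
    observed = [v for v in column if v is not None]
    lo, hi = min(observed), max(observed)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
    return [None if v is None else (v - lo) / span for v in column]


def knn_impute(rows, k=3):
    """Fill each None with the mean of that feature over the k nearest rows."""
    def distance(a, b):
        # Compare only the features observed in both rows (assumption).
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no row has this feature observed
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Illustrative data only.
scaled = min_max_scale([2, 4, None, 6])
filled = knn_impute([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.05, None]], k=2)
print(scaled, filled[3])
```

Scaling before imputation keeps features with large numeric ranges from dominating the neighbor distances.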

Model Development
We chose three standard supervised machine learning methods for our data: XGBoost (21), random forest (33), and logistic regression with L1 penalty (34). The cases and controls involved in this study were randomly divided into training and testing sets at a train:test ratio of 6:4. The models were trained on the training set with 10-fold cross-validation and validated on the testing set (Figure 1). A grid search was performed on the training set through 10-fold cross-validation to find the optimal combination of model parameters, with the training set randomly split into 10 subsets. For each combination of parameters, a model was trained on nine subsets and validated on the remaining one. The process was repeated 10 times so that each subset was used for validation once, and the results were averaged to measure the performance of the parameter combination. We then selected the parameter combination that reached the highest AUC to train a model on the whole training set, and this model was tested on the independent test set. The discrimination of the models was evaluated using the receiver operating characteristic (ROC) curve. Calibration was performed using isotonic regression (35) and evaluated with a calibration plot.
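The cross-validated grid search described above can be sketched without any library so that the mechanics are visible. Several things here are assumptions rather than the authors' implementation: the folds are stratified by class (the text only says "randomly split"), the learner is a toy stand-in whose only hyperparameter is which column to score on, and all function names are hypothetical. In practice the scikit-learn tools named in this paper would play this role.

```python
import random
from itertools import product


def auc(labels, scores):
    """Probability that a positive case outscores a negative one (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Deal shuffled positives, then negatives, round-robin into k folds.

    Assumes each class has at least k members so every fold sees both classes.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    ordered = pos + neg
    return [ordered[i::k] for i in range(k)]


def grid_search_cv(X, y, fit, predict, grid, k=10, seed=0):
    """Return the parameter combination with the highest mean fold AUC."""
    folds = stratified_folds(y, k, seed)
    best_params, best_auc = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        fold_aucs = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            scores = predict(model, [X[i] for i in held_out])
            fold_aucs.append(auc([y[i] for i in held_out], scores))
        mean_auc = sum(fold_aucs) / len(fold_aucs)
        if mean_auc > best_auc:
            best_params, best_auc = params, mean_auc
    return best_params, best_auc


# Toy stand-in for a real learner: "fitting" just remembers the chosen column.
def fit(X_train, y_train, feature):
    return feature


def predict(model, X_new):
    return [row[model] for row in X_new]


# Synthetic data: column 0 separates the classes, column 1 is unrelated noise.
y = [i % 2 for i in range(40)]
X = [[label + 0.01 * i, (7 * i) % 10] for i, label in enumerate(y)]
best_params, best_auc = grid_search_cv(X, y, fit, predict, {"feature": [0, 1]})
print(best_params, best_auc)
```

After the search, the winning parameters would be used to refit on the whole training set before a single evaluation on the held-out test set, as the text describes.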

Feature Selection
The distribution of each feature is shown in Supplementary Figure 1. Feature selection was also performed to optimize the feature combination used in constructing a prediction model. In this study, we used a greedy feature selection algorithm based on the important features recommended by a specific model. In general, a specific model was first pretrained with 10-fold cross-validation on the training set to obtain the important features, from which we selected features greedily according to AUC. The important features were those with an importance greater than zero. In the greedy search, the selection algorithm began with an empty set of features, iteratively searched for the best feature in the remaining feature set, and added it to the selected set when it yielded a higher AUC. This procedure was repeated until the remaining feature set was empty or the AUC no longer increased, leading to a best feature subset for building the final prediction model.
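The greedy search described above is a forward selection loop. A minimal sketch follows, with the scoring step abstracted into a callable that would, in the study setting, return a cross-validated AUC for a candidate feature subset. The function name, the toy scorer, and its numeric gains are invented for illustration and are not study results.

```python
def greedy_forward_selection(candidates, score):
    """Add one feature at a time while the score keeps improving.

    candidates: features pre-filtered to those with model importance > 0.
    score: callable mapping a list of features to a cross-validated AUC.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        trial_scores = {f: score(selected + [f]) for f in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] <= best:
            break  # AUC no longer increases
        selected.append(winner)
        remaining.remove(winner)
        best = trial_scores[winner]
    return selected, best


# Toy scorer with made-up additive gains; a real scorer would retrain the model.
toy_gain = {"cysc": 0.15, "rvdd": 0.10, "hdl": 0.05, "noise": -0.02}


def toy_score(features):
    return 0.5 + sum(toy_gain[f] for f in features)


selected, best = greedy_forward_selection(list(toy_gain), toy_score)
print(selected, round(best, 2))
```

Each round costs one model evaluation per remaining feature, so starting from the importance-filtered candidates rather than all features keeps the search tractable.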

Machine Learning and Statistical Tools
The research data of our study were assessed with the machine learning tools of the scikit-learn project. The tool environment we applied was Python 3.7.6 with scikit-learn 0.22 running on Anaconda 3 (4.8.5-Linux-x86_64) for data processing, modeling, and evaluation. SPSS version 22.0 (IBM SPSS Statistics, IBM Corporation, Armonk, NY, USA) was used to perform the descriptive statistics.

Baseline Characteristics
A total of 378 inpatient clinical data points from 104 patients were collected. Among the 104 patients analyzed, LVRR was observed in 47 individuals (45.2%) (Figure 1). The characteristics and the distribution of the patients are shown in Table 1 and Supplementary Figure 1. Patients who developed LVRR were more likely to have a higher systolic blood pressure, higher platelet count, lower serum D-dimer level, higher high-density lipoprotein cholesterol (HDL-C) level, smaller left atrial dimension, and smaller right ventricular end-diastolic dimension and were less likely to suffer from severe mitral regurgitation (MR). The use and doses of ACEIs/ARBs/ARNIs and β-blockers were not significantly different between the two groups.

Data From Visits
All patients completed return visits. The details of the time distributions of visits are shown in Supplementary Figure 2. LVEF and LVDd were similar between the two groups at baseline, but in the LVRR group, LVEF, LVDd, left atrial dimension, and severity of MR improved significantly and tended to be stable after 1 year (Figures 2A,B,D,G). Right ventricular end-diastolic dimension, left ventricular posterior wall dimension, and interventricular septal dimension showed no obvious change during return visits in either the LVRR or the non-LVRR group (Figures 2C,E,F). NYHA functional class in the LVRR group was better than that in the non-LVRR group at each time point (Figure 2H).

Classifier Model Development and Validation
The individual features were tested for their ability to classify LVRR and non-LVRR. As indicated by Figure 3A, there are more than 20 features (30.12%) with an AUC only slightly above 0.5, and only five features with an AUC larger than 0.65. The maximum AUC of all features is <0.7. Thus, it is necessary to identify the combined effects of the features in discriminating LVRR from non-LVRR. The feature selection procedure is shown in Figure 3B. The tree-based model was first pretrained on the training set to obtain the important features (we describe the result of XGBoost here). In total, 33 features were selected as important. From these features, we used greedy search to obtain the feature subset that reached an accurate classification result. The greedy search provided nine features. Figure 3C shows their importance rank. These features were used to train an XGBoost model with 10-fold cross-validation, which achieved AUCs of 0.8463 and 0.8205 on the CV (cross-validation) set and test set, respectively (Figure 3D and Supplementary Figure 3). The similarity of the AUCs on the cross-validation and test sets also suggests that the model is robust.
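The single-feature screening reported at the start of this section amounts to computing one AUC per feature, using the raw feature value as the score. The sketch below shows this with a rank-based AUC. Taking the larger of AUC and 1 − AUC, so that inversely associated features are not penalized, is an assumption about how such a screen is usually done, and the data are invented.

```python
def auc(labels, scores):
    """Probability that a positive case outscores a negative one (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def univariate_auc(labels, values):
    """Direction-free AUC of a single feature used directly as the score."""
    a = auc(labels, values)
    return max(a, 1.0 - a)


# Illustrative data: 1 = LVRR, 0 = non-LVRR.
labels = [1, 1, 1, 0, 0, 0]
higher_in_lvrr = [3.0, 2.0, 2.5, 1.0, 2.2, 0.5]
lower_in_lvrr = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]
auc_a = univariate_auc(labels, higher_in_lvrr)
auc_b = univariate_auc(labels, lower_in_lvrr)
print(round(auc_a, 4), auc_b)
```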
Ablation analysis was performed with 10-fold cross-validation to estimate the contribution of each feature to the prediction. As shown in Figure 3E, removing any one of the features caused a decline in the AUC. Moreover, we observed that cystatin C was the most important feature: its ablation reduced the AUC from 0.8205 to 0.6591.
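The ablation procedure above removes one feature at a time and records how far the AUC falls; for cystatin C the reported fall is 0.8205 − 0.6591 = 0.1614. A generic sketch follows, with the retrain-and-score step abstracted into a callable. The toy scorer and its weights are invented for illustration and do not reproduce the study's values.

```python
def ablation_analysis(features, score):
    """Return the full-model score and the score drop caused by removing each feature."""
    full = score(list(features))
    drops = {}
    for f in features:
        reduced = [g for g in features if g != f]
        drops[f] = full - score(reduced)
    # Largest drop first: the feature the model depends on most.
    ranked = dict(sorted(drops.items(), key=lambda kv: kv[1], reverse=True))
    return full, ranked


# Toy scorer with made-up additive contributions; a real one would retrain the model.
toy_weight = {"RVDd": 0.06, "CysC": 0.16, "HDL-C": 0.04}


def toy_score(features):
    return 0.55 + sum(toy_weight[f] for f in features)


full_auc, drops = ablation_analysis(list(toy_weight), toy_score)
reported_cysc_drop = 0.8205 - 0.6591  # values quoted in the text
print(round(full_auc, 2), list(drops), round(reported_cysc_drop, 4))
```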
For comparison, we tested other machine learning methods, including logistic regression with L1 penalty and random forests, with the same process shown in Figure 3B. As shown in Figure 3D, XGBoost and random forests achieved better AUCs than the linear model on the test set, with AUCs of 0.8205 (95% CI 0.6775-0.9497, p = 0.0119 vs. LR) and 0.7989 (95% CI 0.6589-0.9408, p = 0.0258 vs. LR), respectively. From the confusion matrix of each model shown in Figure 4, we found that XGBoost correctly classified 13 of 22 LVRR patients and 16 of 20 non-LVRR patients on the test set, while random forests correctly classified 18 of 22 LVRR patients and 13 of 20 non-LVRR patients. This indicates that XGBoost and random forests have different strengths: XGBoost was better at classifying non-LVRR patients, and random forests were better at classifying LVRR patients. Moreover, both tree-based models were superior to the logistic regression model in classifying LVRR and non-LVRR. Table 2 supports this finding through the recall and the sensitivity measurements in classifying LVRR and non-LVRR. Furthermore, we performed calibration analysis of the three models to obtain more statistical evidence for comparing model performance. As shown in Figure 3F, these models had similar calibration.
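The confusion-matrix counts quoted above translate directly into per-class rates. The short calculation below uses only the counts given in the text and treats LVRR as the positive class, which is an assumption about labeling; the function name is illustrative.

```python
def rates(tp, n_pos, tn, n_neg):
    """Sensitivity, specificity, and accuracy from correctly classified counts."""
    return {
        "sensitivity": tp / n_pos,              # LVRR patients correctly identified
        "specificity": tn / n_neg,              # non-LVRR patients correctly identified
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }


# Test-set counts from the text: 22 LVRR and 20 non-LVRR patients.
xgb = rates(tp=13, n_pos=22, tn=16, n_neg=20)
rf = rates(tp=18, n_pos=22, tn=13, n_neg=20)
for name, r in (("XGBoost", xgb), ("Random forest", rf)):
    print(name, {k: round(v, 4) for k, v in r.items()})
```

On these counts XGBoost has the higher specificity (0.80 vs. 0.65) and random forests the higher sensitivity (about 0.82 vs. 0.59), which is the trade-off the paragraph describes.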
Frontiers in Cardiovascular Medicine | www.frontiersin.org | August 2021 | Volume 8 | Article 684004
FIGURE 3 | [...] random forest, and the XGBoost algorithms, respectively; (E) ablation analysis is performed to evaluate the contributions of each feature in the prediction; (F) calibration plot of three models. Blue, green, and red curves were generated by the logistic regression, the random forest, and the XGBoost algorithms, respectively. CysC, cystatin C; eGFR, estimated glomerular filtration rate; HDL-C, high-density lipoprotein cholesterol; LA, left atrial; LVPWd, left ventricular posterior wall dimension; MR, mitral regurgitation; NYHA, New York Heart Association; RVDd, right ventricular end-diastolic dimension; SBP, systolic blood pressure.

DISCUSSION
In this study, our key findings are as follows: (1) the XGBoost and random forest classifiers combining routine clinical indexes collected before treatment show higher accuracy than logistic regression in predicting LVRR in patients with DCM. (2) Baseline cystatin C, right ventricular end-diastolic dimension, and HDL-C are the most important features in this model, but not LVEF and LVDd. These machine learning classifiers might be useful to identify the patients who may not respond to medication and in whom early clinical monitoring and early implementation of preventive strategies may be helpful.
To the best of our knowledge, this is the first study using ensemble tree models of machine learning to predict LVRR. Compared with traditional regression, these models avoid presupposing a linear relation between variables and the other assumptions required for statistical models to be correct. In our study, optimized classifiers such as XGBoost and random forest achieved similarly improved accuracy in predicting LVRR. These ensemble tree models might be useful for improving risk factor management in DCM. Unlike the assessment of business risk or the prediction of mortality risk, we pay more attention to better discrimination in the early identification of non-LVRR patients with DCM, who may be followed more intensively. Because the XGBoost model differentiated non-LVRR more accurately, it was chosen as the final model for subsequent analysis. Moreover, we also found that a single clinical index cannot predict LVRR well, which indicates that LVRR results from the joint action of several factors. Finally, we built the XGBoost model including four echocardiogram indexes, three routine laboratory indexes, systolic blood pressure, and NYHA functional class. LVRR is more likely to occur in patients with NYHA functional class I-II than in those with NYHA functional class III-IV [61.3% (19/31) vs. 38.4% (28/73), p = 0.032]. Patients with NYHA functional class I-II may be in the early stages of the disease. It has been reported that a shorter duration of disease is associated with a higher likelihood of recovery of LVEF (4). This result is also consistent with some prior reports (20,36).
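The NYHA comparison reported above (19/31 vs. 28/73) can be checked with a Pearson chi-square test on the 2×2 table. The sketch below assumes no continuity correction, which is our guess at the variant used; it appears consistent with the reported p-value. The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail for 1 degree of freedom
    return chi2, p_value


# Rows: NYHA I-II, NYHA III-IV; columns: LVRR, non-LVRR (counts from the text).
chi2, p_value = chi_square_2x2(19, 31 - 19, 28, 73 - 28)
print(round(chi2, 2), round(p_value, 3))
```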
Our ablation analysis showed that serum cystatin C contributes remarkably to the predictive model, a finding similar to those of previous studies on the prognosis of dilated cardiomyopathy. It has been reported that cystatin C was the best predictor of LVEF increase in DCM patients (37). Chatterjee et al. (38) revealed that baseline cystatin C showed incremental benefit in the prediction of cardiac resynchronization therapy non-response compared with conventional renal markers. Cystatin C is not subject to variability in renal filtration and is considered to be a more stable renal marker, which is less sensitive to gender and age. However, cystatin C may not only serve as a marker of intersecting cardio-renal pathways in patients with DCM but also be associated with cathepsin B inhibition, collagen accumulation, and myocardial fibrosis, as it is an inhibitor of cathepsins, which play a role in the degradation of the extracellular matrix (39). It has been reported that an excess of cystatin C leads to extracellular tissue inhibitor of metalloproteinase-1 and osteopontin accumulation in human cardiac fibroblast cells (40). We speculate that cystatin C takes part in alterations in collagen metabolism and the process of cardiac fibrosis in DCM, which has been shown to be a key determinant of left ventricular remodeling in DCM (14). Hence, the combination of cystatin C and eGFR (calculated from creatinine) leads to an obvious improvement in our model for LVRR in DCM.
In the ablation analysis, four of the important clinical indexes are measures of cardiac structure obtained by echocardiography. Echocardiography is the first-line examination in patients with DCM. Our results are similar to those of previous studies on prognosis in dilated cardiomyopathy. Barison et al. (41) reported that prognosis in patients with <35% LVEF was not significantly worse than in those with LVEF >35% (p = 0.476). La Vecchia et al. (42) reported that right ventricular end-diastolic volume, but not LVEF, was an independent predictor of transplant-free survival. Recent studies also found that right ventricular function can be used to predict the prognosis of DCM (42,43). Furthermore, baseline right ventricular dysfunction was shown to be a stronger predictor than other known prognostic factors, such as NYHA functional class, functional mitral regurgitation (43), and systolic blood pressure (5,13). Right ventricular dysfunction may reflect an increased pulmonary artery pressure (44), which may represent an advanced stage of ventricular remodeling. Although right ventricular end-diastolic dimension does not adequately reflect right ventricular function, its combination with adverse remodeling characteristics, such as functional mitral regurgitation and enlargement of other chambers, can provide valuable information for prediction.
HDL-C was another variable that contributed substantially to the predictive model in the ablation analysis. Emmens et al. (45) reported an inverse association between HDL-C and all-cause mortality or MACE in HFrEF, but not in HFpEF. Freitas et al. (46) also obtained a similar result. The mechanism underlying the association between HDL-C and left ventricular reverse remodeling is not yet clear. Emerging evidence shows that subfractions of HDL have antioxidant, anti-inflammatory, and endothelial cell protective capacity (47-49). Sampietro et al. (50) also found a significant association between HDL-C level and idiopathic DCM and a negative correlation between HDL-C level and inflammation markers, which is similar to our results (Supplementary Figure 4). In our study, there were obvious differences in the HDL-C level but not in hsCRP and NT-proBNP between the LVRR and non-LVRR groups. This may be because serum NT-proBNP levels at first admission indicate only a short congestive state (51) and because several novel mechanisms may link HDL-C level and left ventricular reverse remodeling in patients with DCM. In addition, DCM is a clinical syndrome with diverse etiologies that affects multiple organ systems. Timely identification of LVRR is needed and can be helpful for the precise management of these patients. Machine learning applications might be an attractive option to provide a solution to this problem.

Study Limitations
A limitation of our study is that it is a single-center, retrospective study, so stronger evidence should be obtained from a large-sample prospective study with external validation. A further limitation is that we focused on predictive performance rather than statistical inference. Therefore, we cannot draw conclusions about risk factors. In addition, compared with linear models, tree-based models are usually less interpretable, and the mechanism by which a feature contributes to the prediction cannot always be explained.

CONCLUSIONS
XGBoost and random forest algorithms exhibit good performance for predicting LVRR in patients with DCM. The combination of routine laboratory indicators and echocardiography indexes can be used for predicting LVRR in DCM. These machine learning classifiers might be useful for accurate management and risk evaluation of patients with DCM.

DATA AVAILABILITY STATEMENT
The data used to support the findings of this study are available from the corresponding author upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Medical Ethics Committee of Sun Yat-Sen Memorial Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
JW, YZ, YC, and HZ contributed to the conception and design of the study. XX and MY contributed to the collection of data. XX, MY, HZ, and SX contributed to the analysis and interpretation of the data. XX, XW, YJ, ZL, and YZ contributed to the drafting of the article. All authors have revised the manuscript critically for important intellectual content, read, and approved the final manuscript.