From patterns to prognosis: machine learning–derived clusters in advanced heart failure

Karaçam, Murat; Kültürsay, Barkın; Mutlu, Deniz; Tanyeri, Seda; Kaya, Azmican; Efe, Süleyman Çagan; Doğan, Cem; Halil, Gülümser Sevgin; Akbal, Özgür Yaşar; Kırali, Kaan; Acar, Rezzan Deniz

doi:10.3389/fcvm.2025.1669538

ORIGINAL RESEARCH article

Front. Cardiovasc. Med., 23 October 2025

Sec. Heart Failure and Transplantation

Volume 12 - 2025 | https://doi.org/10.3389/fcvm.2025.1669538

This article is part of the Research Topic: Transforming Care in Heart Failure and Cardiomyopathies: Emerging Insights and Treatments

Murat Karaçam^1,*,†

Barkın Kültürsay^2,†

Deniz Mutlu^3,†

Seda Tanyeri^4,†

Azmican Kaya^4,†

Süleyman Çagan Efe^4,†

Cem Doğan^4,†

Gülümser Sevgin Halil^4,†

Özgür Yaşar Akbal^4,†

Kaan Kırali^5,†

Rezzan Deniz Acar^4,†

¹Department of Cardiology, Bitlis State Hospital, Bitlis, Türkiye
²Department of Cardiology, Tunceli State Hospital, Tunceli, Türkiye
³Center for Coronary Artery Disease, Minneapolis Heart Institute Foundation, Minneapolis, MN, United States
⁴Department of Cardiology, Kartal Kosuyolu Research and Education Hospital, Istanbul, Türkiye
⁵Department of Cardiovascular Surgery, Kartal Kosuyolu Research and Education Hospital, İstanbul, Türkiye

Introduction: Advanced heart failure (HF) is a clinically heterogeneous condition with poor prognosis, and traditional classification systems often fail to capture the complexity needed for personalized care. This study aimed to identify clinically meaningful phenotypic subgroups among patients with advanced HF using unsupervised machine learning and to evaluate their association with long-term outcomes.

Methods: A retrospective analysis was conducted on 524 patients with advanced HF who underwent comprehensive clinical, echocardiographic, hemodynamic, and cardiopulmonary exercise assessments. Using k-means clustering on standardized, multidimensional data, two distinct phenotypes were identified. The primary composite outcome was defined as all-cause mortality, left ventricular assist device implantation, or heart transplantation. Associations between cluster assignment and outcomes were evaluated using Kaplan–Meier analysis and Cox proportional hazards regression.

Results: The first cluster, representing patients with relatively preserved hemodynamics and functional status, was associated with a more favorable prognosis, while the second cluster included older individuals with significant biventricular dysfunction, higher pulmonary pressures, and poorer exercise capacity. These patients experienced a markedly higher rate of the composite outcome over a median follow-up of 2.4 years, with Cluster 2 showing a significantly increased risk (hazard ratio [HR]: 3.84; 95% CI: 2.72–5.43; p < 0.001).

Conclusion: Machine learning–based clustering revealed two distinct phenotypes in advanced HF with differing clinical features and prognoses. This approach may enhance risk stratification and inform individualized therapeutic strategies in this high-risk population.

Introduction

Heart failure (HF) is a complex clinical syndrome characterized by substantial heterogeneity in etiology, pathophysiology, disease trajectory, and response to therapy. This heterogeneity becomes particularly evident in patients with advanced HF, a population that remains underrepresented in large-scale clinical trials despite experiencing the highest rates of morbidity and mortality (1). The prevalence of this patient group continues to rise due to both an aging global population and the increasing availability of life-prolonging therapies (2). These patients also represent a significant burden on healthcare systems, largely due to frequent hospital readmissions and progressive clinical deterioration (3).

Traditional classifications of heart failure—based on subjective measures of functional status, left ventricular ejection fraction (LVEF) thresholds, or broad stage designations (A to D)—are insufficient to reflect the phenotypic complexity observed in clinical practice (2–4). Recent advances in machine learning (ML) have enabled novel phenotyping strategies, shifting from reductionist models to multidimensional frameworks that incorporate clinical, imaging, and biomarker data (5, 6). In particular, unsupervised learning methods have facilitated the identification of latent subgroups—so-called “phenoclusters”—within heterogeneous HF populations. These data-driven approaches do not rely on pre-labeled outcomes, allowing for the unbiased discovery of previously unrecognized clinical patterns and their prognostic implications (7, 8).

The clinical relevance of phenotypic clustering is increasingly recognized, as subgroups show differing treatment responses and outcomes (9). In heart failure with preserved ejection fraction (HFpEF)—the most extensively studied patient population—phenomapping has identified reproducible clusters linked to comorbidities, structural remodeling, and exercise intolerance (6, 10, 11). However, advanced HF, despite its distinct pathophysiology and poor prognosis, remains underrepresented in such studies (12). The complexity of therapy selection, including transplantation and left ventricular assist device (LVAD), underscores the need for robust stratification models, yet ML applications in this population are still limited.

This study had two main objectives: to identify phenotypic clusters among patients with advanced HF using unsupervised ML techniques, and to assess the prognostic significance of these clusters.

Materials and methods

Study population

A total of 653 consecutive patients with advanced heart failure who were referred to our tertiary cardiovascular center for evaluation of advanced therapeutic options (including LVAD and transplantation) were initially evaluated between January 2021 and April 2024. Advanced heart failure was defined according to the 2021 European Society of Cardiology (ESC) Guidelines as persistent severe symptoms (NYHA class III–IV) with objective evidence of cardiac dysfunction and poor prognosis despite optimal medical therapy (2). Patients with prior durable LVAD implantation, previous heart transplantation, left ventricular ejection fraction (LVEF) > 25%, severe pulmonary disease, contraindications to cardiopulmonary exercise testing (CPET) or right heart catheterization (RHC), or incomplete follow-up data were excluded. After applying these exclusion criteria, 524 patients constituted the final study cohort (Supplementary Figure S1). All included patients underwent comprehensive baseline evaluation with transthoracic echocardiography, CPET, and RHC, performed within a 14-day time window. All demographic, clinical, laboratory, echocardiographic, and hemodynamic variables were obtained from the hospital's electronic medical record (EMR) system. Clinical diagnoses were determined based on International Classification of Diseases (ICD) codes and subsequently verified through physician notes and laboratory reports to ensure accuracy. CPET parameters were extracted through additional manual chart review of exercise test reports by the investigator team.
Standardized definitions were applied in line with established guidelines: diabetes mellitus (DM) was defined as a physician-documented diagnosis and/or use of antidiabetic medication (13); atrial fibrillation (AF) as documented arrhythmia on ECG or Holter monitoring (14); ischemic etiology as a history of myocardial infarction, percutaneous coronary intervention, or coronary artery bypass grafting; hypertension (HT) as a physician-documented diagnosis and/or use of antihypertensive therapy (15); hyperlipidemia (HL) as a physician-documented diagnosis and/or use of lipid-lowering therapy; chronic kidney disease (CKD) as an estimated glomerular filtration rate <60 ml/min/1.73 m² persisting for >3 months (16); cerebrovascular disease (CVD) as a history of ischemic or hemorrhagic stroke or transient ischemic attack; and chronic obstructive pulmonary disease (COPD) as a physician-documented chronic airway disease with or without pulmonary function testing.

The study was approved by the local ethics committee and conducted in accordance with the Declaration of Helsinki.

Echocardiography

LVEF was measured using the biplane method of disks summation (modified Simpson's rule). Doppler echocardiographic examinations were performed by a single experienced cardiologist using the EPIQ CVx version 9.0.5 system and both S5-1 and X5-1 transducers (Philips Medical Systems, Andover, MA, USA), in accordance with current guidelines. Tricuspid annular plane systolic excursion (TAPSE) was obtained using M-mode imaging from the apical four-chamber view with focus on the right ventricle. Pulmonary artery systolic pressure (PASP) was estimated by adding the pressure gradient derived from the peak tricuspid regurgitant jet velocity (using the simplified Bernoulli equation) to the estimated central venous pressure, which was derived from the diameter and respiratory variation of the inferior vena cava (IVC). All echocardiographic measurements adhered to the recommendations of the American Society of Echocardiography (17).
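
The PASP estimate described above can be written out as a short calculation. The following is a minimal sketch, assuming the simplified Bernoulli relation (gradient in mmHg = 4 × v², with v in m/s); the function name and the example values are hypothetical and are not taken from the study data.

```python
def estimate_pasp(tr_peak_velocity_m_s: float, cvp_mmhg: float) -> float:
    """Estimate pulmonary artery systolic pressure (mmHg).

    Simplified Bernoulli: RV-to-RA systolic gradient (mmHg) = 4 * v^2,
    where v is the peak tricuspid regurgitant jet velocity in m/s.
    The IVC-based central venous pressure estimate is then added.
    """
    if tr_peak_velocity_m_s < 0 or cvp_mmhg < 0:
        raise ValueError("velocity and pressure must be non-negative")
    gradient_mmhg = 4.0 * tr_peak_velocity_m_s ** 2
    return gradient_mmhg + cvp_mmhg


# Hypothetical example: TR jet of 3.0 m/s, estimated CVP of 15 mmHg.
print(estimate_pasp(3.0, 15.0))  # 4 * 9 + 15 = 51.0
```

Because the velocity is squared, small measurement errors in the TR jet translate into proportionally larger errors in the estimated pressure.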

Exercise testing

Maximal cardiopulmonary exercise testing was performed using a continuous, individualized ramp treadmill protocol on a JAEGER Vyntus CPX system (Vyaire Medical, Germany). Exercise capacity was expressed in metabolic equivalents (METs), with oxygen uptake (VO₂) measured breath by breath through an automated system. Measurements were recorded at rest, throughout graded exercise, and during a two-minute recovery period. VO₂, VCO₂, and the respiratory exchange ratio (RER = VCO₂/VO₂) were averaged every 10 s. Peak VO₂ was defined as the highest 10-s averaged VO₂ during the final stage of exercise. METs were calculated by dividing peak VO₂ by 3.5 ml/kg/min. Blood pressure was measured prior to testing and at three-minute intervals throughout the protocol and recovery.
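
The CPET quantities defined above reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the binning into consecutive 10 s windows is an assumption about how the averaging was done, and for brevity the peak is taken over all supplied samples rather than only the final exercise stage.

```python
from collections import defaultdict

REST_VO2_ML_KG_MIN = 3.5  # 1 MET by convention


def bin_averages(breaths, bin_s=10):
    """Average breath-by-breath values in consecutive bins of `bin_s` seconds.

    `breaths` is an iterable of (time_s, value) pairs.
    Returns the bin means ordered by time.
    """
    bins = defaultdict(list)
    for time_s, value in breaths:
        bins[int(time_s // bin_s)].append(value)
    return [sum(vals) / len(vals) for _, vals in sorted(bins.items())]


def peak_vo2(breaths, bin_s=10):
    """Highest binned VO2 among the supplied samples."""
    return max(bin_averages(breaths, bin_s))


def mets(vo2_ml_kg_min):
    """Convert VO2 (ml/kg/min) to metabolic equivalents."""
    return vo2_ml_kg_min / REST_VO2_ML_KG_MIN


def rer(vco2, vo2):
    """Respiratory exchange ratio, VCO2 / VO2 (same units for both)."""
    return vco2 / vo2


# Hypothetical breath-by-breath samples: (time in s, VO2 in ml/kg/min).
samples = [(1, 10.0), (5, 12.0), (11, 14.0), (15, 16.0)]
print(bin_averages(samples))  # [11.0, 15.0]
print(peak_vo2(samples))      # 15.0

# Consistency check against the reported cluster medians for peak VO2.
print(round(mets(10.7), 1), round(mets(16.0), 1))  # 3.1 4.6
```

The last line reproduces the reported median METs (3.1 and 4.6) from the reported median peak VO₂ values (10.7 and 16.0 ml/kg/min).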

Cardiac catheterization

Right heart catheterization was performed via the right internal jugular or femoral vein using a 7Fr balloon-tipped Swan–Ganz catheter (Edwards Lifesciences, Irvine, CA, USA) or a pigtail catheter. Cardiac output was calculated using the indirect Fick method. All pressure waveforms were visually assessed to ensure physiological accuracy, and measurements were taken at end-expiration.
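
For readers unfamiliar with the indirect Fick method, the calculation can be sketched as follows. The constants are assumptions, since the text does not state which ones were used: oxygen consumption is estimated as 125 ml/min per m² of body surface area, hemoglobin is taken to bind 1.36 ml O₂ per gram, and dissolved oxygen is ignored. Function names and example values are hypothetical.

```python
def estimated_vo2_ml_min(bsa_m2, vo2_index_ml_min_m2=125.0):
    """Assumed resting oxygen consumption (the 'indirect' part of the method)."""
    return vo2_index_ml_min_m2 * bsa_m2


def fick_cardiac_output(vo2_ml_min, hb_g_dl, sao2, svo2, o2_capacity_ml_g=1.36):
    """Cardiac output in L/min from the Fick principle.

    sao2 and svo2 are arterial and mixed venous saturations as fractions (0-1).
    """
    if not 0.0 <= svo2 < sao2 <= 1.0:
        raise ValueError("expected 0 <= svo2 < sao2 <= 1")
    # Arteriovenous O2 content difference in ml O2 per dl of blood.
    av_diff_ml_dl = o2_capacity_ml_g * hb_g_dl * (sao2 - svo2)
    # Multiply by 10 to convert ml/dl to ml/L.
    return vo2_ml_min / (av_diff_ml_dl * 10.0)


# Hypothetical patient: BSA 2.0 m2, Hb 13 g/dl, SaO2 95%, SvO2 55%.
bsa = 2.0
co = fick_cardiac_output(estimated_vo2_ml_min(bsa), 13.0, 0.95, 0.55)
ci = co / bsa  # cardiac index, L/min/m2
print(round(co, 2), round(ci, 2))  # 3.54 1.77
```

Because VO₂ is assumed rather than measured, the result inherits the error of that assumption, which is a known limitation of the indirect approach.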

Endpoint definition

The composite outcome was defined as all-cause mortality, LVAD implantation, or heart transplantation, in line with definitions used in previous literature (18, 19).

Statistical analysis

To identify distinct phenotypic clusters within the study population, we employed unsupervised machine learning techniques. Prior to clustering, missing data were addressed via the MissForest algorithm, a non-parametric, iterative imputation method utilizing random forests (20) (Supplementary Figure S2). All continuous variables were standardized to zero mean and unit variance prior to distance-based modeling. Binary categorical variables (e.g., comorbidities, sex) were excluded from the clustering process to prevent distortion in Euclidean distance calculations arising from incompatible data types. Ordinal categorical variables (e.g., mitral regurgitation grade, tricuspid regurgitation grade, and LV diastolic dysfunction) were converted to integer scores respecting their inherent order, thereby preserving their rank information in the distance matrix. A total of 108 variables were considered, encompassing clinical, laboratory, echocardiographic, hemodynamic, and CPET parameters. After addressing multicollinearity (removing one variable, chosen by clinical judgment, from each pair with Pearson correlation >0.7), 81 variables remained for the final clustering analysis (Supplementary Table S1). Both hierarchical clustering (Ward's method with Euclidean distance) and k-means clustering were applied to the scaled numeric data. These algorithms are well suited for standardized continuous data and have been widely applied in heart failure phenomapping studies (5, 6). The optimal number of clusters was determined using both the elbow method (within-cluster sum of squares, WSS) and the average silhouette width as complementary approaches (Figure 1). The elbow point was visually identified at k = 2, where the incremental reduction in WSS plateaued, and this was further supported by the highest silhouette score. While k = 3 showed a minor secondary inflection, it yielded a lower silhouette width and produced less stable and less clinically interpretable clusters.
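
The standardize, cluster, and choose-k sequence described above can be illustrated with a small self-contained sketch. It is not the authors' code: the analysis was run in R, whereas this example uses only the Python standard library, synthetic two-group data, and hypothetical function names. Imputation and the correlation filter are omitted.

```python
import math
import random


def standardize(rows):
    """Scale every column to zero mean and unit variance (sample SD)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def kmeans(rows, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (wss, labels, centroids)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = rng.sample(rows, k)
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda j: math.dist(r, centroids[j]))
                      for r in rows]
            updated = []
            for j in range(k):
                members = [r for r, lab in zip(rows, labels) if lab == j]
                # Keep the old centroid if a cluster happens to be empty.
                updated.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centroids[j])
            if updated == centroids:
                break
            centroids = updated
        wss = sum(math.dist(r, centroids[lab]) ** 2
                  for r, lab in zip(rows, labels))
        if best is None or wss < best[0]:
            best = (wss, labels, centroids)
    return best


def mean_silhouette(rows, labels):
    """Average silhouette width using Euclidean distance."""
    scores = []
    for i, (r, li) in enumerate(zip(rows, labels)):
        by_cluster = {}
        for j, (s, lj) in enumerate(zip(rows, labels)):
            if i != j:
                by_cluster.setdefault(lj, []).append(math.dist(r, s))
        if li not in by_cluster:  # singleton cluster: silhouette set to 0
            scores.append(0.0)
            continue
        a = sum(by_cluster[li]) / len(by_cluster[li])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != li)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic demo data (not patient data): two well-separated groups of 30.
rng = random.Random(42)
raw = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
       + [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(30)])
scaled = standardize(raw)

wss_by_k, sil_by_k = {}, {}
for k in range(2, 5):
    wss, labels_k, _ = kmeans(scaled, k)
    wss_by_k[k] = wss
    sil_by_k[k] = mean_silhouette(scaled, labels_k)

best_k = max(sil_by_k, key=sil_by_k.get)
labels = kmeans(scaled, best_k)[1]
print("WSS by k:", {k: round(v, 1) for k, v in wss_by_k.items()})
print("Silhouette by k:", {k: round(v, 2) for k, v in sil_by_k.items()})
print("Selected k:", best_k)
```

In this toy setting the silhouette criterion picks k = 2 by construction; with real clinical data the two criteria can disagree, which is why they were used as complementary checks.
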
Hierarchical clustering provided an interpretable dendrogram and stable grouping (Supplementary Figure S3); however, k-means clustering demonstrated comparable or higher silhouette scores, offering more flexible partitioning and iterative refinement (Figures 1, 2). Therefore, k-means clustering (k = 2) was selected for the final classification (Supplementary Figures S4, S5), balancing statistical performance, model simplicity, and clinical interpretability. To evaluate the robustness of the identified clusters, internal validation was performed using bootstrap resampling with 1,000 iterations and Jaccard similarity indices. As a sensitivity analysis, clustering was repeated using Gower distance with partitioning around medoids (PAM) (Supplementary Figure S6). Additionally, internal validation was performed using the Calinski–Harabasz (CH) and Davies–Bouldin (DB) indices across different cluster numbers (k = 2–6) (Supplementary Table S2). The final cluster assignments were appended to the imputed dataset. Group differences between clusters were assessed using chi-squared tests for categorical variables and either Student's t-test or Wilcoxon rank-sum test for continuous variables, depending on distributional assumptions. Scaled variables were compared between the two clusters using both bar plots and a radar chart to illustrate group-level differences (Figure 3). Survival was illustrated using the Kaplan–Meier method, and Cox proportional hazards regression models were applied to assess time-to-event associations between cluster membership and outcomes. Importantly, outcomes were not included as clustering inputs, ensuring independence between phenotype derivation and prognostic evaluation. The proportional hazards assumption was tested using Schoenfeld residuals and was not violated (Supplementary Figure S7). To further assess the reproducibility of the clustering solution, repeated split-sample validation was performed. 
In each of 100 random replications, the cohort was divided into 70% training and 30% validation subsets. K-means clustering (k = 2) was derived in the training set, and cluster centroids were used to assign patients in the validation set. Agreement between original and validation cluster assignments was quantified by the adjusted Rand index, while prognostic validity was evaluated using log-rank tests and Cox regression (Supplementary Table S3). All statistical tests were two-tailed, and a p-value below 0.05 was considered statistically significant. All statistical analyses were performed using the R 4.4.1 software (R Foundation for Statistical Computing, Vienna, Austria) with packages “missForest”, “dplyr”, “stats”, “cluster”, “clusterCrit”, “fossil”, “naniar”, “dendextend”, “survival”, “survminer”, “rms”, “ggplot2”.
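
The agreement step of the split-sample validation can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation (which used R): the centroids and rows are made-up standardized values, and the function names are hypothetical. The adjusted Rand index is computed from the pair-counting definition.

```python
from collections import Counter
from math import comb, dist


def assign_to_centroids(rows, centroids):
    """Label each row with the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda j: dist(r, centroids[j]))
            for r in rows]


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical example: centroids derived in a training split are applied to
# five validation patients, then compared with their full-cohort labels.
train_centroids = [[-1.0, -1.0], [1.0, 1.0]]
validation_rows = [[-0.9, -1.2], [-1.1, -0.7], [0.8, 1.3], [1.2, 0.9], [0.1, -0.2]]
full_cohort_labels = [0, 0, 1, 1, 1]

validation_labels = assign_to_centroids(validation_rows, train_centroids)
ari = adjusted_rand_index(full_cohort_labels, validation_labels)
print(validation_labels, round(ari, 3))
```

The index is invariant to label permutation, so it does not matter whether the training run happens to call the adverse phenotype cluster 0 or cluster 1.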

Figure 1

Two line graphs illustrate methods for determining the optimal number of clusters. The left graph shows the Elbow method, with Total Within-Cluster Sum of Squares (WSS) decreasing from 34,000 at one cluster to 26,000 at ten clusters, stabilizing around three clusters. The right graph uses the Silhouette method, with Average Silhouette Width peaking sharply at two clusters and decreasing thereafter.

Figure 1. Determination of the optimal number of clusters (k) using the elbow and silhouette methods. Total within-cluster sum of squares (WSS) plotted against increasing values of k. The elbow point was visually identified at k = 2, where the reduction in WSS began to plateau. Average silhouette width across varying k values, with the highest value observed at k = 2, supporting the two-cluster solution.

Figure 2

Two side-by-side scatter plots show PCA results for clustering with k equals two. The left plot depicts k-means clusters with blue and orange dots, while the right shows hierarchical clusters with green and red dots. Both plots use principal components one and two as axes, with clusters visually separated.

Figure 2. Principal component analysis (PCA) for k-means and hierarchical clustering (k = 2). Visualization of patient distribution by unsupervised clustering (k-means and hierarchical) using first two principal components. Distinct separation is evident between the two clusters in k-means clustering.

Figure 3

Cluster radar and bar chart comparing two groups, Cluster 1 and Cluster 2. The radar chart shows multiple variables with varying levels for each cluster. The bar chart on the right illustrates the scaled values of the same variables for both clusters, with Cluster 1 in red and Cluster 2 in blue.

Figure 3. Cluster radar chart and bar plot comparison. Radar chart illustrating normalized distributions of selected parameters across the two clusters. Cluster 2 demonstrated greater impairments in hemodynamic, biochemical, and echocardiographic variables compared with Cluster 1. Bar plot comparing scaled mean values for clinical, echocardiographic, laboratory, and hemodynamic parameters between clusters. Cluster 2 was characterized by older age, higher prevalence of comorbidities (diabetes mellitus, atrial fibrillation), worse hemodynamics (higher RAP, LVEDP, and PVR; lower CI), and impaired functional status (lower peak VO₂).

Results

Cluster validation

Unsupervised k-means clustering identified two distinct phenotypic clusters among 524 patients with advanced HF. For the primary k-means model based on continuous variables, both clusters showed excellent stability (Jaccard indices: 0.998 and 0.985; Supplementary Figure S6). Sensitivity analysis using Gower distance with PAM clustering yielded consistent results, although with moderately lower stability (Jaccard indices: 0.851 and 0.762). Internal validation using the Calinski–Harabasz (CH) and Davies–Bouldin (DB) indices across different cluster numbers (k = 2–6) consistently supported the two-cluster solution, which yielded the highest CH and the lowest DB values (Supplementary Table S2).
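
The Jaccard indices reported above measure how well each cluster is recovered when the analysis is repeated on resampled data. A minimal sketch of the per-iteration comparison follows; it assumes the common scheme in which each original cluster, restricted to the patients drawn in a bootstrap sample, is matched to its most similar bootstrap cluster. Patient identifiers and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of patient identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(original_clusters, bootstrap_clusters, sampled_ids):
    """Best Jaccard match for each original cluster within one bootstrap run."""
    sampled = set(sampled_ids)
    return [max(jaccard(set(orig) & sampled, boot) for boot in bootstrap_clusters)
            for orig in original_clusters]


# Hypothetical run: patient 4 was not drawn, and patient 5 switched cluster.
original = [{1, 2, 3, 4}, {5, 6, 7, 8}]
sampled_ids = {1, 2, 3, 5, 6, 7, 8}
bootstrap = [{1, 2, 3, 5}, {6, 7, 8}]
print(cluster_stability(original, bootstrap, sampled_ids))  # [0.75, 0.75]
```

Averaging these values over all bootstrap iterations gives one stability index per cluster; values close to 1 indicate that the cluster is recovered almost unchanged.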

Split-sample validation further confirmed reproducibility. Across 100 replications of 70/30 splits, the adjusted Rand index averaged 0.77 ± 0.06, and prognostic separation was consistently observed (log-rank p < 0.05 in all subsets; pooled HR: 0.83, 95% CI: 0.78–0.89; Supplementary Table S3). Notably, this validation HR reflects reproducibility across resampling iterations, whereas the full-cohort Cox model showed the absolute effect size (HR: 3.84, 95% CI: 2.72–5.43).

Collectively, these analyses indicate that the identified phenotypes are reproducible, robust, and prognostically meaningful rather than artifacts of overfitting.

Cluster 1 comprised 282 patients (53.8%), while Cluster 2 included 242 patients (46.2%). Based on their clinical and physiological profiles, we defined Cluster 1 as the Favorable Profile Cluster (FPC), characterized by more favorable hemodynamic and functional parameters—suggestive of a group appropriate for continued monitoring and optimization of standard therapies. In contrast, Cluster 2 was designated the Adverse Profile Cluster (APC), representing an older cohort with marked hemodynamic compromise and diminished exercise capacity, indicative of a phenotype that may benefit from earlier consideration of advanced interventions or intensified medical management.

Clinical and demographic characteristics

Patients in Cluster 2 were older (median age: 54 (45–60) vs. 52 (43–58) years, p = 0.013) and had a lower body mass index (BMI: 27.0 ± 4.8 vs. 28.2 ± 5.3 kg/m², p = 0.005) (Table 1). There was no significant difference in sex distribution between the two clusters. The prevalence of ischemic etiology (54.7% vs. 38.2%, p < 0.001), history of percutaneous coronary intervention (45.9% vs. 28.5%, p < 0.001), coronary artery bypass grafting (14.5% vs. 7.8%, p = 0.015), diabetes mellitus (38.4% vs. 28.5%, p = 0.016), atrial fibrillation (28.1% vs. 10.3%, p < 0.001), and implantable cardioverter defibrillator (31.8% vs. 18.1%, p < 0.001) was significantly higher in Cluster 2.

Table 1

Table 1. Demographic data of patients.

Echocardiographic and hemodynamic findings

Table 2 summarizes the echocardiographic findings of the patients. Cluster 2 demonstrated more advanced structural and functional cardiac abnormalities. Left ventricular ejection fraction (LVEF) was significantly lower in Cluster 2 (median: 20% (18–24) vs. 23% (20–25), p < 0.001), suggesting more profound systolic dysfunction. Although left ventricular end-diastolic and end-systolic diameters (LVEDD, LVESD) were similar between groups, left atrial size was markedly increased in Cluster 2 (4.84 ± 0.53 cm vs. 4.45 ± 0.61 cm, p < 0.001), reflecting chronic volume overload and diastolic impairment.

Table 2

Table 2. Echocardiographic parameters.

Mitral regurgitation severity was significantly greater in Cluster 2, with higher proportions of patients exhibiting moderate-to-severe regurgitation (Grade 2–3 in 82.6% vs. 47.8%, p < 0.001). Similarly, tricuspid regurgitation was more severe in Cluster 2, indicating substantial right-sided valvular involvement and volume burden.

Right ventricular systolic function was also significantly impaired in Cluster 2, with lower tricuspid annular plane systolic excursion (TAPSE: 1.4 ± 0.37 cm vs. 1.8 ± 0.44 cm, p < 0.001) and increased inferior vena cava (IVC) diameter (2.23 ± 0.43 cm vs. 1.64 ± 0.35 cm, p < 0.001), suggesting elevated right atrial pressures and reduced RV contractility. Plethora was observed in over half of Cluster 2 patients (55.8% vs. 2.3%, p < 0.001). Estimated pulmonary artery systolic pressure (PASP) by echocardiography was higher in Cluster 2 (median: 50 mmHg vs. 30 mmHg, p < 0.001), consistent with pulmonary hypertension.

Invasive hemodynamic assessment via right heart catheterization revealed marked elevation in biventricular filling pressures and pulmonary vascular resistance in Cluster 2 (Table 3). Left ventricular end-diastolic pressure (LVEDP), along with pulmonary artery systolic, diastolic, and mean pressures, was significantly higher in Cluster 2 than in Cluster 1. Cluster 2 also exhibited higher right atrial pressure (RAP: 12 (9–17) vs. 6 (4–8) mmHg, p < 0.001), right ventricular systolic pressure (RVSP: 59 (48–71) vs. 36 (29–49) mmHg, p < 0.001), and transpulmonary gradient (TPG: 12 (8–19) vs. 6 (3–9) mmHg, p < 0.001), indicating more frequent combined pre- and post-capillary pulmonary hypertension in Cluster 2.

Table 3

Table 3. Cardiac catheterization parameters.

Cardiac output (CO) and cardiac index (CI) were significantly lower in Cluster 2 than in Cluster 1, reflecting diminished global perfusion capacity. Pulmonary vascular resistance (PVR) was notably elevated (4.1 (2.56–6.4) vs. 1.4 (0.9–2.33) Wood units, p < 0.001), while systemic vascular resistance (SVR) was also modestly higher (24.2 (20.2–29.4) vs. 21.4 (17.6–29.4) Wood units, p < 0.001). Additionally, stroke volume (SV) and stroke volume index (SVI) were significantly lower in Cluster 2, consistent with advanced circulatory compromise.
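
The derived hemodynamic indices compared above follow from standard definitions, sketched here for orientation. The text does not spell out the exact formulas used, so this assumes the conventional ones, with the left-sided filling pressure taken as the pulmonary artery wedge pressure; function names and example values are hypothetical.

```python
def transpulmonary_gradient(mean_pap_mmhg, wedge_mmhg):
    """TPG (mmHg) = mean pulmonary artery pressure - wedge pressure."""
    return mean_pap_mmhg - wedge_mmhg


def pulmonary_vascular_resistance(mean_pap_mmhg, wedge_mmhg, co_l_min):
    """PVR in Wood units = TPG / cardiac output."""
    return transpulmonary_gradient(mean_pap_mmhg, wedge_mmhg) / co_l_min


def systemic_vascular_resistance(map_mmhg, rap_mmhg, co_l_min):
    """SVR in Wood units = (mean arterial pressure - RAP) / cardiac output."""
    return (map_mmhg - rap_mmhg) / co_l_min


# Hypothetical patient: mPAP 40, wedge 25, MAP 80, RAP 12 mmHg, CO 3.5 L/min.
print(transpulmonary_gradient(40, 25))                         # 15
print(round(pulmonary_vascular_resistance(40, 25, 3.5), 2))    # 4.29
print(round(systemic_vascular_resistance(80, 12, 3.5), 2))     # 19.43
```

Both resistances share cardiac output as the denominator, so a low-output state raises PVR and SVR together even when the pressure gradients are unchanged.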

Collectively, these findings underscore a more severe biventricular phenotype in Cluster 2, characterized by pronounced systolic dysfunction, elevated filling pressures, secondary valvular disease, and significant pulmonary hypertension.

Laboratory parameters and biomarkers

Patients in Cluster 2 exhibited a laboratory profile consistent with advanced disease severity, multiorgan involvement, and worse nutritional and metabolic status (Table 4). Serum urea levels were higher in Cluster 2, whereas serum creatinine levels were similar in both groups. Hepatic congestion and dysfunction were more prominent in Cluster 2, with significantly elevated total (1.08 (0.72–1.58) vs. 0.58 (0.41–0.81) mg/dl, p < 0.001) and direct bilirubin levels (0.51 (0.31–0.84) vs. 0.21 (0.15–0.30) mg/dl, p < 0.001), as well as higher GGT (58.5 (30.6–103) vs. 28 (18–44) U/L, p < 0.001) and ALP (100 (71–131) vs. 87 (71.5–105) U/L, p = 0.001).

Table 4

Table 4. Blood parameters of patients.

NT-proBNP levels were nearly threefold higher in Cluster 2 compared to Cluster 1 (3,969 (2,441–6,595) vs. 1,330 (562–2,241) ng/L, p < 0.001), indicating greater myocardial wall stress and hemodynamic overload.

Markers of nutritional status showed significant deterioration in Cluster 2. Serum albumin (41.2 ± 5.79 vs. 44.9 ± 4.32 g/L, p < 0.001), total protein (69.4 ± 8.07 vs. 72.0 ± 6.35 g/L, p < 0.001), and HDL cholesterol levels [34.8 (28.1–43.3) vs. 42 (36.2–50.5) mg/dl, p < 0.001] were significantly lower, suggesting poor nutritional state and reduced hepatic synthetic function.

Hematologic findings were indicative of more pronounced anemia in Cluster 2. Both hemoglobin (13.0 ± 2.09 vs. 14.4 ± 1.65 g/dl, p < 0.001) and hematocrit levels (41.2 ± 6.02% vs. 43.9 ± 4.73%, p < 0.001) were significantly lower compared to Cluster 1, suggesting impaired oxygen-carrying capacity and potential chronic disease-related anemia.

Cardiopulmonary exercise test (CPET) performance

Cluster 2 patients demonstrated significantly reduced functional capacity across multiple CPET parameters, consistent with more advanced heart failure physiology (Table 5). Peak oxygen consumption (peak VO₂) was markedly lower in Cluster 2 [10.7 (9–13.2) vs. 16.0 (13.3–18.7) ml/kg/min, p < 0.001], reflecting impaired aerobic capacity and cardiac output reserve. Similarly, the achieved metabolic equivalents (METs) were significantly reduced [3.1 (2.6–3.8) vs. 4.6 (3.8–5.4), p < 0.001], indicating diminished ability to perform physical activity.

Table 5

Table 5. Cardiopulmonary exercise test parameters of patients.

Ventilatory efficiency was also substantially worse in Cluster 2, as demonstrated by a significantly elevated VE/VCO₂ slope [49.1 (37.6–81.0) vs. 33.5 (29.2–39.8), p < 0.001]. Moreover, lower peak exercise oxygen pulse and VO₂/work slope values in this group (both p < 0.001) further support compromised cardiovascular performance and peripheral oxygen extraction.

Outcomes and survival analysis

Over a median follow-up of 2.4 years (interquartile range: 1.4–4.1), the incidence of the composite endpoint was significantly higher in Cluster 2 compared to Cluster 1 (50.0% vs. 15.6%, p < 0.001), highlighting the adverse prognostic profile of this subgroup (Table 6). In Cox regression analysis, assignment to Cluster 2 was associated with a 3.84-fold increased risk of experiencing the composite endpoint (hazard ratio [HR]: 3.84; 95% confidence interval [CI]: 2.72–5.43; p < 0.001) (Table 7, Figure 4).
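
The event-free survival curves underlying this comparison are Kaplan–Meier (product-limit) estimates. The sketch below shows the estimator on made-up follow-up data; it is not the authors' code, which used the R "survival" package, and the function name is hypothetical. An event here stands for any component of the composite endpoint, and 0 marks censoring.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of event-free survival.

    times:  follow-up time for each patient
    events: 1 if the composite endpoint occurred at that time, 0 if censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = list(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for tt, e in data if tt == t and e)
        n_leaving = sum(1 for tt, _ in data if tt == t)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up (years) for five patients.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in km_curve])
# [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Censored patients contribute to the risk set up to their last follow-up and then drop out without lowering the curve, which is how unequal follow-up is accommodated.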

Table 6

Table 6. Clinical outcomes (LVAD implantation, heart transplantation, and death) according to phenotypic clusters.

Table 7

Table 7. Cox regression analysis.

Figure 4

Kaplan-Meier survival curves show two clusters over 60 months. Cluster 1, in blue, has higher survival probability than Cluster 2, in red. Shaded areas indicate confidence intervals. Log-rank test p-value is less than 0.0001, indicating significant difference. Number at risk decreases over time for both clusters.

Figure 4. Kaplan–Meier survival curves. Kaplan–Meier estimates for the composite outcome (all-cause mortality, LVAD, or transplantation) stratified by cluster assignment. Cluster 2 demonstrated significantly lower event-free survival (log-rank p < 0.0001).

Discussion

In this study, we identified two distinct phenotypic clusters, the FPC and the APC, within an advanced HF population by applying unsupervised ML to a comprehensive and multimodal dataset. These clusters exhibited significant differences in clinical profiles and were associated with differing long-term outcomes, underscoring their prognostic relevance.

Recent studies by Zhang et al. and Yao et al. have applied supervised ML methods to predict the need for advanced HF therapies (18, 21). Zhang et al. used data from 557 hospitalizations to develop a transparent, rule-based ML model that predicted which patients would require advanced heart failure therapies, such as LVAD or transplantation, during follow-up (18). Yao et al. introduced a novel ML framework that grouped clinical variables into flexible, overlapping categories—allowing a patient to belong partially to more than one risk group, rather than being forced into a single predefined class. This approach, inspired by fuzzy logic, better reflects the clinical continuum and supports the derivation of interpretable reasoning rules. In their pilot application, the method was tested on a real-world cohort of patients evaluated for advanced heart failure therapies, demonstrating its potential to support eligibility classification through a rule-based, interpretable design (21). While these models offer interpretable decision support, they require labeled outcomes and focus on specific treatment decisions. In contrast, our approach aimed to identify clinically meaningful phenotypes associated with prognosis, rather than just treatment eligibility. This methodology offers a broader view of clinical heterogeneity in advanced HF.

Lamp et al. used unsupervised clustering to stratify patients into five risk categories based on a composite outcome of death, LVAD implantation, transplantation, and rehospitalization over six months, and subsequently applied supervised modeling to predict these outcomes (19). Their interpretable model was trained using two distinct input sets: the invasive set, which included variables derived solely from right heart catheterization (e.g., right atrial pressure, pulmonary artery pressures, cardiac output), and the all-feature set, which combined invasive hemodynamic data with a wider range of clinical and laboratory variables. The model achieved high predictive performance, with c-statistics ranging from 0.896 to 0.969 for the invasive set, and 0.858 to 0.997 for the all-feature set, although confidence intervals were not explicitly reported. Despite the impressive discrimination, their analysis was primarily limited to hemodynamic domains. In contrast, our approach integrated a more comprehensive set of variables—including echocardiographic, cardiopulmonary exercise testing (CPET), biochemical, and invasive hemodynamic parameters—allowing for a multidimensional phenotypic characterization with prognostic relevance.

The concept of phenotyping HF patients has emerged from the need for personalized treatment. In heart failure with reduced ejection fraction (HFrEF), applying and titrating guideline-directed therapies can be challenging due to comorbidities and adverse effects on blood pressure, renal function, and electrolyte balance (22). As the HF population ages, comorbidity burden increases, reducing the feasibility of uniform (“one-size-fits-all”) treatment approaches. Similar challenges exist in HFpEF, where pharmacological therapies have generally failed to show mortality benefit in large-scale randomized trials (23–25). However, ML-based clustering studies have revealed subgroups with variable treatment responses, supporting the role of precision medicine in this heterogeneous population (26, 27).

ML has gained traction in HF research for its capacity to manage high-dimensional data and uncover latent phenotypes not captured by traditional statistics (5, 7). Beyond providing therapeutic guidance, it also aids in risk stratification, as demonstrated by a series of analyses that applied supervised machine learning techniques to predict short- and long-term mortality with high accuracy (5, 28, 29).

Advanced heart failure represents the terminal stage of the disease, and patients often present with overlapping symptoms and complex pathophysiology (3). Although machine learning–based clustering has previously been applied in advanced heart failure, prior studies were limited by narrower variable sets, often focusing predominantly on hemodynamics or clinical data. To our knowledge, our study is the first to integrate a comprehensive multimodal dataset (echocardiographic, cardiopulmonary exercise test, and invasive hemodynamic parameters), allowing a more detailed phenotypic characterization and prognostic stratification in this high-risk population. The two identified clusters, the FPC and the APC, demonstrated distinct clinical profiles (Figure 5). The FPC encompassed individuals with relatively preserved hemodynamic and functional status, whereas the APC represented a cohort with marked physiological deterioration and higher disease burden. These contrasting profiles were associated with markedly different long-term outcomes. Patients in the FPC group may benefit from routine follow-up and medical optimization, whereas those in the APC group may require early evaluation for advanced therapies, including inotropic support, mechanical circulatory support, or palliative care planning.

Figure 5. Phenotypic summary of machine learning–derived clusters in advanced HF.
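As an illustration of the general technique only, the sketch below standardizes a small set of continuous variables and partitions them into two groups with k-means. It is not the authors' analysis pipeline: the variable choice, the toy values, the fixed random seed, and the helper functions `standardize` and `kmeans` are all hypothetical, and a real analysis would normally rely on an established statistics package rather than hand-written code.

```python
import math
import random


def standardize(rows):
    """Z-score every column so variables on different scales weigh equally."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [
        # sample standard deviation; fall back to 1.0 for a constant column
        math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for reproducibility
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy rows: LVEF (%), peak VO2 (mL/kg/min), cardiac index (L/min/m2)
toy_patients = [
    [30, 16.0, 2.4], [28, 15.2, 2.3], [32, 17.1, 2.6], [29, 15.8, 2.5],
    [18, 9.5, 1.6], [20, 10.1, 1.7], [17, 8.9, 1.5], [19, 9.8, 1.8],
]
labels, centroids = kmeans(standardize(toy_patients), k=2)
print(labels)
```

The first four toy rows end up in one group and the last four in the other. The numeric label given to each group is arbitrary, so any clinical naming of clusters has to come from inspecting their profiles afterwards.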

Integrating clustering outputs into clinical workflows could enable timely recognition of high-risk phenotypes and support personalized treatment strategies. As multimorbidity becomes more prevalent in HF populations, incorporating comorbidity profiles into clustering models may enhance patient stratification. Future studies should aim to externally validate these clusters and evaluate their applicability in prospective cohorts.

Limitations

Despite the strengths of our study, several important limitations should be acknowledged. First, the sample size, although relatively large for a single-center advanced HF cohort, remains modest. Second, our cohort was predominantly male, a pattern commonly observed in advanced HF studies; nevertheless, this sex imbalance may restrict the generalizability of our findings to female patients. Third, the retrospective and observational design precludes causal inference. Fourth, although the dataset was comprehensive, it reflects a single-center experience, which may limit external validity and generalizability. While we performed repeated split-sample validation within our cohort to strengthen internal reproducibility, external validation in larger, prospective multicenter cohorts will be essential to confirm the generalizability of our findings. Fifth, binary clinical variables (e.g., sex, diabetes, atrial fibrillation, ischemic etiology) were excluded from the clustering input for methodological reasons, potentially limiting completeness of phenotyping.

Conclusion

This study shows that unsupervised ML-based clustering can reveal clinically important phenotypes in advanced HF using routinely collected multimodal data. The identification of two distinct clusters with differing clinical profiles and outcomes highlights the potential of data-driven approaches to enhance risk stratification and guide personalized care. Prospective validation is warranted to confirm clinical utility.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Kartal Kosuyolu Research and Education Hospital Clinical Research Ethics Committee. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants' legal guardians/next of kin because of the retrospective design of the study and the use of fully anonymized patient data.

Author contributions

MK: Conceptualization, Data curation, Project administration, Writing – original draft. BK: Formal analysis, Methodology, Writing – review & editing. DM: Formal analysis, Supervision, Writing – review & editing. ST: Data curation, Writing – review & editing. AK: Data curation, Investigation, Resources, Writing – review & editing. SE: Conceptualization, Resources, Visualization, Writing – review & editing. CD: Visualization, Writing – review & editing. GH: Conceptualization, Writing – review & editing. ÖA: Conceptualization, Writing – review & editing. KK: Methodology, Supervision, Writing – review & editing. RA: Investigation, Supervision, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcvm.2025.1669538/full#supplementary-material

Supplementary Figure S1 | Flowchart of the study population. Flow diagram showing patient selection. Of 653 patients evaluated at the advanced HF clinic, 129 were excluded (prior LVAD/HTx, preserved LVEF, severe pulmonary disease, contraindications to CPET/RHC, or incomplete follow-up). The final cohort included 524 patients, who were classified into two clusters (Cluster 1, n=282; Cluster 2, n=242).

Supplementary Figure S2 | Missing data map. Visualization of missingness across all variables in the study cohort. Overall, 8.3% of values were missing, while 91.7% were present. The plot highlights variable- and patient-level distribution of missing data.

Supplementary Figure S3 | Determination of the optimal number of clusters for hierarchical clustering. (Left) Elbow method: total within-cluster sum of squares plotted against increasing k values, with an inflection observed at k=2. (Right) Silhouette method: average silhouette width across k, peaking at k=2, supporting the choice of a two-cluster solution.

Supplementary Figure S4 | PCA visualization of k-means clusters. Visualization of the two clusters identified using k-means clustering after dimensionality reduction with PCA.

Supplementary Figure S5 | t-SNE visualization of k-means clusters. t-Distributed Stochastic Neighbor Embedding (t-SNE) plot depicting patient clustering based on high-dimensional input data. A clear distinction is observed between Cluster 1 and Cluster 2.

Supplementary Figure S6 | Cluster stability assessed by Jaccard similarity indices. Resampling-based stability analysis for the primary k-means model using continuous variables demonstrated excellent reproducibility of both clusters (Jaccard indices: 0.998 and 0.985). Sensitivity analysis with Gower distance and PAM clustering showed consistent cluster structures with moderately lower stability (Jaccard indices: 0.851 and 0.762), supporting the robustness of the identified phenotypes.

Supplementary Figure S7 | Schoenfeld residuals for proportional hazards assumption. Plots of scaled Schoenfeld residuals over time for covariates included in the Cox regression models. No systematic trends were observed, indicating that the proportional hazards assumption was not violated.
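The Jaccard-based stability check mentioned for Supplementary Figure S6 can be sketched in a few lines. This is one common way to score resampling stability, not necessarily the exact procedure used here: each original cluster is compared with its best-matching cluster from a re-clustered resample, counting only patients who appear in that resample. The function names and the patient identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of patient IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(original_clusters, resampled_clusters, sampled_ids):
    """Best Jaccard match for each original cluster within one resample."""
    sampled = set(sampled_ids)  # bootstrap draws may repeat an ID
    scores = []
    for members in original_clusters:
        kept = set(members) & sampled  # restrict to patients in the resample
        scores.append(max(jaccard(kept, r) for r in resampled_clusters))
    return scores


# Hypothetical example with ten patient IDs and a single bootstrap resample
original = [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]
bootstrap_ids = [1, 2, 3, 5, 6, 7, 8, 10, 2, 7]
reclustered = [{1, 2, 3}, {5, 6, 7, 8, 10}]
scores = cluster_stability(original, reclustered, bootstrap_ids)
print(scores)
```

In practice the per-cluster scores would be averaged over many resamples. Values close to 1 indicate that a cluster is recovered almost unchanged each time.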

References

1. Dunlay SM, Roger VL, Killian JM, Weston SA, Schulte PJ, Subramaniam AV, et al. Advanced heart failure epidemiology and outcomes: a population-based study. JACC Heart Fail. (2021) 9(10):722–32. doi: 10.1016/j.jchf.2021.05.009

2. McDonagh TA, Metra M, Adamo M, Gardner RS, Baumbach A, Böhm M, et al. 2021 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. (2021) 42(36):3599–726. doi: 10.1093/eurheartj/ehab368

3. Heidenreich PA, Bozkurt B, Aguilar D, Allen LA, Byun JJ, Colvin MM, et al. 2022 AHA/ACC/HFSA guideline for the management of heart failure: a report of the American College of Cardiology/American Heart Association joint committee on clinical practice guidelines. Circulation. (2022) 145(18):e895–1032. doi: 10.1161/CIR.0000000000001063

4. Ahmad T, Pencina MJ, Schulte PJ, O'Brien E, Whellan DJ, Piña IL, et al. Clinical implications of chronic heart failure phenotypes defined by cluster analysis. J Am Coll Cardiol. (2014) 64(17):1765–74. doi: 10.1016/j.jacc.2014.07.979

5. Ahmad T, Lund LH, Rao P, Ghosh R, Warier P, Vaccaro B, et al. Machine learning methods improve prognostication, identify clinically distinct phenotypes, and detect heterogeneity in response to therapy in a large cohort of heart failure patients. J Am Heart Assoc. (2018) 7(8):e008081. doi: 10.1161/JAHA.117.008081

6. Shah SJ, Katz DH, Selvaraj S, Burke MA, Yancy CW, Gheorghiade M, et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation. (2015) 131(3):269–79. doi: 10.1161/CIRCULATIONAHA.114.010637

7. Meijs C, Handoko ML, Savarese G, Vernooij RWM, Vaartjes I, Banerjee A, et al. Discovering distinct phenotypical clusters in heart failure across the ejection fraction spectrum: a systematic review. Curr Heart Fail Rep. (2023) 20(5):333–49. doi: 10.1007/s11897-023-00615-z

8. Zhou X, Nakamura K, Sahara N, Asami M, Toyoda Y, Enomoto Y, et al. Exploring and identifying prognostic phenotypes of patients with heart failure guided by explainable machine learning. Life. (2022) 12(6):776. doi: 10.3390/life12060776

9. van de Veerdonk MC, Savarese G, Handoko ML, Beulens JWJ, Asselbergs F, Uijl A. Multimorbidity in heart failure: leveraging cluster analysis to guide tailored treatment strategies. Curr Heart Fail Rep. (2023) 20(5):461–70. doi: 10.1007/s11897-023-00626-w

10. Kaur P, Ha J, Raye N, Ouwerkerk W, van Essen BJ, Tan L, et al. A systematic review of multimorbidity clusters in heart failure: effects of methodologies. Int J Cardiol. (2025) 420:132748. doi: 10.1016/j.ijcard.2024.132748

11. Ahmad FS, Luo Y, Wehbe RM, Thomas JD, Shah SJ. Advances in machine learning approaches to heart failure with preserved ejection fraction. Heart Fail Clin. (2022) 18(2):287–300. doi: 10.1016/j.hfc.2021.12.002

12. Al-Ani MA, Bai C, Hashky A, Parker AM, Vilaro JR, Aranda JM Jr, et al. Artificial intelligence guidance of advanced heart failure therapies: a systematic scoping review. Front Cardiovasc Med. (2023) 10:1127716. doi: 10.3389/fcvm.2023.1127716

13. ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, et al. 2. Classification and diagnosis of diabetes: standards of care in diabetes-2023. Diabetes Care. (2023) 46(Suppl 1):S19–40. doi: 10.2337/dc23-S002

14. Van Gelder IC, Rienstra M, Bunting KV, Casado-Arroyo R, Caso V, Crijns HJGM, et al. 2024 ESC guidelines for the management of atrial fibrillation developed in collaboration with the European association for cardio-thoracic surgery (EACTS). Eur Heart J. (2024) 45(36):3314–414. doi: 10.1093/eurheartj/ehae176

15. McEvoy JW, McCarthy CP, Bruno RM, Brouwers S, Canavan MD, Ceconi C, et al. 2024 ESC guidelines for the management of elevated blood pressure and hypertension. Eur Heart J. (2024) 45(38):3912–4018. doi: 10.1093/eurheartj/ehae178

16. Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int. (2024) 105(4S):S117–314. doi: 10.1016/j.kint.2023.10.018

17. Mitchell C, Rahko PS, Blauwet LA, Canaday B, Finstuen JA, Foster MC, et al. Guidelines for performing a comprehensive transthoracic echocardiographic examination in adults: recommendations from the American society of echocardiography. J Am Soc Echocardiogr. (2019) 32(1):1–64. doi: 10.1016/j.echo.2018.06.004

18. Zhang Y, Aaronson KD, Gryak J, Wittrup E, Minoccheri C, Golbus JR, et al. Predicting need for heart failure advanced therapies using an interpretable tropical geometry-based fuzzy neural network. PLoS One. (2023) 18(11):e0295016. doi: 10.1371/journal.pone.0295016

19. Lamp J, Wu Y, Lamp S, Afriyie P, Ashur N, Bilchick K, et al. Characterizing advanced heart failure risk and hemodynamic phenotypes using interpretable machine learning. Am Heart J. (2024) 271:1–11. doi: 10.1016/j.ahj.2024.02.001

20. Stekhoven DJ, Bühlmann P. Missforest–non-parametric missing value imputation for mixed-type data. Bioinformatics. (2012) 28(1):112–8. doi: 10.1093/bioinformatics/btr597

21. Yao H, Derksen H, Golbus JR, Zhang J, Aaronson KD, Gryak J, et al. A novel tropical geometry-based interpretable machine learning method: pilot application to delivery of advanced heart failure therapies. IEEE J Biomed Health Inform. (2023) 27(1):239–50. doi: 10.1109/JBHI.2022.3211765

22. Rosano GMC, Moura B, Metra M, Böhm M, Bauersachs J, Ben Gal T, et al. Patient profiling in heart failure for tailoring medical therapy. A consensus document of the heart failure association of the European Society of Cardiology. Eur J Heart Fail. (2021) 23(6):872–81. doi: 10.1002/ejhf.2206

23. Yusuf S, Pfeffer MA, Swedberg K, Granger CB, Held P, McMurray JJ, et al. Effects of candesartan in patients with chronic heart failure and preserved left-ventricular ejection fraction: the CHARM-preserved trial. Lancet. (2003) 362(9386):777–81. doi: 10.1016/S0140-6736(03)14285-7

24. Solomon SD, McMurray JJV, Anand IS, Ge J, Lam CSP, Maggioni AP, et al. Angiotensin-neprilysin inhibition in heart failure with preserved ejection fraction. N Engl J Med. (2019) 381(17):1609–20. doi: 10.1056/NEJMoa1908655

25. Pitt B, Pfeffer MA, Assmann SF, Boineau R, Anand IS, Claggett B, et al. Spironolactone for heart failure with preserved ejection fraction. N Engl J Med. (2014) 370(15):1383–92. doi: 10.1056/NEJMoa1313731

26. Peters AE, Tromp J, Shah SJ, Lam CSP, Lewis GD, Borlaug BA, et al. Phenomapping in heart failure with preserved ejection fraction: insights, limitations, and future directions. Cardiovasc Res. (2023) 118(18):3403–15. doi: 10.1093/cvr/cvac179

27. Sotomi Y, Hikoso S, Nakatani D, Okada K, Dohi T, Sunaga A, et al. Medications for specific phenotypes of heart failure with preserved ejection fraction classified by a machine learning-based clustering model. Heart. (2023) 109(16):1231–40. doi: 10.1136/heartjnl-2022-322181

28. Kwon JM, Kim KH, Jeon KH, Lee SE, Lee HY, Cho HJ, et al. Artificial intelligence algorithm for predicting mortality of patients with acute heart failure. PLoS One. (2019) 14(7):e0219302. doi: 10.1371/journal.pone.0219302

29. Samad MD, Ulloa A, Wehner GJ, Jing L, Hartzel D, Good CW, et al. Predicting survival from large echocardiography and electronic health record datasets: optimization with machine learning. JACC Cardiovasc Imaging. (2019) 12(4):681–9. doi: 10.1016/j.jcmg.2018.04.026

Keywords: advanced heart failure, phenotyping, unsupervised clustering, machine learning, risk stratification

Citation: Karaçam M, Kültürsay B, Mutlu D, Tanyeri S, Kaya A, Efe SÇ, Doğan C, Halil GS, Akbal ÖY, Kırali K and Acar RD (2025) From patterns to prognosis: machine learning–derived clusters in advanced heart failure. Front. Cardiovasc. Med. 12:1669538. doi: 10.3389/fcvm.2025.1669538

Received: 19 July 2025; Accepted: 7 October 2025;
Published: 23 October 2025.

Edited by:

Ricardo Mourilhe-Rocha, Rio de Janeiro State University, Brazil

Reviewed by:

Erick Romero, UC Davis Medical Center, United States
Elisa Rauseo, NIHR Barts Cardiovascular Biomedical Research Unit, United Kingdom

Copyright: © 2025 Karaçam, Kültürsay, Mutlu, Tanyeri, Kaya, Efe, Doğan, Halil, Akbal, Kırali and Acar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Murat Karaçam, bXJ0a3JjbS41QGdtYWlsLmNvbQ==

†ORCID:
Murat Karaçam: orcid.org/0000-0001-7323-8843
Barkın Kültürsay: orcid.org/0000-0002-1424-2209
Deniz Mutlu: orcid.org/0000-0003-4432-4595
Seda Tanyeri: orcid.org/0000-0002-0933-9233
Azmican Kaya: orcid.org/0009-0005-8935-3308
Süleyman Çagan Efe: orcid.org/0000-0002-6067-6841
Cem Doğan: orcid.org/0000-0002-2004-142X
Gülümser Sevgin Halil: orcid.org/0000-0003-0412-5292
Özgür Yaşar Akbal: orcid.org/0000-0002-3882-0288
Kaan Kırali: orcid.org/0000-0003-0044-4691
Rezzan Deniz Acar: orcid.org/0000-0003-1870-4527
