- 1 Department of Cardiology, Bitlis State Hospital, Bitlis, Türkiye
- 2 Department of Cardiology, Tunceli State Hospital, Tunceli, Türkiye
- 3 Center for Coronary Artery Disease, Minneapolis Heart Institute Foundation, Minneapolis, MN, United States
- 4 Department of Cardiology, Kartal Kosuyolu Research and Education Hospital, Istanbul, Türkiye
- 5 Department of Cardiovascular Surgery, Kartal Kosuyolu Research and Education Hospital, İstanbul, Türkiye
Introduction: Advanced heart failure (HF) is a clinically heterogeneous condition with poor prognosis, and traditional classification systems often fail to capture the complexity needed for personalized care. This study aimed to identify clinically meaningful phenotypic subgroups among patients with advanced HF using unsupervised machine learning and to evaluate their association with long-term outcomes.
Methods: A retrospective analysis was conducted on 524 patients with advanced HF who underwent comprehensive clinical, echocardiographic, hemodynamic, and cardiopulmonary exercise assessments. Using k-means clustering on standardized, multidimensional data, two distinct phenotypes were identified. The primary composite outcome was defined as all-cause mortality, left ventricular assist device implantation, or heart transplantation. Associations between cluster assignment and outcomes were evaluated using Kaplan–Meier analysis and Cox proportional hazards regression.
Results: The first cluster, representing patients with relatively preserved hemodynamics and functional status, was associated with a more favorable prognosis, while the second cluster included older individuals with significant biventricular dysfunction, higher pulmonary pressures, and poorer exercise capacity. These patients experienced a markedly higher rate of the composite outcome over a median follow-up of 2.4 years, with Cluster 2 showing a significantly increased risk (hazard ratio [HR]: 3.84; 95% CI: 2.72–5.43; p < 0.001).
Conclusion: Machine learning–based clustering revealed two distinct phenotypes in advanced HF with differing clinical features and prognoses. This approach may enhance risk stratification and inform individualized therapeutic strategies in this high-risk population.
Introduction
Heart failure (HF) is a complex clinical syndrome characterized by substantial heterogeneity in etiology, pathophysiology, disease trajectory, and response to therapy. This heterogeneity becomes particularly evident in patients with advanced HF, a population that remains underrepresented in large-scale clinical trials despite experiencing the highest rates of morbidity and mortality (1). The prevalence of this patient group continues to rise due to both an aging global population and the increasing availability of life-prolonging therapies (2). These patients also represent a significant burden on healthcare systems, largely due to frequent hospital readmissions and progressive clinical deterioration (3).
Traditional classifications of heart failure—based on subjective measures of functional status, left ventricular ejection fraction (LVEF) thresholds, or broad stage designations (A to D)—are insufficient to reflect the phenotypic complexity observed in clinical practice (2–4). Recent advances in machine learning (ML) have enabled novel phenotyping strategies, shifting from reductionist models to multidimensional frameworks that incorporate clinical, imaging, and biomarker data (5, 6). In particular, unsupervised learning methods have facilitated the identification of latent subgroups—so-called “phenoclusters”—within heterogeneous HF populations. These data-driven approaches do not rely on pre-labeled outcomes, allowing for the unbiased discovery of previously unrecognized clinical patterns and their prognostic implications (7, 8).
The clinical relevance of phenotypic clustering is increasingly recognized, as subgroups show differing treatment responses and outcomes (9). In heart failure with preserved ejection fraction (HFpEF)—the most extensively studied patient population—phenomapping has identified reproducible clusters linked to comorbidities, structural remodeling, and exercise intolerance (6, 10, 11). However, advanced HF, despite its distinct pathophysiology and poor prognosis, remains underrepresented in such studies (12). The complexity of therapy selection, including transplantation and left ventricular assist device (LVAD), underscores the need for robust stratification models, yet ML applications in this population are still limited.
This study had two main objectives: to identify phenotypic clusters among patients with advanced HF using unsupervised ML techniques, and to assess the prognostic significance of these clusters.
Materials and methods
Study population
Between January 2021 and April 2024, 653 consecutive patients with advanced heart failure who were referred to our tertiary cardiovascular center for evaluation of advanced therapeutic options (including LVAD and transplantation) were initially evaluated. Advanced heart failure was defined according to the 2021 European Society of Cardiology (ESC) Guidelines as persistent severe symptoms (NYHA class III–IV) with objective evidence of cardiac dysfunction and poor prognosis despite optimal medical therapy (2). Patients with prior durable LVAD implantation, previous heart transplantation, left ventricular ejection fraction (LVEF) > 25%, severe pulmonary disease, contraindications to cardiopulmonary exercise testing (CPET) or right heart catheterization (RHC), or incomplete follow-up data were excluded. After applying these exclusion criteria, 524 patients constituted the final study cohort (Supplementary Figure S1). All included patients underwent comprehensive baseline evaluation with transthoracic echocardiography, CPET, and RHC, performed within a 14-day time window. All demographic, clinical, laboratory, echocardiographic, and hemodynamic variables were obtained from the hospital's electronic medical record (EMR) system. Clinical diagnoses were determined based on International Classification of Diseases (ICD) codes and subsequently verified through physician notes and laboratory reports to ensure accuracy. CPET parameters were extracted through additional manual chart review of exercise test reports by the investigator team.
Standardized definitions were applied in line with established guidelines: diabetes mellitus (DM) was defined as a physician-documented diagnosis and/or use of antidiabetic medication (13); atrial fibrillation (AF) as documented arrhythmia on ECG or Holter monitoring (14); ischemic etiology as a history of myocardial infarction, percutaneous coronary intervention, or coronary artery bypass grafting; hypertension (HT) as a physician-documented diagnosis and/or use of antihypertensive therapy (15); hyperlipidemia (HL) as a physician-documented diagnosis and/or use of lipid-lowering therapy; chronic kidney disease (CKD) as an estimated glomerular filtration rate <60 ml/min/1.73 m2 persisting for >3 months (16); cerebrovascular disease (CVD) as a history of ischemic or hemorrhagic stroke or transient ischemic attack; and chronic obstructive pulmonary disease (COPD) as a physician-documented chronic airway disease with or without pulmonary function testing.
The study was approved by the local ethics committee and conducted in accordance with the Declaration of Helsinki.
Echocardiography
LVEF was measured using the biplane method of disks (modified Simpson's rule). Doppler echocardiographic examinations were performed by a single experienced cardiologist using the EPIQ CVx system (version 9.0.5) with S5-1 and X5-1 transducers (Philips Medical Systems, Andover, MA, USA), in accordance with current guidelines. Tricuspid annular plane systolic excursion (TAPSE) was obtained using M-mode imaging from the apical four-chamber view with focus on the right ventricle. Pulmonary artery systolic pressure (PASP) was estimated by adding the peak tricuspid regurgitant pressure gradient, calculated from the peak jet velocity using the simplified Bernoulli equation, to the estimated central venous pressure, which was derived from the diameter and respiratory variation of the inferior vena cava (IVC). All echocardiographic measurements adhered to the recommendations of the American Society of Echocardiography (17).
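As an illustration of the PASP estimate described above, the following minimal sketch applies the simplified Bernoulli relation (pressure gradient in mmHg = 4 × velocity², with velocity in m/s) and adds the estimated central venous pressure. The function name and the example values are hypothetical and are not taken from the study data.

```python
def estimate_pasp(tr_peak_velocity_m_s: float, cvp_mmhg: float) -> float:
    """Estimate PASP (mmHg) from peak TR jet velocity (m/s) and estimated CVP (mmHg)."""
    if tr_peak_velocity_m_s < 0 or cvp_mmhg < 0:
        raise ValueError("velocity and pressure must be non-negative")
    # Simplified Bernoulli equation: gradient (mmHg) = 4 * v^2, v in m/s
    gradient_mmhg = 4.0 * tr_peak_velocity_m_s ** 2
    # PASP = right ventricle to right atrium gradient + estimated venous pressure
    return gradient_mmhg + cvp_mmhg


# Hypothetical example: 3.0 m/s jet velocity with an estimated CVP of 8 mmHg
example_pasp = estimate_pasp(3.0, 8.0)
```

The IVC-based rule used to assign the central venous pressure is not reproduced here, so the sketch takes that value as an input.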
Exercise testing
Maximal cardiopulmonary exercise testing was performed using a continuous, individualized ramp treadmill protocol on a JAEGER Vyntus CPX system (Vyaire Medical, Germany). Exercise capacity was expressed in metabolic equivalents (METs), with oxygen uptake (VO2) measured breath by breath through an automated system. Measurements were recorded at rest, throughout graded exercise, and during a two-minute recovery period. METs were calculated by dividing VO2max by 3.5 ml/kg/min. VO2, VCO2, and the respiratory exchange ratio (RER = VCO2/VO2) were averaged every 10 s. Peak VO2 was defined as the highest 10 s averaged VO2 during the final stage of exercise. Blood pressure was measured prior to testing and at three-minute intervals throughout the protocol and recovery.
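The CPET quantities defined above can be sketched as follows. This is a simplified illustration, not the vendor software's algorithm: breath-by-breath VO2 values are averaged in consecutive 10 s bins, peak VO2 is the highest bin average, and METs are obtained by dividing by 3.5 ml/kg/min. All function names and sample values are hypothetical, and the caller is assumed to pass only data from the final exercise stage when computing the peak.

```python
from statistics import mean

RESTING_VO2 = 3.5  # ml/kg/min, the value that defines 1 MET


def ten_second_averages(breath_times_s, breath_values):
    """Average breath-by-breath values into consecutive 10 s bins."""
    bins = {}
    for t, v in zip(breath_times_s, breath_values):
        bins.setdefault(int(t // 10), []).append(v)
    return [mean(bins[key]) for key in sorted(bins)]


def peak_vo2(averaged_vo2):
    """Highest 10 s averaged VO2 (ml/kg/min) among the supplied bins."""
    return max(averaged_vo2)


def mets(vo2_ml_kg_min):
    """Metabolic equivalents for a given VO2 (ml/kg/min)."""
    return vo2_ml_kg_min / RESTING_VO2


def respiratory_exchange_ratio(vco2, vo2):
    """RER = VCO2 / VO2 (both in the same units)."""
    return vco2 / vo2


# Hypothetical breath-by-breath samples (time in seconds, VO2 in ml/kg/min)
times = [0, 5, 10, 15, 20, 25]
vo2 = [10.0, 12.0, 14.0, 16.0, 15.0, 13.0]
averaged = ten_second_averages(times, vo2)
peak = peak_vo2(averaged)
peak_mets = mets(peak)
```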
Cardiac catheterization
Right heart catheterization was performed via the right internal jugular or femoral vein using a 7Fr balloon-tipped Swan–Ganz catheter (Edwards Lifesciences, Irvine, CA, USA) or a pigtail catheter. Cardiac output was calculated using the indirect Fick method. All pressure waveforms were visually assessed to ensure physiological accuracy, and measurements were taken at end-expiration.
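For readers unfamiliar with the indirect Fick method, a minimal sketch follows. The text does not state how oxygen consumption was estimated, so the sketch assumes a resting VO2 of 125 mL/min per m² of body surface area, a commonly used convention; it also ignores dissolved oxygen. The function name and example values are hypothetical.

```python
def fick_cardiac_output(hb_g_dl, sao2, svo2, bsa_m2, vo2_per_m2=125.0):
    """Cardiac output (L/min) by the indirect Fick method.

    hb_g_dl: hemoglobin (g/dL); sao2 / svo2: arterial and mixed venous
    oxygen saturations as fractions (0-1); bsa_m2: body surface area (m^2);
    vo2_per_m2: ASSUMED resting oxygen consumption (mL/min/m^2).
    """
    if not (0.0 <= svo2 < sao2 <= 1.0):
        raise ValueError("saturations must satisfy 0 <= svo2 < sao2 <= 1")
    vo2_ml_min = vo2_per_m2 * bsa_m2
    # Arteriovenous O2 content difference in mL O2 per dL of blood
    # (1.34 mL O2 carried per gram of fully saturated hemoglobin)
    av_difference = 1.34 * hb_g_dl * (sao2 - svo2)
    # Multiply by 10 to convert from per dL to per L of blood
    return vo2_ml_min / (av_difference * 10.0)


# Hypothetical patient: Hb 13 g/dL, SaO2 96%, SvO2 56%, BSA 1.9 m^2
co = fick_cardiac_output(13.0, 0.96, 0.56, 1.9)
ci = co / 1.9  # cardiac index, L/min/m^2
```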
Endpoint definition
The composite outcome was defined as all-cause mortality, LVAD implantation, or heart transplantation, in line with definitions used in previous literature (18, 19).
Statistical analysis
To identify distinct phenotypic clusters within the study population, we employed unsupervised machine learning techniques. Prior to clustering, missing data were addressed via the MissForest algorithm, a non-parametric, iterative imputation method utilizing random forests (20) (Supplementary Figure S2). All continuous variables were standardized to zero mean and unit variance prior to distance-based modeling. Binary categorical variables (e.g., comorbidities, sex) were excluded from the clustering process to prevent distortion in Euclidean distance calculations arising from incompatible data types. Ordinal categorical variables (e.g., mitral regurgitation grade, tricuspid regurgitation grade, and LV diastolic dysfunction) were converted to integer scores respecting their inherent order, thereby preserving their rank information in the distance matrix. A total of 108 variables were considered, encompassing clinical, laboratory, echocardiographic, hemodynamic, and CPET parameters. After addressing multicollinearity (removing one variable, chosen by clinical judgment, from each pair with a Pearson correlation >0.7), 81 variables remained for the final clustering analysis (Supplementary Table S1). Both hierarchical clustering (Ward's method with Euclidean distance) and k-means clustering were applied to the scaled numeric data. These algorithms are well suited for standardized continuous data and have been widely applied in heart failure phenomapping studies (5, 6). The optimal number of clusters was determined using both the elbow method (within-cluster sum of squares, WSS) and the average silhouette width as complementary approaches (Figure 1). The elbow point was visually identified at k = 2, where the incremental reduction in WSS plateaued, and this was further supported by the highest silhouette score. While k = 3 showed a minor secondary inflection, it yielded a lower silhouette width and produced less stable and less clinically interpretable clusters.
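The standardization, k-means, and cluster-number selection steps described above can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the study's R code: it uses Lloyd's algorithm with random restarts, the population standard deviation for scaling, and a tiny synthetic two-feature dataset. Imputation and the correlation filter are omitted, and all names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Scale every column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (labels, wss)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = [list(c) for c in rng.sample(rows, k)]
        labels = None
        for _ in range(max_iter):
            new_labels = [
                min(range(k), key=lambda j: math.dist(row, centers[j]))
                for row in rows
            ]
            if new_labels == labels:
                break  # assignments stopped changing
            labels = new_labels
            for j in range(k):
                members = [r for r, lab in zip(rows, labels) if lab == j]
                if members:  # keep the old center if a cluster empties
                    centers[j] = [mean(col) for col in zip(*members)]
        wss = sum(math.dist(r, centers[lab]) ** 2 for r, lab in zip(rows, labels))
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best


def silhouette_width(rows, labels):
    """Average silhouette width (requires at least two clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for i, row in enumerate(rows):
        dists = {c: [] for c in clusters}
        for j, other in enumerate(rows):
            if i != j:
                dists[labels[j]].append(math.dist(row, other))
        own = dists[labels[i]]
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(mean(d) for c, d in dists.items() if c != labels[i])
        scores.append((b - a) / (max(a, b) or 1.0))
    return mean(scores)


# Synthetic data: two well-separated groups of four points each
demo = standardize([[0, 0], [0, 1], [1, 0], [1, 1],
                    [10, 10], [10, 11], [11, 10], [11, 11]])
for k in (2, 3, 4):
    demo_labels, demo_wss = kmeans(demo, k)
    print(k, round(demo_wss, 3), round(silhouette_width(demo, demo_labels), 3))
```

Plotting the printed WSS values against k gives the elbow curve, and the k with the largest silhouette width is the candidate solution; on real data both criteria should be read together, as in the text.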
Hierarchical clustering provided an interpretable dendrogram and stable grouping (Supplementary Figure S3); however, k-means clustering demonstrated comparable or higher silhouette scores, offering more flexible partitioning and iterative refinement (Figures 1, 2). Therefore, k-means clustering (k = 2) was selected for the final classification (Supplementary Figures S4, S5), balancing statistical performance, model simplicity, and clinical interpretability. To evaluate the robustness of the identified clusters, internal validation was performed using bootstrap resampling with 1,000 iterations and Jaccard similarity indices. As a sensitivity analysis, clustering was repeated using Gower distance with partitioning around medoids (PAM) (Supplementary Figure S6). Additionally, internal validation was performed using the Calinski–Harabasz (CH) and Davies–Bouldin (DB) indices across different cluster numbers (k = 2–6) (Supplementary Table S2). The final cluster assignments were appended to the imputed dataset. Group differences between clusters were assessed using chi-squared tests for categorical variables and either Student's t-test or Wilcoxon rank-sum test for continuous variables, depending on distributional assumptions. Scaled variables were compared between the two clusters using both bar plots and a radar chart to illustrate group-level differences (Figure 3). Survival was illustrated using the Kaplan–Meier method, and Cox proportional hazards regression models were applied to assess time-to-event associations between cluster membership and outcomes. Importantly, outcomes were not included as clustering inputs, ensuring independence between phenotype derivation and prognostic evaluation. The proportional hazards assumption was tested using Schoenfeld residuals and was not violated (Supplementary Figure S7). To further assess the reproducibility of the clustering solution, repeated split-sample validation was performed. 
In each of 100 random replications, the cohort was divided into 70% training and 30% validation subsets. K-means clustering (k = 2) was derived in the training set, and cluster centroids were used to assign patients in the validation set. Agreement between original and validation cluster assignments was quantified by the adjusted Rand index, while prognostic validity was evaluated using log-rank tests and Cox regression (Supplementary Table S3). All statistical tests were two-tailed, and a p-value below 0.05 was considered statistically significant. All statistical analyses were performed using R version 4.4.1 (R Foundation for Statistical Computing, Vienna, Austria) with the packages “missForest”, “dplyr”, “stats”, “cluster”, “clusterCrit”, “fossil”, “naniar”, “dendextend”, “survival”, “survminer”, “rms”, and “ggplot2”.
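The split-sample step described above (assigning validation patients to the nearest training centroid, then measuring agreement with the adjusted Rand index) can be sketched as follows. This is a simplified illustration with hypothetical names and toy data; the study's own implementation in R is not shown in the text.

```python
import math
from collections import Counter
from math import comb


def centroids(rows, labels):
    """Mean vector of each cluster, keyed by cluster label."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return {lab: [sum(col) / len(col) for col in zip(*members)]
            for lab, members in groups.items()}


def assign_to_nearest(rows, cents):
    """Label each row with the cluster whose centroid is closest (Euclidean)."""
    return [min(cents, key=lambda lab: math.dist(row, cents[lab])) for row in rows]


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Toy example: centroids from a "training" set applied to "validation" rows
train_rows = [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]]
train_labels = [1, 1, 2, 2]
cents = centroids(train_rows, train_labels)
validation_labels = assign_to_nearest([[0.5, 0.2], [9.5, 9.9]], cents)
```

The index is invariant to label names, so a validation labeling that swaps "1" and "2" consistently still scores 1.0.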
 
  Figure 1. Determination of the optimal number of clusters (k) using the elbow and silhouette methods. Total within-cluster sum of squares (WSS) plotted against increasing values of k. The elbow point was visually identified at k = 2, where the reduction in WSS began to plateau. Average silhouette width across varying k values, with the highest value observed at k = 2, supporting the two-cluster solution.
 
  Figure 2. Principal component analysis (PCA) for k-means and hierarchical clustering (k = 2). Visualization of patient distribution by unsupervised clustering (k-means and hierarchical) using first two principal components. Distinct separation is evident between the two clusters in k-means clustering.
 
  Figure 3. Cluster radar chart and bar plot comparison. Radar chart illustrating normalized distributions of selected parameters across the two clusters. Cluster 2 demonstrated greater impairments in hemodynamic, biochemical, and echocardiographic variables compared with Cluster 1. Bar plot comparing scaled mean values for clinical, echocardiographic, laboratory, and hemodynamic parameters between clusters. Cluster 2 was characterized by older age, higher prevalence of comorbidities (diabetes mellitus, atrial fibrillation), worse hemodynamics (higher RAP, LVEDP, and PVR; lower CI), and impaired functional status (lower peak VO2).
Results
Cluster validation
Unsupervised k-means clustering identified two distinct phenotypic clusters among 524 patients with advanced HF. For the primary k-means model based on continuous variables, both clusters showed excellent stability (Jaccard indices: 0.998 and 0.985; Supplementary Figure S6). Sensitivity analysis using Gower distance with PAM clustering yielded consistent results, although with moderately lower stability (Jaccard indices: 0.851 and 0.762). Internal validation using the Calinski–Harabasz (CH) and Davies–Bouldin (DB) indices across different cluster numbers (k = 2–6) consistently supported the two-cluster solution, which yielded the highest CH and the lowest DB values (Supplementary Table S2).
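To make the Jaccard stability indices above concrete, the sketch below shows one common way to score a single bootstrap replicate: each original cluster, restricted to the patients drawn in the resample, is compared with its best-matching cluster from the re-clustered resample. Averaging these values over all replicates gives a per-cluster stability index. This is an assumed procedure with hypothetical names; the text does not specify the exact implementation used.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(original_labels, boot_indices, boot_labels):
    """Best-match Jaccard similarity per original cluster for one resample.

    original_labels: cluster label of every patient in the full cohort.
    boot_indices: patient indices drawn (with replacement) in the resample.
    boot_labels: labels from re-clustering the resample, aligned with boot_indices.
    """
    sampled = set(boot_indices)
    boot_clusters = {}
    for idx, lab in zip(boot_indices, boot_labels):
        boot_clusters.setdefault(lab, set()).add(idx)
    result = {}
    for cluster in set(original_labels):
        members = {i for i, lab in enumerate(original_labels)
                   if lab == cluster and i in sampled}
        result[cluster] = max(jaccard(members, bc) for bc in boot_clusters.values())
    return result


# Toy replicate: six patients, one of whom is re-clustered differently
stability = cluster_stability(
    original_labels=[0, 0, 0, 1, 1, 1],
    boot_indices=[0, 1, 1, 3, 4, 5],
    boot_labels=["a", "a", "a", "b", "b", "a"],
)
```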
Split-sample validation further confirmed reproducibility. Across 100 replications of 70/30 splits, the adjusted Rand index averaged 0.77 ± 0.06, and prognostic separation was consistently observed (log-rank p < 0.05 in all subsets; pooled HR: 0.83, 95% CI: 0.78–0.89; Supplementary Table S3). Notably, this validation HR reflects reproducibility across resampling iterations, whereas the full-cohort Cox model showed the absolute effect size (HR: 3.84, 95% CI: 2.72–5.43).
Collectively, these analyses indicate that the identified phenotypes are reproducible, robust, and prognostically meaningful rather than artifacts of overfitting.
Cluster 1 comprised 282 patients (53.8%), while Cluster 2 included 242 patients (46.2%). Based on their clinical and physiological profiles, we defined Cluster 1 as the Favorable Profile Cluster (FPC), characterized by more favorable hemodynamic and functional parameters—suggestive of a group appropriate for continued monitoring and optimization of standard therapies. In contrast, Cluster 2 was designated the Adverse Profile Cluster (APC), representing an older cohort with marked hemodynamic compromise and diminished exercise capacity, indicative of a phenotype that may benefit from earlier consideration of advanced interventions or intensified medical management.
Clinical and demographic characteristics
Patients in Cluster 2 were older (median age: 54 (45–60) vs. 52 (43–58) years, p = 0.013) and had a lower body mass index (BMI: 27.0 ± 4.8 vs. 28.2 ± 5.3 kg/m2, p = 0.005) (Table 1). There was no significant difference in sex distribution between the two clusters. The prevalence of ischemic etiology (54.7% vs. 38.2%, p < 0.001), history of percutaneous coronary intervention (45.9% vs. 28.5%, p < 0.001), coronary artery bypass grafting (14.5% vs. 7.8%, p = 0.015), diabetes mellitus (38.4% vs. 28.5%, p = 0.016), atrial fibrillation (28.1% vs. 10.3%, p < 0.001), and implantable cardioverter defibrillator (31.8% vs. 18.1%, p < 0.001) was significantly higher in Cluster 2.
Echocardiographic and hemodynamic findings
Table 2 presents the echocardiographic findings of the patients. Cluster 2 showed more advanced structural and functional cardiac abnormalities. Left ventricular ejection fraction (LVEF) was significantly lower in Cluster 2 (median: 20% (18–24) vs. 23% (20–25), p < 0.001), suggesting more profound systolic dysfunction. Although left ventricular end-diastolic and end-systolic diameters (LVEDD, LVESD) were similar between groups, left atrial size was markedly increased in Cluster 2 (4.84 ± 0.53 cm vs. 4.45 ± 0.61 cm, p < 0.001), reflecting chronic volume overload and diastolic impairment.
Mitral regurgitation severity was significantly greater in Cluster 2, with higher proportions of patients exhibiting moderate-to-severe regurgitation (Grade 2–3 in 82.6% vs. 47.8%, p < 0.001). Similarly, tricuspid regurgitation was more severe in Cluster 2, indicating substantial right-sided valvular involvement and volume burden.
Right ventricular systolic function was also significantly impaired in Cluster 2, with lower tricuspid annular plane systolic excursion (TAPSE: 1.4 ± 0.37 cm vs. 1.8 ± 0.44 cm, p < 0.001) and increased inferior vena cava (IVC) diameter (2.23 ± 0.43 cm vs. 1.64 ± 0.35 cm, p < 0.001), suggesting elevated right atrial pressures and reduced RV contractility. Plethora was observed in over half of Cluster 2 patients (55.8% vs. 2.3%, p < 0.001). Estimated pulmonary artery systolic pressure (PASP) by echocardiography was higher in Cluster 2 (median: 50 mmHg vs. 30 mmHg, p < 0.001), consistent with pulmonary hypertension.
Invasive hemodynamic assessment via right heart catheterization revealed marked elevation in biventricular filling pressures and pulmonary vascular resistance in Cluster 2 (Table 3). Left ventricular end-diastolic pressure (LVEDP) and pulmonary artery systolic, diastolic, and mean pressures were significantly higher in Cluster 2 than in Cluster 1. Cluster 2 also exhibited higher right atrial pressure (RAP: 12 (9–17) vs. 6 (4–8) mmHg, p < 0.001), right ventricular systolic pressure (RVSP: 59 (48–71) vs. 36 (29–49) mmHg, p < 0.001), and transpulmonary gradient (TPG: 12 (8–19) vs. 6 (3–9) mmHg, p < 0.001) than Cluster 1, indicating more frequent combined pre- and post-capillary pulmonary hypertension in Cluster 2.
Cardiac output (CO) and cardiac index (CI) were significantly lower in Cluster 2 than in Cluster 1, reflecting diminished global perfusion capacity. Pulmonary vascular resistance (PVR) was notably higher in Cluster 2 (4.1 (2.56–6.4) vs. 1.4 (0.9–2.33) Wood units, p < 0.001), and systemic vascular resistance (SVR) was also modestly higher (24.2 (20.2–29.4) vs. 21.4 (17.6–29.4) Wood units, p < 0.001). Additionally, stroke volume (SV) and stroke volume index (SVI) were significantly lower in Cluster 2, consistent with advanced circulatory compromise.
Collectively, these findings underscore a more severe biventricular phenotype in Cluster 2, characterized by pronounced systolic dysfunction, elevated filling pressures, secondary valvular disease, and significant pulmonary hypertension.
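The derived hemodynamic indices reported in this subsection follow standard formulas, sketched below. The sketch assumes that the transpulmonary gradient is computed against a left-sided filling pressure such as the pulmonary artery wedge pressure; the text does not state which filling pressure was used. Function names and example values are hypothetical and are not study data.

```python
def transpulmonary_gradient(mean_pap, filling_pressure):
    """TPG (mmHg) = mean pulmonary artery pressure - left-sided filling pressure."""
    return mean_pap - filling_pressure


def pulmonary_vascular_resistance(mean_pap, filling_pressure, cardiac_output):
    """PVR in Wood units = TPG (mmHg) / cardiac output (L/min)."""
    return transpulmonary_gradient(mean_pap, filling_pressure) / cardiac_output


def systemic_vascular_resistance(mean_arterial, right_atrial, cardiac_output):
    """SVR in Wood units = (MAP - RAP) (mmHg) / cardiac output (L/min)."""
    return (mean_arterial - right_atrial) / cardiac_output


def cardiac_index(cardiac_output, bsa_m2):
    """CI (L/min/m^2) = cardiac output / body surface area."""
    return cardiac_output / bsa_m2


# Hypothetical values: mPAP 35 mmHg, filling pressure 23 mmHg, CO 3.0 L/min
tpg = transpulmonary_gradient(35.0, 23.0)
pvr = pulmonary_vascular_resistance(35.0, 23.0, 3.0)
svr = systemic_vascular_resistance(80.0, 12.0, 3.0)
```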
Laboratory parameters and biomarkers
Patients in Cluster 2 exhibited a laboratory profile consistent with advanced disease severity, multiorgan involvement, and worse nutritional and metabolic status (Table 4). Serum urea levels were higher in Cluster 2, whereas serum creatinine levels were similar in both groups. Hepatic congestion and dysfunction were more prominent in Cluster 2, with significantly elevated total (1.08 (0.72–1.58) vs. 0.58 (0.41–0.81) mg/dl, p < 0.001) and direct bilirubin levels (0.51 (0.31–0.84) vs. 0.21 (0.15–0.30) mg/dl, p < 0.001), as well as higher GGT (58.5 (30.6–103) vs. 28 (18–44) U/L, p < 0.001) and ALP (100 (71–131) vs. 87 (71.5–105) U/L, p = 0.001).
NT-proBNP levels were nearly threefold higher in Cluster 2 compared to Cluster 1 (3,969 (2,441–6,595) vs. 1,330 (562–2,241) ng/L, p < 0.001), indicating greater myocardial wall stress and hemodynamic overload.
Markers of nutritional status showed significant deterioration in Cluster 2. Serum albumin (41.2 ± 5.79 vs. 44.9 ± 4.32 g/L, p < 0.001), total protein (69.4 ± 8.07 vs. 72.0 ± 6.35 g/L, p < 0.001), and HDL cholesterol levels [34.8 (28.1–43.3) vs. 42 (36.2–50.5) mg/dl, p < 0.001] were significantly lower, suggesting poor nutritional state and reduced hepatic synthetic function.
Hematologic findings were indicative of more pronounced anemia in Cluster 2. Both hemoglobin (13.0 ± 2.09 vs. 14.4 ± 1.65 g/dl, p < 0.001) and hematocrit levels (41.2 ± 6.02% vs. 43.9 ± 4.73%, p < 0.001) were significantly lower compared to Cluster 1, suggesting impaired oxygen-carrying capacity and potential chronic disease-related anemia.
Cardiopulmonary exercise test (CPET) performance
Cluster 2 patients demonstrated significantly reduced functional capacity across multiple CPET parameters, consistent with more advanced heart failure physiology (Table 5). Peak oxygen consumption (peak VO2) was markedly lower in Cluster 2 [10.7 (9–13.2) vs. 16.0 (13.3–18.7) ml/kg/min, p < 0.001], reflecting impaired aerobic capacity and cardiac output reserve. Similarly, the achieved metabolic equivalents (METs) were significantly reduced [3.1 (2.6–3.8) vs. 4.6 (3.8–5.4), p < 0.001], indicating diminished ability to perform physical activity.
Ventilatory efficiency was also substantially worse in Cluster 2, as demonstrated by a significantly elevated VE/VCO2 slope [49.1 (37.6–81.0) vs. 33.5 (29.2–39.8), p < 0.001]. Moreover, lower peak exercise oxygen pulse and VO2/work slope values in this group (both p < 0.001) further support compromised cardiovascular performance and peripheral oxygen extraction.
Outcomes and survival analysis
Over a median follow-up of 2.4 years (interquartile range: 1.4–4.1), the incidence of the composite endpoint was significantly higher in Cluster 2 compared to Cluster 1 (50.0% vs. 15.6%, p < 0.001), highlighting the adverse prognostic profile of this subgroup (Table 6). In Cox regression analysis, assignment to Cluster 2 was associated with a 3.84-fold increased risk of experiencing the composite endpoint (hazard ratio [HR]: 3.84; 95% confidence interval [CI]: 2.72–5.43; p < 0.001) (Table 7, Figure 4).
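The Kaplan–Meier estimate underlying the event-free survival comparison can be sketched in a few lines. This is a minimal product-limit estimator with hypothetical names and toy data; it does not reproduce the study's analysis, which also used log-rank tests and Cox regression.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of event-free survival.

    times: follow-up time for each patient.
    events: True if the composite outcome occurred at that time,
            False if the patient was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


# Toy data: five patients, follow-up in years
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [True, False, True, True, False])
```

Running the estimator separately for each cluster yields the two curves that a log-rank test then compares.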
 
  Table 6. Clinical outcomes (LVAD implantation, heart transplantation, and death) according to phenotypic clusters.
 
  Figure 4. Kaplan–Meier survival curves. Kaplan–Meier estimates for the composite outcome (all-cause mortality, LVAD, or transplantation) stratified by cluster assignment. Cluster 2 demonstrated significantly lower event-free survival (log-rank p < 0.0001).
Discussion
In this study, we identified two distinct phenotypic clusters (the FPC and the APC) within an advanced HF population by applying unsupervised ML to a comprehensive and multimodal dataset. These clusters exhibited significant differences in clinical profiles and were associated with long-term outcomes, underscoring their prognostic relevance.
Recent studies by Zhang et al. and Yao et al. have applied supervised ML methods to predict the need for advanced HF therapies (18, 21). Zhang et al. used data from 557 hospitalizations to develop a transparent, rule-based ML model that predicted which patients would require advanced heart failure therapies, such as LVAD or transplantation, during follow-up (18). Yao et al. introduced a novel ML framework that grouped clinical variables into flexible, overlapping categories—allowing a patient to belong partially to more than one risk group, rather than being forced into a single predefined class. This approach, inspired by fuzzy logic, better reflects the clinical continuum and supports the derivation of interpretable reasoning rules. In their pilot application, the method was tested on a real-world cohort of patients evaluated for advanced heart failure therapies, demonstrating its potential to support eligibility classification through a rule-based, interpretable design (21). While these models offer interpretable decision support, they require labeled outcomes and focus on specific treatment decisions. In contrast, our approach aimed to identify clinically meaningful phenotypes associated with prognosis, rather than just treatment eligibility. This methodology offers a broader view of clinical heterogeneity in advanced HF.
Lamp et al. used unsupervised clustering to stratify patients into five risk categories based on a composite outcome of death, LVAD implantation, transplantation, and rehospitalization over six months, and subsequently applied supervised modeling to predict these outcomes (19). Their interpretable model was trained using two distinct input sets: the invasive set, which included variables derived solely from right heart catheterization (e.g., right atrial pressure, pulmonary artery pressures, cardiac output), and the all-feature set, which combined invasive hemodynamic data with a wider range of clinical and laboratory variables. The model achieved high predictive performance, with c-statistics ranging from 0.896 to 0.969 for the invasive set, and 0.858 to 0.997 for the all-feature set, although confidence intervals were not explicitly reported. Despite the impressive discrimination, their analysis was primarily limited to hemodynamic domains. In contrast, our approach integrated a more comprehensive set of variables—including echocardiographic, cardiopulmonary exercise testing (CPET), biochemical, and invasive hemodynamic parameters—allowing for a multidimensional phenotypic characterization with prognostic relevance.
The concept of phenotyping HF patients has emerged from the need for personalized treatment. In heart failure with reduced ejection fraction (HFrEF), applying and titrating guideline-directed therapies can be challenging due to comorbidities and adverse effects on blood pressure, renal function, and electrolyte balance (22). As the HF population ages, comorbidity burden increases, reducing the feasibility of uniform (“one-size-fits-all”) treatment approaches. Similar challenges exist in HFpEF, where pharmacological therapies have generally failed to show mortality benefit in large-scale randomized trials (23–25). However, ML-based clustering studies have revealed subgroups with variable treatment responses, supporting the role of precision medicine in this heterogeneous population (26, 27).
ML has gained traction in HF research for its capacity to manage high-dimensional data and uncover latent phenotypes not captured by traditional statistics (5, 7). Beyond providing therapeutic guidance, it also aids in risk stratification, as demonstrated by a series of analyses that applied supervised machine learning techniques to predict short- and long-term mortality with high accuracy (5, 28, 29).
Patients with advanced heart failure represent the terminal stage of the disease and often present with overlapping symptoms and complex pathophysiology (3). Although machine learning–based clustering has previously been applied in advanced heart failure, prior studies were limited by narrower variable sets, often focusing predominantly on hemodynamics or clinical data. Our study is the first to integrate a comprehensive multimodal dataset—including echocardiographic, cardiopulmonary exercise test, and invasive hemodynamic parameters—allowing a more detailed phenotypic characterization and prognostic stratification in this high-risk population. In our study, the two identified clusters—the FPC and the APC—demonstrated distinct clinical profiles (Figure 5). FPC encompassed individuals with relatively preserved hemodynamic and functional status, whereas APC represented a cohort with marked physiological deterioration and higher disease burden. These contrasting profiles were associated with markedly different long-term outcomes. Patients in the FPC group may benefit from routine follow-up and medical optimization, whereas those in the APC group may require early evaluation for advanced therapies, including inotropic support, mechanical circulatory support, or palliative care planning.
Integrating clustering outputs into clinical workflows could enable timely recognition of high-risk phenotypes and support personalized treatment strategies. As multimorbidity becomes more prevalent in HF populations, incorporating comorbidity profiles into clustering models may enhance patient stratification. Future studies should aim to externally validate these clusters and evaluate their applicability in prospective cohorts.
Limitations
Despite the strengths of our study, several important limitations should be acknowledged. First, the sample size, although relatively large for a single-center advanced HF cohort, remains modest. Second, our cohort was predominantly male, a pattern commonly observed in advanced HF studies; nevertheless, this sex imbalance may restrict the generalizability of our findings to female patients. Third, the retrospective and observational design precludes causal inference. Fourth, although the dataset was comprehensive, it reflects a single-center experience, which may limit external validity and generalizability. While we performed repeated split-sample validation within our cohort to strengthen internal reproducibility, external validation in larger, prospective multicenter cohorts will be essential to confirm the generalizability of our findings. Fifth, binary clinical variables (e.g., sex, diabetes, atrial fibrillation, ischemic etiology) were excluded from the clustering input for methodological reasons, potentially limiting completeness of phenotyping.
Conclusion
This study shows that unsupervised ML-based clustering can reveal clinically important phenotypes in advanced HF using routinely collected multimodal data. The identification of two distinct clusters with differing clinical profiles and outcomes highlights the potential of data-driven approaches to enhance risk stratification and guide personalized care. Prospective validation is warranted to confirm clinical utility.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by Kartal Kosuyolu Research and Education Hospital Clinical Research Ethics Committee. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants' legal guardians/next of kin because of the retrospective design of the study and the use of fully anonymized patient data.
Author contributions
MK: Conceptualization, Data curation, Project administration, Writing – original draft. BK: Formal analysis, Methodology, Writing – review & editing. DM: Formal analysis, Supervision, Writing – review & editing. ST: Data curation, Writing – review & editing. AK: Data curation, Investigation, Resources, Writing – review & editing. SE: Conceptualization, Resources, Visualization, Writing – review & editing. CD: Visualization, Writing – review & editing. GH: Conceptualization, Writing – review & editing. ÖA: Conceptualization, Writing – review & editing. KK: Methodology, Supervision, Writing – review & editing. RA: Investigation, Supervision, Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcvm.2025.1669538/full#supplementary-material
Supplementary Figure S1 | Flowchart of the study population. Flow diagram showing patient selection. Of 653 patients evaluated at the advanced HF clinic, 129 were excluded (prior LVAD/HTx, preserved LVEF, severe pulmonary disease, contraindications to CPET/RHC, or incomplete follow-up). The final cohort included 524 patients, who were classified into two clusters (Cluster 1, n=282; Cluster 2, n=242).
Supplementary Figure S2 | Missing data map. Visualization of missingness across all variables in the study cohort. Overall, 8.3% of values were missing, while 91.7% were present. The plot highlights variable- and patient-level distribution of missing data.
Supplementary Figure S3 | Determination of the optimal number of clusters for hierarchical clustering. (Left) Elbow method: total within-cluster sum of squares plotted against increasing k values, with an inflection observed at k=2. (Right) Silhouette method: average silhouette width across k, peaking at k=2, supporting the choice of a two-cluster solution.
Supplementary Figure S4 | PCA visualization of k-means clusters. Visualization of the two clusters identified using k-means clustering after dimensionality reduction with PCA.
Supplementary Figure S5 | t-SNE visualization of k-means clusters. t-Distributed Stochastic Neighbor Embedding (t-SNE) plot depicting patient clustering based on high-dimensional input data. Clear distinction observed between Cluster 1 and Cluster 2.
Supplementary Figure S6 | Cluster stability assessed by Jaccard similarity indices. Resampling-based stability analysis for the primary k-means model using continuous variables demonstrated excellent reproducibility of both clusters (Jaccard indices: 0.998 and 0.985). Sensitivity analysis with Gower distance and PAM clustering showed consistent cluster structures with moderately lower stability (Jaccard indices: 0.851 and 0.762), supporting the robustness of the identified phenotypes.
Supplementary Figure S7 | Schoenfeld residuals for proportional hazards assumption. Plots of scaled Schoenfeld residuals over time for covariates included in the Cox regression models. No systematic trends were observed, indicating that the proportional hazards assumption was not violated.
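The resampling-based stability analysis summarized for Supplementary Figure S6 can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes one common scheme: bootstrap resampling of patients, re-clustering each resample, and recording for each original cluster the best Jaccard overlap with any re-derived cluster. The data are synthetic, and the helper names (`kmeans`, `jaccard`, `bootstrap_stability`), the farthest-first initialization, and the number of resamples are illustrative assumptions.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first initialization (illustrative)."""
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Distance of every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def jaccard(a, b):
    """Jaccard similarity between two arrays of distinct patient indices."""
    union = np.union1d(a, b)
    return len(np.intersect1d(a, b)) / len(union) if len(union) else 1.0


def bootstrap_stability(X, k=2, n_boot=50, seed=0):
    """Mean best-match Jaccard index per original cluster across resamples."""
    rng = np.random.default_rng(seed)
    ref_labels = kmeans(X, k, rng)
    scores = np.zeros((n_boot, k))
    for b in range(n_boot):
        # Draw with replacement, then keep distinct patients (a simplification:
        # repeated draws are not weighted when re-fitting).
        idx = np.unique(rng.integers(0, len(X), size=len(X)))
        boot_labels = kmeans(X[idx], k, rng)
        for j in range(k):
            ref_members = idx[ref_labels[idx] == j]
            scores[b, j] = max(
                jaccard(ref_members, idx[boot_labels == m]) for m in range(k)
            )
    return scores.mean(axis=0)


# Synthetic stand-in for standardized continuous variables (not study data).
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(150, 5)),
    data_rng.normal(6.0, 1.0, size=(150, 5)),
])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each variable

stability = bootstrap_stability(X)
print("Mean Jaccard index per cluster:", stability)
```

With well-separated synthetic groups the indices are close to 1. In real cohorts, values nearer 0.5 would suggest that a cluster is not reliably recovered across resamples.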
References
1. Dunlay SM, Roger VL, Killian JM, Weston SA, Schulte PJ, Subramaniam AV, et al. Advanced heart failure epidemiology and outcomes: a population-based study. JACC Heart Fail. (2021) 9(10):722–32. doi: 10.1016/j.jchf.2021.05.009
2. McDonagh TA, Metra M, Adamo M, Gardner RS, Baumbach A, Böhm M, et al. 2021 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. (2021) 42(36):3599–726. doi: 10.1093/eurheartj/ehab368
3. Heidenreich PA, Bozkurt B, Aguilar D, Allen LA, Byun JJ, Colvin MM, et al. 2022 AHA/ACC/HFSA guideline for the management of heart failure: a report of the American College of Cardiology/American Heart Association joint committee on clinical practice guidelines. Circulation. (2022) 145(18):e895–1032. doi: 10.1161/CIR.0000000000001063
4. Ahmad T, Pencina MJ, Schulte PJ, O'Brien E, Whellan DJ, Piña IL, et al. Clinical implications of chronic heart failure phenotypes defined by cluster analysis. J Am Coll Cardiol. (2014) 64(17):1765–74. doi: 10.1016/j.jacc.2014.07.979
5. Ahmad T, Lund LH, Rao P, Ghosh R, Warier P, Vaccaro B, et al. Machine learning methods improve prognostication, identify clinically distinct phenotypes, and detect heterogeneity in response to therapy in a large cohort of heart failure patients. J Am Heart Assoc. (2018) 7(8):e008081. doi: 10.1161/JAHA.117.008081
6. Shah SJ, Katz DH, Selvaraj S, Burke MA, Yancy CW, Gheorghiade M, et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation. (2015) 131(3):269–79. doi: 10.1161/CIRCULATIONAHA.114.010637
7. Meijs C, Handoko ML, Savarese G, Vernooij RWM, Vaartjes I, Banerjee A, et al. Discovering distinct phenotypical clusters in heart failure across the ejection fraction spectrum: a systematic review. Curr Heart Fail Rep. (2023) 20(5):333–49. doi: 10.1007/s11897-023-00615-z
8. Zhou X, Nakamura K, Sahara N, Asami M, Toyoda Y, Enomoto Y, et al. Exploring and identifying prognostic phenotypes of patients with heart failure guided by explainable machine learning. Life. (2022) 12(6):776. doi: 10.3390/life12060776
9. van de Veerdonk MC, Savarese G, Handoko ML, Beulens JWJ, Asselbergs F, Uijl A. Multimorbidity in heart failure: leveraging cluster analysis to guide tailored treatment strategies. Curr Heart Fail Rep. (2023) 20(5):461–70. doi: 10.1007/s11897-023-00626-w
10. Kaur P, Ha J, Raye N, Ouwerkerk W, van Essen BJ, Tan L, et al. A systematic review of multimorbidity clusters in heart failure: effects of methodologies. Int J Cardiol. (2025) 420:132748. doi: 10.1016/j.ijcard.2024.132748
11. Ahmad FS, Luo Y, Wehbe RM, Thomas JD, Shah SJ. Advances in machine learning approaches to heart failure with preserved ejection fraction. Heart Fail Clin. (2022) 18(2):287–300. doi: 10.1016/j.hfc.2021.12.002
12. Al-Ani MA, Bai C, Hashky A, Parker AM, Vilaro JR, Aranda JM Jr, et al. Artificial intelligence guidance of advanced heart failure therapies: a systematic scoping review. Front Cardiovasc Med. (2023) 10:1127716. doi: 10.3389/fcvm.2023.1127716
13. ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, et al. 2. Classification and diagnosis of diabetes: standards of care in diabetes-2023. Diabetes Care. (2023) 46(Suppl 1):S19–40. doi: 10.2337/dc23-S002
14. Van Gelder IC, Rienstra M, Bunting KV, Casado-Arroyo R, Caso V, Crijns HJGM, et al. 2024 ESC guidelines for the management of atrial fibrillation developed in collaboration with the European association for cardio-thoracic surgery (EACTS). Eur Heart J. (2024) 45(36):3314–414. doi: 10.1093/eurheartj/ehae176
15. McEvoy JW, McCarthy CP, Bruno RM, Brouwers S, Canavan MD, Ceconi C, et al. 2024 ESC guidelines for the management of elevated blood pressure and hypertension. Eur Heart J. (2024) 45(38):3912–4018. doi: 10.1093/eurheartj/ehae178
16. Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int. (2024) 105(4S):S117–314. doi: 10.1016/j.kint.2023.10.018
17. Mitchell C, Rahko PS, Blauwet LA, Canaday B, Finstuen JA, Foster MC, et al. Guidelines for performing a comprehensive transthoracic echocardiographic examination in adults: recommendations from the American society of echocardiography. J Am Soc Echocardiogr. (2019) 32(1):1–64. doi: 10.1016/j.echo.2018.06.004
18. Zhang Y, Aaronson KD, Gryak J, Wittrup E, Minoccheri C, Golbus JR, et al. Predicting need for heart failure advanced therapies using an interpretable tropical geometry-based fuzzy neural network. PLoS One. (2023) 18(11):e0295016. doi: 10.1371/journal.pone.0295016
19. Lamp J, Wu Y, Lamp S, Afriyie P, Ashur N, Bilchick K, et al. Characterizing advanced heart failure risk and hemodynamic phenotypes using interpretable machine learning. Am Heart J. (2024) 271:1–11. doi: 10.1016/j.ahj.2024.02.001
20. Stekhoven DJ, Bühlmann P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. (2012) 28(1):112–8. doi: 10.1093/bioinformatics/btr597
21. Yao H, Derksen H, Golbus JR, Zhang J, Aaronson KD, Gryak J, et al. A novel tropical geometry-based interpretable machine learning method: pilot application to delivery of advanced heart failure therapies. IEEE J Biomed Health Inform. (2023) 27(1):239–50. doi: 10.1109/JBHI.2022.3211765
22. Rosano GMC, Moura B, Metra M, Böhm M, Bauersachs J, Ben Gal T, et al. Patient profiling in heart failure for tailoring medical therapy. A consensus document of the heart failure association of the European Society of Cardiology. Eur J Heart Fail. (2021) 23(6):872–81. doi: 10.1002/ejhf.2206
23. Yusuf S, Pfeffer MA, Swedberg K, Granger CB, Held P, McMurray JJ, et al. Effects of candesartan in patients with chronic heart failure and preserved left-ventricular ejection fraction: the CHARM-preserved trial. Lancet. (2003) 362(9386):777–81. doi: 10.1016/S0140-6736(03)14285-7
24. Solomon SD, McMurray JJV, Anand IS, Ge J, Lam CSP, Maggioni AP, et al. Angiotensin-neprilysin inhibition in heart failure with preserved ejection fraction. N Engl J Med. (2019) 381(17):1609–20. doi: 10.1056/NEJMoa1908655
25. Pitt B, Pfeffer MA, Assmann SF, Boineau R, Anand IS, Claggett B, et al. Spironolactone for heart failure with preserved ejection fraction. N Engl J Med. (2014) 370(15):1383–92. doi: 10.1056/NEJMoa1313731
26. Peters AE, Tromp J, Shah SJ, Lam CSP, Lewis GD, Borlaug BA, et al. Phenomapping in heart failure with preserved ejection fraction: insights, limitations, and future directions. Cardiovasc Res. (2023) 118(18):3403–15. doi: 10.1093/cvr/cvac179
27. Sotomi Y, Hikoso S, Nakatani D, Okada K, Dohi T, Sunaga A, et al. Medications for specific phenotypes of heart failure with preserved ejection fraction classified by a machine learning-based clustering model. Heart. (2023) 109(16):1231–40. doi: 10.1136/heartjnl-2022-322181
28. Kwon JM, Kim KH, Jeon KH, Lee SE, Lee HY, Cho HJ, et al. Artificial intelligence algorithm for predicting mortality of patients with acute heart failure. PLoS One. (2019) 14(7):e0219302. doi: 10.1371/journal.pone.0219302
Keywords: advanced heart failure, phenotyping, unsupervised clustering, machine learning, risk stratification
Citation: Karaçam M, Kültürsay B, Mutlu D, Tanyeri S, Kaya A, Efe SÇ, Doğan C, Halil GS, Akbal ÖY, Kırali K and Acar RD (2025) From patterns to prognosis: machine learning–derived clusters in advanced heart failure. Front. Cardiovasc. Med. 12:1669538. doi: 10.3389/fcvm.2025.1669538
Received: 19 July 2025; Accepted: 7 October 2025; Published: 23 October 2025.
Edited by:
Ricardo Mourilhe-Rocha, Rio de Janeiro State University, Brazil
Reviewed by:
Erick Romero, UC Davis Medical Center, United States
Elisa Rauseo, NIHR Barts Cardiovascular Biomedical Research Unit, United Kingdom
Copyright: © 2025 Karaçam, Kültürsay, Mutlu, Tanyeri, Kaya, Efe, Doğan, Halil, Akbal, Kırali and Acar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Murat Karaçam, bXJ0a3JjbS41QGdtYWlsLmNvbQ==
†ORCID:
Murat Karaçam
orcid.org/0000-0001-7323-8843
Barkın Kültürsay
orcid.org/0000-0002-1424-2209
Deniz Mutlu
orcid.org/0000-0003-4432-4595
Seda Tanyeri
orcid.org/0000-0002-0933-9233
Azmican Kaya
orcid.org/0009-0005-8935-3308
Süleyman Çagan Efe
orcid.org/0000-0002-6067-6841
Cem Doğan
orcid.org/0000-0002-2004-142X
Gülümser Sevgin Halil
orcid.org/0000-0003-0412-5292
Özgür Yaşar Akbal
orcid.org/0000-0002-3882-0288
Kaan Kırali
orcid.org/0000-0003-0044-4691
Rezzan Deniz Acar
orcid.org/0000-0003-1870-4527