Abstract
Background:
Cardiovascular events (CVEs) are the leading cause of mortality in hemodialysis patients. Current prediction models rely on clinical and biochemical data, but non-invasive alternatives are needed. Inspired by the Traditional Chinese Medicine (TCM) principle that “the heart opens into the tongue,” this study investigated whether quantitative features from tongue images could enhance CVE prediction.
Objective:
To develop and validate a machine learning framework that integrates tongue image features with conventional clinical variables to predict CVEs in hemodialysis patients.
Methods:
In this prospective, multicenter study, 506 maintenance hemodialysis patients were recruited. We extracted 1,354 hand-crafted radiomic features and 8 deep-learning features from standardized tongue images. These were combined with 90 clinical variables. Using a dataset split into training (n=243), validation (n=105), and an independent external test set (n=158), we developed and compared four models (LR, LightGBM, AdaBoost, MLP) under three feature configurations: clinical-only, tongue-only, and a fused model.
Results:
The model using only tongue image features (AdaBoost) significantly outperformed the clinical-only model, achieving an AUC of 0.786 vs. 0.682 on the external test set. The fused model provided a marginal improvement (AUC=0.787). SHAP analysis indicated that both tongue texture features and clinical biomarkers like PT% were key predictors. Decision curve analysis confirmed the clinical utility of the tongue-based and fused models across a range of risk thresholds.
Conclusion:
Tongue image features are potent, non-invasive predictors of CVEs in hemodialysis patients, offering performance superior to conventional clinical variables. This AI-driven approach validates the TCM theory and presents a promising supplementary tool for enhancing risk stratification in nephrology care.
1 Introduction
Hemodialysis patients carry a markedly elevated burden of cardiovascular disease, with mortality rates up to 20-fold higher than in the general population (Luo et al., 2025). The reported frequency of cardiovascular events (CVEs) varies: one study found that 30.6% of hemodialysis patients experienced CVEs (Yoshida et al., 2025), while a retrospective analysis reported an incidence of 11.59% among end-stage renal disease patients undergoing hemodialysis (Fei et al., 2025). Because CVEs are a primary cause of death in this population, accurate predictive models are essential for identifying high-risk individuals early and intervening in time. Such models could reduce severe complications such as myocardial infarction and heart failure, improve survival, and ease the healthcare burden (Gan et al., 2024).
Predictive models for hemodialysis patients typically use clinical, demographic, and laboratory data. Ensemble methods (e.g., XGBoost, Random Forest) have demonstrated robust performance, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.76 to 0.89 (Sheng et al., 2020; Li et al., 2023; Wang et al., 2024; Matsubara et al., 2017; Mei et al., 2020; Qin et al., 2024; Zhang et al., 2022). Nevertheless, these models rely on invasive or conventional data sources. Recent research has explored non-invasive alternatives, including medical imaging and facial photographs (Bär et al., 2024; Knorr et al., 2022). Integrating tongue image features with machine learning offers another non-invasive and convenient approach, one rooted in TCM. The TCM principle “the heart opens into the tongue” implies that the tongue’s morphology may reflect cardiovascular health (Duan et al., 2024).
Advances in artificial intelligence, especially deep learning, have facilitated the extraction of quantitative features from tongue images. Emerging evidence supports the potential of these features for predicting cardiac diseases (Duan et al., 2024; Yunhu et al., 2023). However, no study has yet incorporated tongue imaging into the prediction of cardiovascular events for hemodialysis patients, presumably due to practical challenges: (a) the limited availability of specialized tongue imaging equipment in routine clinical settings; (b) the necessity for TCM-guided interpretation, which requires an integrative framework bridging Eastern and Western medicine; and (c) the inherently interdisciplinary nature of the work, which demands collaboration across clinical medicine, TCM, and AI.
This study aims to employ machine learning to automatically extract key features from tongue images and develop a predictive model for cardiovascular events in hemodialysis patients. Furthermore, we propose integrating these tongue-derived features with conventional clinical and laboratory parameters and validating the improved predictive performance in a prospective clinical study. This integrative approach aims to enhance the precision and personalization of cardiovascular risk stratification and intervention. The study flowchart is presented in Figure 1.
FIGURE 1
2 Materials and methods
2.1 Study population
This prospective, multicenter study recruited 506 maintenance hemodialysis patients from January 2024 to September 2025 across three branches of the Guangdong Provincial Hospital of Chinese Medicine, namely the University Town Hospital (n = 158), the Dadelu General Hospital (n = 240), and the Fangcun Hospital (n = 108). The study cohort consisted of 120 patients who had experienced cardiovascular events (CVE group) and 386 patients who had not (non-CVE group). The data from the Fangcun and Dadelu hospitals were randomly partitioned at a ratio of 7:3 into a training set (n = 243) and a validation set (n = 105). The data from the University Town Hospital (n = 158) were utilized as an independent external test set to assess the model’s generalizability.
Standardized tongue images were obtained using a specialized imaging device (Model DS01 B) during hemodialysis sessions, based on the traditional Chinese medicine tenet that “the heart opens into the tongue” (Huang and Yuan, 2020), which postulates that tongue morphology reflects cardiovascular health. Images were acquired under uniform lighting conditions to ensure comparability among participants. In this study, tongue features are regarded as a non-invasive biomarker capable of detecting microcirculatory changes, such as those related to inflammation or fibrosis, which are associated with increased cardiovascular risk in hemodialysis patients (Duan et al., 2024).
Simultaneously, 90 clinical and physicochemical variables were gathered, with laboratory values corresponding to the most recent test results available at the time of imaging. The study was approved by the hospital’s Ethics Committee (No. YE2024 022 01), and all participants provided written informed consent.
Inclusion criteria were as follows: (a) regular hemodialysis for ≥3 months; (b) age ≥18 years, conscious, and without major physical disabilities; (c) stable clinical condition with well-controlled comorbidities; and (d) ability to adhere to study procedures and provide reliable data. Exclusion criteria were: (a) active infection, aldosteronism, adrenal lesions, severe liver/brain/hematopoietic disorders, psychiatric conditions, or poor general health; and (b) history of kidney transplantation.
Two patients at the University Town Hospital were unable to cooperate with tongue image acquisition and were thus excluded from the study, and one patient at the Fangcun Hospital declined participation due to privacy concerns. No participants were lost to follow-up or withdrew from the study.
2.2 Definition and adjudication of cardiovascular events
In this study, CVEs were defined as the first occurrence of any of the following during the follow-up period, in accordance with the 2020 American Heart Association guidelines and international consensus (Merchant et al., 2020): (a) myocardial infarction (MI) (National Expert Committee on Rational Drug Use NHaFPC, 2019), diagnosed by ischemic manifestations, dynamic electrocardiogram (ECG) alterations (e.g., ST-segment elevation or new left bundle-branch block), and a rise or fall of high-sensitivity troponin T (hs-TnT) with at least one value exceeding the 99th percentile; (b) cardiovascular mortality (Cardiovascular Disease Group SoP and Chinese Medical Association, 2022), ascribed to acute coronary syndrome, heart failure, documented malignant arrhythmia, or sudden cardiac death, with no non-cardiac cause identified, based on medical records, death certificates, or autopsy findings; (c) hospitalization for heart failure (Juan et al., 2024), meeting the European Society of Cardiology (ESC) criteria and requiring an admission of ≥24 h; (d) hospitalization for unstable angina, diagnosed in the absence of elevated troponin but with ischemic symptoms and objective evidence of ischemia; and (e) clinically significant arrhythmias (Zhao et al., 2020): sustained ventricular tachycardia, hemodynamically compromising atrial fibrillation (e.g., systolic blood pressure <90 mmHg or requiring urgent cardioversion), or high-grade atrioventricular (AV) block (Mobitz II or third-degree) requiring intervention. Cerebrovascular events such as stroke were excluded from the primary endpoint. All suspected CVEs were independently adjudicated by two blinded cardiologists using clinical data, ECGs, serial hs-TnT measurements, and imaging (e.g., echocardiography, angiography), in line with current ESC/American College of Cardiology (ACC) criteria; discrepancies were resolved through consensus with a third senior cardiologist.
2.3 Data preprocessing
All preprocessing was performed after dataset splitting to prevent data leakage. Missing-value imputation, outlier removal, and normalization used statistics derived exclusively from the training set, and the resulting parameters were applied unchanged to the validation and test sets. Among the 90 physicochemical variables, those with more than 30% missing values were excluded, a threshold chosen following Sterne et al. (2009). The remaining missing values were imputed via Multiple Imputation by Chained Equations (MICE) in R (mice package v4.4.0), generating five imputed datasets; the final values were obtained by averaging across the imputations. Outliers deviating by more than ±3 standard deviations from the training set mean were removed.
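The leakage-avoiding pattern described above can be illustrated with a minimal NumPy sketch. The function names, the toy data, and the choice to flag (rather than delete) outlying values are assumptions made for illustration; the source does not state whether outlying values or whole records were removed, nor how this step was ordered relative to imputation.

```python
import numpy as np

def fit_preprocessing(train, max_missing=0.30, n_sd=3.0):
    """Learn the column filter and outlier bounds from the training matrix only."""
    missing_rate = np.isnan(train).mean(axis=0)
    keep = missing_rate <= max_missing            # drop columns with >30% missing
    mu = np.nanmean(train[:, keep], axis=0)
    sd = np.nanstd(train[:, keep], axis=0, ddof=1)
    return {"keep": keep, "lower": mu - n_sd * sd, "upper": mu + n_sd * sd}

def outlier_mask(data, params):
    """Flag values outside the stored training bounds (NaNs are never flagged)."""
    kept = data[:, params["keep"]]
    return (kept < params["lower"]) | (kept > params["upper"])

# Hypothetical toy data: 10 training rows, 3 variables.
train = np.column_stack([
    np.arange(1.0, 11.0),                                            # complete
    [5.0, np.nan, 6.0, 5.5, 6.5, 5.0, 6.0, 5.5, 6.5, 6.0],           # 10% missing
    [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, 3.0, 4.0, 5.0, 6.0],  # 40% missing
])
params = fit_preprocessing(train)
test = np.array([[5.0, 5.8, 2.0], [100.0, 6.1, 3.0]])
flags = outlier_mask(test, params)   # bounds come from train, never from test
```

The key design point is that `fit_preprocessing` only ever sees the training matrix, so no statistic computed on validation or test data can influence the model.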
Tongue images were visually examined by trained professionals. Images of poor quality (blurry, over- or under-exposed, contaminated) were excluded. High-quality images were manually annotated using LabelMe (v5.2.1) to demarcate the tongue region. The JSON annotations were transformed into binary masks (using labelme_json_to_dataset) to segment the tongue and mask non-tongue areas (lips, skin, background), thereby minimizing visual noise during the training process.
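To make the masking step concrete, the following dependency-free sketch rasterizes a polygon outline into a binary mask and zeroes out everything outside it. It is not the LabelMe conversion utility named above; `polygon_to_mask`, the even-odd rule at pixel centers, and the toy square outline are all assumptions chosen for illustration.

```python
import numpy as np

def polygon_to_mask(points, height, width):
    """Rasterize polygon vertices (x, y) into a boolean mask (even-odd rule)."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5                  # test pixel centers
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        if y1 == y2:
            continue                             # horizontal edges never cross the ray
        crosses = (y1 > py) != (y2 > py)         # edge spans this pixel row
        x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at)          # toggle on each crossing to the right
    return inside

# Hypothetical annotation: a square outline on an 8 x 8 image.
points = [(2, 2), (6, 2), (6, 6), (2, 6)]
mask = polygon_to_mask(points, height=8, width=8)

image = np.full((8, 8, 3), 200, dtype=np.uint8)
segmented = image * mask[..., None].astype(np.uint8)   # non-tongue pixels become 0
```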
Finally, all numerical features, including those derived from physicochemical data and tongue images, were standardized by Z-score normalization using the training set means (μ) and standard deviations (σ), as shown in Equation 1:

$$z = \frac{x - \mu}{\sigma} \tag{1}$$

where x is the raw feature value and z is its standardized value. The same parameters were applied to the validation and test sets to ensure consistent evaluation and reproducibility.
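A minimal sketch of this standardization follows, assuming the population standard deviation (`ddof=0`, the convention used by scikit-learn's `StandardScaler`) and a guard for constant columns; neither detail is specified in the text, and the function names are illustrative.

```python
import numpy as np

def fit_zscore(train):
    """Estimate per-feature mean and SD on the training set only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0, ddof=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # assumption: leave constant columns unscaled
    return mu, sigma

def apply_zscore(data, mu, sigma):
    """z = (x - mu) / sigma, reusing the training-set parameters."""
    return (data - mu) / sigma

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 20.0]])

mu, sigma = fit_zscore(train)
z_train = apply_zscore(train, mu, sigma)
z_test = apply_zscore(test, mu, sigma)     # same mu and sigma as for training
```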
2.4 Algorithm selection
We evaluated four algorithms: Logistic Regression (LR) with L2 regularization, LightGBM with moderate depth and strong L1/L2 regularization, AdaBoost with shallow decision stumps, and a Multilayer Perceptron (MLP) with a single hidden layer. Hyperparameters were tuned on the basis of preliminary experiments to mitigate overfitting. All models were trained under three configurations: clinical variables only, tongue features only, and fused features.
2.5 Implementation details
This study used both hand-crafted and deep-learning techniques to extract multi-dimensional features from segmented tongue images. Hand-crafted features (n = 1,354) were calculated using OpenCV (v4.12.0) and PyRadiomics (v3.1.0) and comprised: (a) color descriptors (mean and standard deviation across RGB, CIELAB, HSV; 18 features); (b) texture metrics (GLCM, LBP, Gabor filters at 4 scales and 6 orientations); (c) shape descriptors (area, perimeter, eccentricity, and morphological measures from edge contours); and (d) Shannon entropy for textural complexity.
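Two of the simpler descriptors can be sketched in plain NumPy to show what is being measured. This is an illustration only, not the OpenCV/PyRadiomics implementation used in the study; the function names, the 256-bin histogram, and the toy image are assumptions. Applying `channel_stats` to three color spaces with three channels each yields 3 × 3 × 2 = 18 color features, matching the count given above.

```python
import numpy as np

def channel_stats(image, mask):
    """Mean and SD of each channel over masked pixels (2 features per channel)."""
    pixels = image[mask]                       # shape: (n_pixels, n_channels)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

def shannon_entropy(gray, mask, bins=256):
    """Shannon entropy (in bits) of the grey-level histogram inside the mask."""
    counts, _ = np.histogram(gray[mask], bins=bins, range=(0, 256))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical 4 x 4 image with a 2 x 2 region of interest.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

rgb = np.zeros((4, 4, 3))
rgb[..., 0], rgb[..., 1], rgb[..., 2] = 10.0, 20.0, 30.0
color_features = channel_stats(rgb, mask)      # 3 means followed by 3 SDs

gray = np.zeros((4, 4))
gray[1, 1:3] = 255.0                           # half of the masked pixels are bright
entropy_bits = shannon_entropy(gray, mask)
```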
Regarding deep features, a ResNet-50 model pre-trained on ImageNet was fine-tuned using Stochastic Gradient Descent (SGD) with a learning rate of 0.01, a momentum of 0.9, and a batch size of 32 for 30 epochs. Data augmentation (±15° rotation, horizontal flip, ±10% scaling) and cross-entropy loss were used, and training ran on an NVIDIA RTX 4090 GPU. The 2,048-dimensional output from the penultimate layer was reduced to 8 dimensions through Principal Component Analysis (PCA), retaining components that accounted for ≥95% of the cumulative variance.
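The variance-threshold rule for choosing the number of principal components can be sketched as follows. This is a NumPy illustration on synthetic data, not the authors' code; `pca_by_variance` is a hypothetical helper, and fitting the projection on training embeddings only is an assumption consistent with the leakage policy in Section 2.3.

```python
import numpy as np

def pca_by_variance(features, threshold=0.95):
    """Return the mean, the leading components, and their explained-variance ratios."""
    mu = features.mean(axis=0)
    _, s, vt = np.linalg.svd(features - mu, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest k whose cumulative explained variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    k = min(k, s.size)
    return mu, vt[:k], explained[:k]

# Synthetic "embeddings": two dominant directions plus one weak one, 20 dimensions.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
embeddings = np.zeros((100, 20))
embeddings[:, 0] = np.sin(t)
embeddings[:, 1] = np.cos(t)
embeddings[:, 2] = 0.01 * np.sin(3.0 * t)

mu, components, explained = pca_by_variance(embeddings)
reduced = (embeddings - mu) @ components.T     # project onto retained components
```

In the study the same rule applied to the 2,048-dimensional embeddings reportedly yielded 8 components; the toy data here retain 2.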
The final input consisted of 90 clinical/laboratory variables, 1,354 hand-crafted features, and 8 deep features (1,452 dimensions in total). To reduce redundancy, LASSO regression with L1 regularization was applied. The optimal λ was selected via 10-fold cross-validation using the one-standard-error rule, and only features with non-zero coefficients were retained. All features were Z-score normalized using training-set statistics prior to fusion. The entire pipeline was implemented in Python 3.9 with opencv-python, PyRadiomics, scikit-learn, and PyTorch (v1.12.1).
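The one-standard-error rule can be sketched with scikit-learn as below. This is an illustrative reading, not the authors' script: it assumes a linear LASSO fitted to a 0/1-coded outcome (an L1-penalized logistic model would be an equally plausible interpretation), and the synthetic data and the helper `lasso_one_se` are hypothetical. scikit-learn calls the penalty `alpha` rather than λ.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

def lasso_one_se(X, y, n_folds=10):
    """Pick the largest alpha whose CV error is within 1 SE of the minimum."""
    cv_model = LassoCV(cv=n_folds).fit(X, y)
    mean_mse = cv_model.mse_path_.mean(axis=1)               # one value per alpha
    se_mse = cv_model.mse_path_.std(axis=1, ddof=1) / np.sqrt(n_folds)
    best = int(np.argmin(mean_mse))
    within_one_se = mean_mse <= mean_mse[best] + se_mse[best]
    alpha_min = float(cv_model.alphas_[best])
    alpha_1se = float(cv_model.alphas_[within_one_se].max())
    final = Lasso(alpha=alpha_1se).fit(X, y)
    selected = np.flatnonzero(final.coef_)                   # non-zero coefficients
    return alpha_min, alpha_1se, selected

# Synthetic standardized features; only the first two carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 30))
y_demo = (X_demo[:, 0] + X_demo[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(float)

alpha_min, alpha_1se, selected = lasso_one_se(X_demo, y_demo)
```

Choosing the larger penalty trades a small loss in cross-validated error for a sparser, more stable feature set.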
2.6 Statistical analysis
All statistical analyses were carried out in Python 3.9 using scipy (v1.13.1), statsmodels (v0.13.2), and scikit-learn (v1.0.2), with a two-sided significance level of α = 0.05. Baseline characteristics were compared between patients with and without cardiovascular events (CVEs). Normality was evaluated with the Shapiro–Wilk test and Q–Q plots. Normally distributed continuous variables with homogeneous variance were summarized as mean ± standard deviation (SD) and compared using the independent-samples t-test, whereas non-normal variables were reported as median (interquartile range, IQR) and analyzed with the Mann–Whitney U test. Categorical variables were presented as counts (percentages) and compared using the chi-squared test; when expected cell counts were <5, Fisher’s exact test was used.
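The test-selection logic described above can be sketched with SciPy. The helper names are hypothetical, and two details are assumptions: Levene's test is used to check variance homogeneity (the text does not name a test), and SciPy's default continuity correction is left on for 2 × 2 tables. The counts in the example are taken from Table 1 purely as input data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05

def compare_continuous(a, b):
    """t-test if both groups look normal with similar variance, else Mann-Whitney U."""
    normal = stats.shapiro(a).pvalue > ALPHA and stats.shapiro(b).pvalue > ALPHA
    equal_var = stats.levene(a, b).pvalue > ALPHA
    if normal and equal_var:
        return "t-test", float(stats.ttest_ind(a, b).pvalue)
    return "Mann-Whitney U", float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)

def compare_categorical(table):
    """Chi-squared test, falling back to Fisher's exact test for sparse 2 x 2 tables."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)
    if (expected < 5).any():
        _, p = stats.fisher_exact(table)       # SciPy's classic 2 x 2 form
        return "Fisher exact", float(p)
    return "chi-squared", float(p)

# Rows: category levels; columns: non-CVE, CVE.
sex_train = compare_categorical([[94, 21], [92, 36]])
dm_validation = compare_categorical([[78, 20], [7, 0]])

# Strongly right-skewed synthetic values to exercise the non-parametric branch.
skewed_a = np.exp(np.linspace(0.0, 5.0, 30))
skewed_b = 2.0 * skewed_a
skewed_result = compare_continuous(skewed_a, skewed_b)
```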
To assess the robustness of the model, sensitivity analyses were conducted. Specifically, 10-fold cross-validation was used during LASSO hyperparameter tuning to limit overfitting. Uncertainty was quantified by calculating 95% confidence intervals for the AUC using DeLong’s method, and further validation was performed with 1,000 bootstrap resampling iterations.
Model performance on the independent test set was evaluated using the AUC, accuracy, sensitivity, specificity, positive and negative predictive values (PPV/NPV), and F1-score. Calibration was assessed visually with calibration plots and statistically with the Hosmer–Lemeshow test (P > 0.05 indicating adequate fit). Clinical utility was assessed by decision curve analysis (DCA), which measures net benefit across risk thresholds from 5% to 30%. A model was considered clinically useful if its net benefit exceeded that of both the “treat all” and “treat none” strategies.
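The bootstrap part of this procedure can be sketched in NumPy. The sketch uses a percentile interval, which is an assumption because the text does not say which bootstrap interval was used, and it does not implement DeLong's method. The function names and synthetic scores are illustrative.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    diff = y_score[y_true][:, None] - y_score[~y_true][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    n, aucs = len(y_true), []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)           # resample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                               # skip resamples with a single class
        aucs.append(auc_score(y_true[idx], y_score[idx]))
    tail = (1.0 - level) / 2.0 * 100.0
    lower, upper = np.percentile(aucs, [tail, 100.0 - tail])
    return float(lower), float(upper)

small_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Synthetic cohort: 40 non-events, 20 events, scores shifted upward for events.
y_demo = np.array([0] * 40 + [1] * 20)
score_demo = np.random.default_rng(1).normal(loc=y_demo * 1.0, scale=1.0)
ci_lower, ci_upper = bootstrap_auc_ci(y_demo, score_demo)
```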
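For readers unfamiliar with decision curve analysis, the net-benefit quantity compared across thresholds can be written out as a short sketch. It uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold; the function names and the four-patient example are illustrative and not taken from the study.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    treat = np.asarray(y_prob) >= threshold
    n = y_true.size
    tp = np.sum(treat & y_true)
    fp = np.sum(treat & ~y_true)
    return float(tp / n - fp / n * threshold / (1.0 - threshold))

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone; "treat none" is always 0."""
    prevalence = float(np.mean(y_true))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical outcomes and predicted risks for four patients.
y_demo = np.array([1, 0, 0, 1])
p_demo = np.array([0.9, 0.2, 0.1, 0.6])

thresholds = np.linspace(0.05, 0.30, 6)        # the 5% to 30% range used in the study
curve = [(t, net_benefit(y_demo, p_demo, t), net_benefit_treat_all(y_demo, t))
         for t in thresholds]
```

Plotting the second and third elements of `curve` against the threshold gives the model and "treat all" decision curves; the "treat none" curve is the horizontal line at zero.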
3 Results
3.1 Training, validation, and test cohorts
This study included 506 patients: 348 from the Fangcun and Dadelu General Hospitals formed the model development cohort, and 158 from the University Town Hospital served as a fully independent external test cohort that was not used during model development, hyperparameter tuning, or data splitting.
Within the development cohort, stratified random splitting based on cardiovascular event status was carried out 10 times at a 7:3 ratio to maintain event prevalence and evaluate performance variability. For each iteration, models were trained on the training subset and evaluated on the validation subset, with the AUC serving as the primary evaluation metric. Although selecting the split with the highest performance may introduce a slight optimistic bias, this approach ensured consistency across analyses. The final conclusions were validated on the reserved external test set to minimize the risk of overfitting.
The split with the highest validation AUC was chosen as the fixed partition for all subsequent steps. It consisted of a training set of 243 patients (57 events, 23.5%) and a validation set of 105 patients (20 events, 19.0%). All subsequent procedures, including feature selection, hyperparameter tuning, model comparison, and performance reporting, were strictly based on this fixed split to ensure reproducibility.
The external test cohort included 158 patients, 43 of whom (27.2%) experienced CVEs. The baseline characteristics were generally comparable to those of the development cohort (Table 1), indicating its suitability for unbiased external validation.
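The repeated stratified partitioning can be sketched as follows. This is a generic NumPy illustration with a hypothetical helper and synthetic labels; it does not reproduce the study's actual partition, and the model-fitting and AUC-scoring step that selected the final split is omitted.

```python
import numpy as np

def stratified_split(y, train_frac=0.7, seed=0):
    """Split indices so each class contributes the same fraction to training."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_idx = []
    for label in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == label))
        n_train = int(round(train_frac * members.size))
        train_idx.extend(members[:n_train])
    train_idx = np.sort(np.array(train_idx))
    val_idx = np.setdiff1d(np.arange(y.size), train_idx)
    return train_idx, val_idx

# Synthetic development cohort: 100 patients, 20 events.
y_demo = np.array([1] * 20 + [0] * 80)

# Ten candidate 7:3 partitions, one per seed; each would then be scored by
# training on the first index set and computing the AUC on the second.
splits = [stratified_split(y_demo, seed=s) for s in range(10)]
```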
TABLE 1
| Characteristic | Train cohort (n = 243): Non-CVEs | Train cohort: CVEs | Train cohort: P value | Validation cohort (n = 105): Non-CVEs | Validation cohort: CVEs | Validation cohort: P value | Test cohort (n = 158): Non-CVEs | Test cohort: CVEs | Test cohort: P value |
|---|---|---|---|---|---|---|---|---|---|
| Age | 58.194 ± 12.043 | 63.544 ± 12.265 | 0.014 | 57.035 ± 11.865 | 64.250 ± 12.234 | 0.014 | 55.409 ± 14.844 | 58.116 ± 13.672 | 0.358 |
| AST | 14.484 ± 8.879 | 14.544 ± 7.797 | 0.914 | 16.718 ± 12.709 | 12.750 ± 4.387 | 0.316 | 12.809 ± 9.356 | 11.558 ± 4.807 | 0.387 |
| Ferritin | 467.405 ± 531.074 | 286.761 ± 329.578 | 0.032 | 369.434 ± 459.053 | 207.100 ± 173.725 | 0.392 | 217.781 ± 255.778 | 163.598 ± 138.711 | 0.519 |
| LDL_C | 2.161 ± 0.876 | 1.989 ± 0.733 | 0.219 | 2.125 ± 0.727 | 2.592 ± 1.357 | 0.407 | 2.093 ± 0.871 | 1.850 ± 0.756 | 0.112 |
| Cr | 852.989 ± 290.288 | 744.860 ± 266.192 | 0.017 | 856.859 ± 276.065 | 727.150 ± 271.729 | 0.061 | 852.426 ± 300.809 | 885.000 ± 251.340 | 0.528 |
| GLU | 8.056 ± 3.908 | 8.936 ± 6.597 | 0.718 | 7.798 ± 3.181 | 8.309 ± 3.363 | 0.562 | 7.550 ± 3.679 | 8.927 ± 3.799 | 0.003 |
| HCT | 31.481 ± 6.078 | 31.279 ± 7.121 | 0.525 | 30.832 ± 7.340 | 29.110 ± 6.529 | 0.206 | 29.562 ± 6.560 | 31.070 ± 7.001 | 0.268 |
| Ca | 2.176 ± 0.199 | 2.167 ± 0.192 | 0.747 | 2.212 ± 0.229 | 2.146 ± 0.174 | 0.231 | 2.141 ± 0.213 | 2.133 ± 0.168 | 0.823 |
| ALT | 12.516 ± 11.048 | 12.386 ± 9.601 | 0.768 | 14.824 ± 12.435 | 9.850 ± 3.801 | 0.288 | 12.061 ± 13.287 | 10.023 ± 5.294 | 0.442 |
| Hb | 101.054 ± 18.398 | 100.386 ± 23.755 | 0.824 | 99.235 ± 22.763 | 92.400 ± 21.330 | 0.229 | 95.165 ± 21.446 | 99.581 ± 22.398 | 0.388 |
| Fe | 9.646 ± 5.192 | 9.724 ± 4.365 | 0.514 | 9.894 ± 5.285 | 8.094 ± 3.569 | 0.203 | 9.839 ± 3.896 | 10.668 ± 5.493 | 0.599 |
| ALB | 36.947 ± 4.962 | 37.782 ± 4.005 | 0.378 | 36.921 ± 4.304 | 36.995 ± 5.585 | 0.958 | 37.810 ± 4.680 | 39.109 ± 4.260 | 0.117 |
| ALP | 91.602 ± 69.891 | 90.421 ± 48.915 | 0.852 | 89.671 ± 53.784 | 83.850 ± 33.862 | 0.925 | 90.678 ± 62.346 | 90.233 ± 47.877 | 0.452 |
| β2_MG | 39.337 ± 12.052 | 42.704 ± 13.650 | 0.107 | 41.953 ± 13.930 | 44.891 ± 11.929 | 0.298 | 37.309 ± 14.674 | 42.481 ± 16.216 | 0.061 |
| K | 4.721 ± 0.845 | 4.698 ± 0.900 | 0.777 | 4.701 ± 0.752 | 4.631 ± 0.823 | 0.713 | 4.755 ± 0.743 | 4.749 ± 0.717 | 0.964 |
| 25_OH_D | 53.432 ± 21.817 | 50.814 ± 21.554 | 0.448 | 53.311 ± 22.703 | 47.485 ± 20.579 | 0.219 | 77.220 ± 31.384 | 75.342 ± 29.967 | 0.936 |
| AG | 17.323 ± 5.408 | 17.333 ± 4.762 | 0.933 | 16.893 ± 4.767 | 17.330 ± 4.043 | 0.706 | 17.729 ± 3.989 | 18.953 ± 3.923 | 0.086 |
| ALB/GLB | 1.303 ± 0.288 | 1.319 ± 0.257 | 0.997 | 1.258 ± 0.279 | 1.260 ± 0.254 | 0.973 | 1.483 ± 0.351 | 1.542 ± 0.316 | 0.341 |
| APTT | 34.118 ± 8.771 | 35.614 ± 9.431 | 0.444 | 34.632 ± 8.712 | 34.690 ± 9.843 | 0.899 | 35.243 ± 9.950 | 37.635 ± 9.675 | 0.150 |
| AST/ALT | 1.312 ± 0.526 | 1.272 ± 0.450 | 0.751 | 1.300 ± 0.633 | 1.435 ± 0.574 | 0.247 | 1.268 ± 0.569 | 1.249 ± 0.459 | 0.682 |
| BASO | 0.037 ± 0.026 | 0.038 ± 0.022 | 0.436 | 0.042 ± 0.026 | 0.036 ± 0.023 | 0.391 | 0.036 ± 0.024 | 0.044 ± 0.037 | 0.358 |
| BASO% | 0.610 ± 0.387 | 0.665 ± 0.375 | 0.166 | 0.689 ± 0.494 | 0.600 ± 0.303 | 0.935 | 0.622 ± 0.512 | 0.672 ± 0.416 | 0.331 |
| BNP | 906.290 ± 1414.307 | 1672.230 ± 1866.926 | 0.005 | 1465.049 ± 1746.589 | 1569.825 ± 1787.118 | 0.771 | 1085.488 ± 1650.581 | 1514.847 ± 1696.087 | 0.018 |
| CK | 118.323 ± 137.508 | 132.544 ± 160.949 | 0.240 | 166.035 ± 219.761 | 149.100 ± 186.763 | 0.380 | 171.539 ± 207.477 | 185.395 ± 274.689 | 0.874 |
| CK_MB | 16.190 ± 7.683 | 16.561 ± 7.292 | 0.457 | 16.689 ± 7.634 | 17.790 ± 7.774 | 0.485 | 20.501 ± 10.177 | 17.914 ± 7.348 | 0.324 |
| CL | 99.366 ± 3.907 | 99.937 ± 4.670 | 0.358 | 98.594 ± 3.850 | 100.075 ± 2.998 | 0.111 | 100.403 ± 3.444 | 99.356 ± 3.289 | 0.087 |
| CRP | 7.132 ± 9.674 | 7.410 ± 9.655 | 0.814 | 7.115 ± 12.885 | 7.131 ± 7.077 | 0.110 | 6.587 ± 9.942 | 7.245 ± 9.759 | 0.296 |
| eGFR | 5.527 ± 3.020 | 6.540 ± 3.187 | 0.006 | 6.580 ± 10.128 | 6.615 ± 3.982 | 0.585 | 6.068 ± 5.122 | 5.146 ± 2.095 | 0.148 |
| EOSIN | 0.349 ± 0.448 | 0.330 ± 0.260 | 0.699 | 0.354 ± 0.295 | 0.231 ± 0.196 | 0.018 | 0.277 ± 0.220 | 0.384 ± 0.384 | 0.168 |
| EOSIN% | 5.361 ± 5.036 | 5.532 ± 3.990 | 0.552 | 5.659 ± 4.019 | 3.835 ± 2.659 | 0.044 | 4.795 ± 3.795 | 5.584 ± 4.150 | 0.230 |
| FDP | 2.806 ± 1.891 | 2.574 ± 1.201 | 0.688 | 3.032 ± 2.401 | 2.364 ± 0.943 | 0.241 | 6.748 ± 5.028 | 7.714 ± 5.471 | 0.274 |
| FIB | 3.722 ± 1.414 | 3.997 ± 1.515 | 0.360 | 3.972 ± 1.407 | 3.681 ± 1.538 | 0.238 | 4.105 ± 1.523 | 4.199 ± 1.462 | 0.863 |
| GGT | 30.199 ± 31.675 | 32.140 ± 24.497 | 0.277 | 33.106 ± 39.769 | 25.300 ± 14.769 | 0.845 | 22.026 ± 14.271 | 30.000 ± 40.900 | 0.385 |
| GLB | 29.194 ± 4.698 | 29.521 ± 4.341 | 0.640 | 29.924 ± 5.048 | 29.760 ± 4.201 | 0.993 | 26.462 ± 5.028 | 26.130 ± 4.754 | 0.882 |
| HbA1c | 5.909 ± 1.967 | 5.539 ± 1.355 | 0.870 | 6.172 ± 2.107 | 6.895 ± 2.322 | 0.027 | 6.143 ± 2.357 | 6.974 ± 2.803 | 0.213 |
| HDL_C | 1.086 ± 0.394 | 1.111 ± 0.351 | 0.573 | 1.116 ± 0.416 | 1.135 ± 0.407 | 0.613 | 0.946 ± 0.274 | 0.969 ± 0.315 | 0.983 |
| hsCRP | 10.017 ± 27.919 | 9.350 ± 17.464 | 0.646 | 8.470 ± 14.549 | 10.970 ± 23.325 | 0.568 | 7.125 ± 11.184 | 5.859 ± 5.413 | 0.676 |
| INR | 1.119 ± 0.237 | 1.101 ± 0.196 | 0.803 | 1.100 ± 0.220 | 1.082 ± 0.215 | 0.780 | 1.155 ± 0.263 | 1.140 ± 0.220 | 0.956 |
| LDH | 208.398 ± 62.560 | 211.526 ± 55.247 | 0.567 | 211.035 ± 66.096 | 182.150 ± 66.148 | 0.041 | 266.017 ± 92.647 | 241.791 ± 90.416 | 0.083 |
| LYM | 1.124 ± 0.412 | 1.136 ± 0.403 | 0.969 | 1.145 ± 0.393 | 1.080 ± 0.378 | 0.549 | 1.143 ± 0.448 | 1.177 ± 0.388 | 0.517 |
| LYM% | 18.728 ± 7.208 | 19.404 ± 6.900 | 0.595 | 18.859 ± 6.862 | 18.265 ± 6.549 | 0.654 | 19.563 ± 7.633 | 19.474 ± 5.612 | 0.774 |
| MCH | 29.061 ± 3.383 | 29.539 ± 2.814 | 0.605 | 28.804 ± 3.370 | 28.510 ± 3.203 | 0.967 | 28.773 ± 3.332 | 28.684 ± 3.129 | 0.418 |
| MCHC | 319.075 ± 10.556 | 320.544 ± 11.370 | 0.368 | 318.282 ± 12.742 | 317.050 ± 8.507 | 0.519 | 321.661 ± 12.801 | 320.814 ± 11.736 | 0.706 |
| MCV | 90.913 ± 9.429 | 92.123 ± 7.975 | 0.764 | 90.176 ± 8.943 | 89.850 ± 9.031 | 0.599 | 89.326 ± 8.697 | 89.330 ± 8.507 | 0.636 |
| Mg | 0.940 ± 0.161 | 0.932 ± 0.181 | 0.583 | 0.937 ± 0.184 | 0.917 ± 0.160 | 0.645 | 1.003 ± 0.134 | 1.006 ± 0.154 | 0.914 |
| MONO | 0.462 ± 0.219 | 0.435 ± 0.172 | 0.532 | 0.458 ± 0.168 | 0.487 ± 0.212 | 0.519 | 0.461 ± 0.222 | 0.443 ± 0.203 | 0.641 |
| MONO% | 7.362 ± 2.437 | 7.177 ± 2.143 | 0.542 | 7.274 ± 1.905 | 7.795 ± 2.390 | 0.582 | 7.630 ± 2.810 | 7.007 ± 2.010 | 0.312 |
| MPV | 9.184 ± 1.071 | 9.289 ± 0.968 | 0.506 | 9.265 ± 1.092 | 9.355 ± 1.342 | 0.751 | 9.494 ± 1.034 | 9.537 ± 1.121 | 0.819 |
| Na | 138.041 ± 3.047 | 138.000 ± 3.224 | 0.612 | 137.424 ± 3.503 | 137.900 ± 2.150 | 0.711 | 138.878 ± 2.596 | 138.163 ± 2.581 | 0.145 |
| NEUT | 4.356 ± 1.900 | 4.160 ± 1.485 | 0.806 | 4.427 ± 1.750 | 4.422 ± 1.784 | 0.948 | 4.321 ± 2.141 | 4.209 ± 1.537 | 0.930 |
| NEUT% | 67.849 ± 8.865 | 67.239 ± 9.582 | 0.942 | 67.373 ± 8.965 | 69.475 ± 6.976 | 0.329 | 67.398 ± 10.117 | 67.267 ± 7.149 | 0.493 |
| nonHDL_C | 2.694 ± 0.995 | 2.439 ± 0.800 | 0.125 | 2.621 ± 0.863 | 3.219 ± 1.517 | 0.222 | 2.744 ± 1.065 | 2.531 ± 0.998 | 0.206 |
| P | 1.875 ± 0.568 | 1.807 ± 0.628 | 0.251 | 1.884 ± 0.617 | 1.862 ± 0.781 | 0.546 | 1.761 ± 0.640 | 1.773 ± 0.632 | 0.910 |
| PA | 306.086 ± 83.597 | 290.632 ± 79.005 | 0.248 | 293.118 ± 80.854 | 307.600 ± 70.982 | 0.463 | 315.765 ± 81.712 | 322.233 ± 56.505 | 0.542 |
| PCT | 0.189 ± 0.059 | 0.183 ± 0.057 | 0.608 | 0.186 ± 0.066 | 0.208 ± 0.053 | 0.106 | 0.200 ± 0.083 | 0.204 ± 0.062 | 0.402 |
| PDW | 14.616 ± 2.410 | 14.232 ± 2.712 | 0.869 | 14.636 ± 2.280 | 14.705 ± 1.983 | 0.457 | 13.763 ± 2.729 | 14.812 ± 2.441 | 0.023 |
| PLT | 205.263 ± 69.753 | 198.053 ± 63.589 | 0.519 | 200.376 ± 70.181 | 227.400 ± 55.969 | 0.112 | 209.600 ± 80.178 | 220.465 ± 79.097 | 0.470 |
| PT | 13.949 ± 2.338 | 14.126 ± 2.118 | 0.620 | 14.205 ± 2.671 | 13.000 ± 2.362 | 0.061 | 13.677 ± 3.290 | 13.123 ± 3.372 | 0.186 |
| PTH | 443.239 ± 455.022 | 385.677 ± 343.393 | 0.959 | 391.642 ± 388.148 | 355.690 ± 316.270 | 0.880 | 434.369 ± 448.658 | 420.486 ± 323.302 | 0.361 |
| PT% | 80.238 ± 37.954 | 53.084 ± 34.677 | <0.001 | 78.392 ± 38.240 | 60.935 ± 32.409 | 0.031 | 76.763 ± 37.924 | 58.263 ± 37.976 | 0.006 |
| RBC | 3.502 ± 0.707 | 3.402 ± 0.739 | 0.459 | 3.433 ± 0.713 | 3.259 ± 0.717 | 0.327 | 3.325 ± 0.736 | 3.488 ± 0.776 | 0.223 |
| RDW | 14.832 ± 1.889 | 14.495 ± 1.381 | 0.667 | 15.101 ± 2.110 | 15.430 ± 2.530 | 0.579 | 14.792 ± 1.918 | 14.779 ± 1.646 | 0.722 |
| RET | 64.372 ± 37.288 | 58.295 ± 35.615 | 0.100 | 61.091 ± 31.782 | 62.550 ± 37.462 | 0.659 | 25.163 ± 15.640 | 32.567 ± 27.520 | 0.253 |
| RET% | 1.797 ± 0.946 | 1.804 ± 1.124 | 0.527 | 1.826 ± 1.079 | 2.031 ± 1.537 | 0.899 | 0.794 ± 0.577 | 1.010 ± 0.817 | 0.086 |
| TBA | 6.216 ± 9.557 | 7.033 ± 9.757 | 0.237 | 7.142 ± 9.535 | 4.440 ± 3.023 | 0.141 | 4.890 ± 5.130 | 7.440 ± 14.874 | 0.571 |
| TBIL | 5.985 ± 2.237 | 5.907 ± 2.618 | 0.605 | 5.758 ± 2.121 | 5.440 ± 1.380 | 0.915 | 14.149 ± 19.765 | 6.637 ± 3.554 | 0.083 |
| TC | 3.780 ± 1.059 | 3.550 ± 0.858 | 0.187 | 3.743 ± 0.877 | 4.354 ± 1.609 | 0.298 | 3.691 ± 1.138 | 3.500 ± 0.982 | 0.422 |
| TCO2 | 20.697 ± 3.540 | 21.212 ± 2.952 | 0.229 | 21.239 ± 3.304 | 20.555 ± 2.269 | 0.383 | 20.462 ± 3.012 | 19.935 ± 2.529 | 0.309 |
| TG | 1.616 ± 1.120 | 1.412 ± 0.998 | 0.038 | 1.837 ± 1.570 | 1.630 ± 0.996 | 0.732 | 1.727 ± 0.997 | 1.883 ± 1.560 | 0.958 |
| TIBC | 43.618 ± 11.025 | 45.674 ± 9.082 | 0.122 | 42.846 ± 12.030 | 45.255 ± 7.757 | 0.159 | 44.778 ± 9.652 | 45.400 ± 10.364 | 0.839 |
| TnT | 0.097 ± 0.094 | 0.112 ± 0.095 | 0.082 | 0.124 ± 0.129 | 0.113 ± 0.083 | 0.794 | 0.112 ± 0.114 | 0.081 ± 0.066 | 0.509 |
| TP | 66.232 ± 6.050 | 67.304 ± 5.375 | 0.211 | 66.448 ± 6.025 | 66.755 ± 6.755 | 0.842 | 64.271 ± 5.766 | 65.240 ± 5.995 | 0.270 |
| TP_Ab | 0.113 ± 0.128 | 0.131 ± 0.179 | 0.594 | 2.254 ± 19.904 | 0.132 ± 0.205 | 0.686 | 97.521 ± 91.889 | 68.430 ± 89.707 | 0.030 |
| TSAT | 23.510 ± 14.084 | 22.354 ± 11.224 | 0.840 | 24.059 ± 12.283 | 18.490 ± 9.576 | 0.060 | 23.071 ± 10.440 | 24.967 ± 14.620 | 0.693 |
| TSH | 13.510 ± 25.784 | 7.605 ± 19.938 | 0.034 | 15.793 ± 29.264 | 15.341 ± 29.119 | 0.229 | 34.000 ± 38.110 | 33.671 ± 38.120 | 0.785 |
| TT | 17.572 ± 2.726 | 18.149 ± 2.763 | 0.151 | 17.391 ± 2.709 | 18.430 ± 2.600 | 0.111 | 18.626 ± 2.786 | 18.784 ± 2.676 | 0.877 |
| UA | 435.366 ± 104.657 | 418.368 ± 112.228 | 0.293 | 414.106 ± 116.774 | 392.450 ± 105.198 | 0.380 | 394.583 ± 123.220 | 401.605 ± 85.714 | 0.555 |
| UIBC | 31.797 ± 13.323 | 33.532 ± 11.678 | 0.309 | 30.569 ± 14.084 | 34.855 ± 9.787 | 0.057 | 32.685 ± 11.930 | 33.579 ± 13.015 | 0.911 |
| Urea | 23.449 ± 7.205 | 21.812 ± 7.226 | 0.135 | 23.849 ± 6.655 | 23.172 ± 7.183 | 0.688 | 21.231 ± 6.679 | 21.971 ± 6.686 | 0.537 |
| VB12 | 1305.656 ± 402.160 | 1295.368 ± 420.960 | 0.985 | 1304.553 ± 393.484 | 1190.750 ± 507.015 | 0.352 | 812.443 ± 559.750 | 1042.140 ± 551.268 | 0.036 |
| WBC | 6.331 ± 2.259 | 6.099 ± 1.740 | 0.805 | 6.426 ± 2.002 | 6.259 ± 2.153 | 0.654 | 6.237 ± 2.451 | 6.257 ± 2.135 | 0.967 |
| Kt/V | 1.231 ± 0.303 | 1.283 ± 0.338 | 0.255 | 1.228 ± 0.331 | 1.343 ± 0.269 | 0.160 | 1.423 ± 0.361 | 1.483 ± 0.290 | 0.265 |
| URR | 57.756 ± 11.699 | 58.471 ± 11.805 | 0.883 | 57.499 ± 12.297 | 57.851 ± 9.547 | 0.964 | 59.420 ± 9.443 | 59.754 ± 9.805 | 0.792 |
| Dialysis.vintage | 67.129 ± 68.668 | 61.228 ± 60.389 | 0.981 | 78.047 ± 75.644 | 53.300 ± 45.215 | 0.308 | 69.626 ± 53.497 | 52.744 ± 33.491 | 0.077 |
| Sex, n (%) | | | 0.097 | | | 0.061 | | | 0.970 |
| Female | 94 (50.538) | 21 (36.842) | | 33 (38.824) | 13 (65.000) | | 45 (39.130) | 16 (37.209) | |
| Male | 92 (49.462) | 36 (63.158) | | 52 (61.176) | 7 (35.000) | | 70 (60.870) | 27 (62.791) | |
| DM, n (%) | | | 0.566 | | | 0.406 | | | 0.294 |
| N | 166 (89.247) | 53 (92.982) | | 78 (91.765) | 20 (100.000) | | 105 (91.304) | 42 (97.674) | |
| Y | 20 (10.753) | 4 (7.018) | | 7 (8.235) | 0 (0.000) | | 10 (8.696) | 1 (2.326) | |
| Hypertension, n (%) | | | 0.867 | | | 0.303 | | | 0.278 |
| N | 138 (74.194) | 41 (71.930) | | 60 (70.588) | 17 (85.000) | | 85 (73.913) | 36 (83.721) | |
| Y | 48 (25.806) | 16 (28.070) | | 25 (29.412) | 3 (15.000) | | 30 (26.087) | 7 (16.279) | |
| Hypotension, n (%) | | | 0.581 | | | 0.583 | | | 0.235 |
| N | 138 (74.194) | 45 (78.947) | | 67 (78.824) | 14 (70.000) | | 84 (73.043) | 36 (83.721) | |
| Y | 48 (25.806) | 12 (21.053) | | 18 (21.176) | 6 (30.000) | | 31 (26.957) | 7 (16.279) | |
| Renal anemia, n (%) | | | 0.837 | | | 0.477 | | | 0.166 |
| N | 129 (69.355) | 41 (71.930) | | 54 (63.529) | 15 (75.000) | | 79 (68.696) | 35 (81.395) | |
| Y | 57 (30.645) | 16 (28.070) | | 31 (36.471) | 5 (25.000) | | 36 (31.304) | 8 (18.605) | |
| ROD, n (%) | | | 0.932 | | | 1.000 | | | 0.193 |
| N | 147 (79.032) | 46 (80.702) | | 65 (76.471) | 15 (75.000) | | 86 (74.783) | 37 (86.047) | |
| Y | 39 (20.968) | 11 (19.298) | | 20 (23.529) | 5 (25.000) | | 29 (25.217) | 6 (13.953) | |
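Baseline characteristics of the study participants. Continuous variables are shown as mean ± SD and categorical variables as n (%).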
AG, anion gap; ALB, albumin; ALB/GLB, Albumin/Globulin Ratio; ALP, alkaline phosphatase; ALT, alanine aminotransferase; APTT, activated partial thromboplastin time; AST, aspartate aminotransferase; AST/ALT, Aspartate Aminotransferase/Alanine Aminotransferase Ratio; BASO, basophils; β2-MG, Beta-2 Microglobulin; BNP, B-type Natriuretic Peptide; CK, creatine kinase; Cr, Creatinine; CRP, C-reactive Protein; eGFR, estimated glomerular filtration rate; EOSIN, eosinophils; FDP, fibrin degradation products; FIB, fibrinogen; GLB, globulin; GLU, glucose; Hb, Hemoglobin; GGT, Gamma-Glutamyl Transferase; HbA1c, Hemoglobin A1c; HCT, hematocrit; HDL-C, High-Density Lipoprotein Cholesterol; hsCRP, High-Sensitivity C-reactive Protein; INR, international normalized ratio; LDH, lactate dehydrogenase; LDL-C, Low-Density Lipoprotein Cholesterol; LYM, lymphocytes; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MCV, mean corpuscular volume; MONO, monocytes; MPV, mean platelet volume; NEUT, neutrophils; PA, prealbumin; PCT, procalcitonin; PDW, platelet distribution width; PLT, platelet count; PT, prothrombin time; PTH, parathyroid hormone; RBC, red blood cell count; RDW, red cell distribution width; RET, reticulocytes; TBA, total bile acids; TBIL, total bilirubin; TC, total cholesterol; TCO2, total carbon dioxide; TG, triglycerides; TIBC, Total Iron-Binding Capacity; TnT, Troponin T; TP, total protein; TP-Ab, Total Protein Antibody; TSAT, transferrin saturation; TSH, thyroid stimulating hormone; TT, total testosterone; UA, uric acid; UIBC, Unsaturated Iron-Binding Capacity; VB12, Vitamin B12; WBC, white blood cell count.
3.2 Model performance comparison
To systematically evaluate the predictive value of each data modality, we compared four machine learning models (LR, LightGBM, AdaBoost, and MLP) across three feature configurations: clinical variables only, tongue image features only, and a fused set integrating both. Performance was evaluated on an independent external test cohort; the results are presented in Table 2.
TABLE 2
| Model | Accuracy | AUC | 95% CI | Sensitivity | Specificity | Precision | Recall | F1 | Cohort |
|---|---|---|---|---|---|---|---|---|---|
| LR_clinical | 0.733 | 0.769 | 0.7003–0.8376 | 0.702 | 0.742 | 0.455 | 0.702 | 0.552 | Train |
| LR_clinical | 0.620 | 0.670 | 0.5966–0.7434 | 0.730 | 0.585 | 0.357 | 0.730 | 0.479 | Test |
| LightGBM_clinical | 0.687 | 0.747 | 0.6782–0.8161 | 0.702 | 0.683 | 0.404 | 0.702 | 0.513 | Train |
| LightGBM_clinical | 0.582 | 0.628 | 0.5539–0.7013 | 0.651 | 0.560 | 0.318 | 0.651 | 0.427 | Test |
| AdaBoost_clinical | 0.733 | 0.783 | 0.7203–0.8449 | 0.702 | 0.742 | 0.455 | 0.702 | 0.552 | Train |
| AdaBoost_clinical | 0.696 | 0.682 | 0.6088–0.7560 | 0.524 | 0.750 | 0.398 | 0.524 | 0.452 | Test |
| MLP_clinical | 0.687 | 0.747 | 0.6762–0.8176 | 0.754 | 0.667 | 0.410 | 0.754 | 0.531 | Train |
| MLP_clinical | 0.567 | 0.660 | 0.5866–0.7336 | 0.810 | 0.490 | 0.333 | 0.810 | 0.472 | Test |
| LR_tongue | 0.691 | 0.856 | 0.8048–0.9073 | 0.912 | 0.624 | 0.426 | 0.912 | 0.581 | Train |
| LR_tongue | 0.806 | 0.750 | 0.6690–0.8313 | 0.667 | 0.850 | 0.583 | 0.667 | 0.622 | Test |
| LightGBM_tongue | 0.712 | 0.836 | 0.7785–0.8942 | 0.825 | 0.677 | 0.439 | 0.825 | 0.573 | Train |
| LightGBM_tongue | 0.684 | 0.759 | 0.6867–0.8306 | 0.778 | 0.655 | 0.415 | 0.778 | 0.541 | Test |
| AdaBoost_tongue | 0.778 | 0.876 | 0.8281–0.9248 | 0.895 | 0.742 | 0.515 | 0.895 | 0.654 | Train |
| AdaBoost_tongue | 0.810 | 0.786 | 0.7170–0.8557 | 0.714 | 0.840 | 0.584 | 0.714 | 0.643 | Test |
| MLP_tongue | 0.741 | 0.839 | 0.7863–0.8923 | 0.842 | 0.710 | 0.471 | 0.842 | 0.604 | Train |
| MLP_tongue | 0.719 | 0.683 | 0.5984–0.7670 | 0.667 | 0.735 | 0.442 | 0.667 | 0.532 | Test |
| LR_fused | 0.811 | 0.918 | 0.8787–0.9566 | 0.895 | 0.785 | 0.560 | 0.895 | 0.689 | Train |
| LR_fused | 0.764 | 0.776 | 0.7057–0.8454 | 0.651 | 0.800 | 0.506 | 0.651 | 0.569 | Test |
| LightGBM_fused | 0.811 | 0.873 | 0.8125–0.9336 | 0.842 | 0.801 | 0.565 | 0.842 | 0.676 | Train |
| LightGBM_fused | 0.757 | 0.772 | 0.7038–0.8401 | 0.698 | 0.775 | 0.494 | 0.698 | 0.579 | Test |
| AdaBoost_fused | 0.864 | 0.900 | 0.8506–0.9496 | 0.825 | 0.876 | 0.671 | 0.825 | 0.740 | Train |
| AdaBoost_fused | 0.745 | 0.787 | 0.7216–0.8530 | 0.730 | 0.750 | 0.479 | 0.730 | 0.579 | Test |
| MLP_fused | 0.811 | 0.897 | 0.8447–0.9495 | 0.842 | 0.801 | 0.565 | 0.842 | 0.676 | Train |
| MLP_fused | 0.757 | 0.779 | 0.7073–0.8513 | 0.683 | 0.780 | 0.494 | 0.683 | 0.573 | Test |
Performance of machine learning models by input feature type.
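Each AUC in Table 2 is reported with a 95% confidence interval, but this section does not state how the intervals were obtained. As a minimal sketch, the AUC can be computed as the Mann-Whitney probability that a randomly chosen patient with an event receives a higher score than a randomly chosen patient without one, and a percentile bootstrap gives one possible interval. The bootstrap, the function names, and the toy scores below are assumptions for illustration, not the authors' procedure.

```python
import random


def auc_mann_whitney(y_true, scores):
    """AUC as P(score of an event > score of a non-event); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see text)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if len(set(yb)) < 2:  # a resample must contain both classes
            continue
        stats.append(auc_mann_whitney(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


# Hypothetical toy data: 1 = CVE, 0 = no CVE.
y = [0, 0, 1, 1]
s = [0.10, 0.40, 0.35, 0.80]
# 3 of the 4 event/non-event pairs are ranked correctly, so the AUC is 0.75.
print(auc_mann_whitney(y, s))
print(bootstrap_auc_ci(y, s))
```

With cohorts of a few hundred patients, as here, bootstrap intervals are typically wide, which is why overlapping intervals between two models should not be read as evidence of equivalence or of a difference without a paired test.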
Models relying on tongue image features outperformed those using only clinical variables on the test set. The best clinical-only model (AdaBoost) attained a moderate AUC of 0.682, with a sensitivity of 52.4% and a specificity of 75.0%, and showed limited overall accuracy (69.6%) and F1-score (0.452). In contrast, the tongue-only AdaBoost model achieved a higher AUC of 0.786, together with better accuracy (81.0%), sensitivity (71.4%), specificity (84.0%), and F1-score (0.643), demonstrating the discriminative capacity of non-invasive tongue imaging features.
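The threshold-dependent columns of Table 2 all derive from a single confusion matrix. The sketch below recomputes them from counts that were back-calculated here from the rounded values reported for the tongue-only AdaBoost model on the test rows (TP = 45, FN = 18, FP = 32, TN = 168). These counts are an inference from the table, not figures given by the authors, and the function name is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Threshold-dependent metrics of the kind reported in Table 2."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),       # identical to recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


# Assumed counts, back-calculated from the rounded table entries (see text).
metrics = classification_metrics(tp=45, fn=18, fp=32, tn=168)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the function reproduces the reported row (accuracy 0.810, sensitivity 0.714, specificity 0.840, precision 0.584, F1 0.643). It also makes explicit that the Recall column of Table 2 is the same quantity as Sensitivity.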
Fusing clinical and tongue features yielded a further, albeit marginal, improvement in discrimination. The fused AdaBoost model achieved the highest test AUC in this study (0.787 vs. 0.786 for tongue-only), indicating a slight incremental value from multimodal integration. This came at the cost of lower accuracy (74.5%) and F1-score (0.579), primarily attributable to an increase in false positives (specificity fell from 84.0% to 75.0%). The other fused models (LR, LightGBM, and MLP) all improved on their clinical-only counterparts and reached higher test AUCs than their tongue-only counterparts. For LR, this was accompanied by a trade-off in specificity, which decreased from 85.0% (tongue-only) to 80.0% (fused) while the AUC increased from 0.750 to 0.776.
Notably, training performance was high across all models (training AUCs of 0.836–0.918 for the tongue and fused models), whereas test performance declined, most markedly for the tongue-only MLP (AUC 0.839 in training vs. 0.683 on the test set), suggesting some degree of overfitting. This is likely due to the high feature dimensionality relative to the sample size. Despite this, DCA indicated a positive net benefit across clinically relevant risk thresholds (5%–30%) for all tongue-based and fused models, supporting their potential clinical applicability.
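Decision curve analysis summarizes clinical utility as net benefit, NB(p_t) = TP/n − (FP/n) · p_t/(1 − p_t), where p_t is the risk threshold above which a patient would be treated. A minimal sketch of this calculation follows; the function names and the toy predictions are hypothetical and are not the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = CVE) and predicted risks.
y = [1, 1, 0, 0, 0]
p = [0.90, 0.20, 0.40, 0.10, 0.05]
for t in (0.05, 0.10, 0.20, 0.30):
    print(t, round(net_benefit(y, p, t), 3), round(net_benefit_treat_all(y, t), 3))
```

A model adds value at a given threshold only when its net benefit exceeds both the treat-all strategy and zero, which corresponds to treating no one.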
In summary, tongue image features alone outperformed conventional clinical variables in predicting cardiovascular events in hemodialysis patients. Multimodal fusion provided a consistent improvement in discrimination, although the gain was marginal for AdaBoost (0.786 to 0.787), and AdaBoost achieved the highest test AUC in all three configurations. As depicted in the ROC curves (Figures 2A,B), the performance gap between the training and test sets suggests potential overfitting, which cross-validation was used to mitigate.
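The size of the train-test gap can be read directly from Table 2. The short sketch below tabulates the AUC difference for each model; the AUC values are copied from Table 2, while the variable names are illustrative.

```python
# (train AUC, test AUC) pairs copied from Table 2.
auc = {
    "LR_clinical": (0.769, 0.670),
    "LightGBM_clinical": (0.747, 0.628),
    "AdaBoost_clinical": (0.783, 0.682),
    "MLP_clinical": (0.747, 0.660),
    "LR_tongue": (0.856, 0.750),
    "LightGBM_tongue": (0.836, 0.759),
    "AdaBoost_tongue": (0.876, 0.786),
    "MLP_tongue": (0.839, 0.683),
    "LR_fused": (0.918, 0.776),
    "LightGBM_fused": (0.873, 0.772),
    "AdaBoost_fused": (0.900, 0.787),
    "MLP_fused": (0.897, 0.779),
}

# Drop in AUC from training to test, largest first.
gaps = {name: round(train - test, 3) for name, (train, test) in auc.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:18s} {gap:.3f}")
```

The largest drop is for MLP_tongue (0.156), followed by LR_fused (0.142); the smallest is for LightGBM_tongue (0.077).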
FIGURE 2
3.3 Performance comparison
As presented in Tables 2 and 3, the tongue image-based model (Tongue_all) attained a higher AUC (0.786) than the clinical model (0.682), suggesting that tongue features carry predictive information beyond that of the clinical variables. For example, the tongue-only AdaBoost model achieved a sensitivity of 71.4%, compared with 52.4% for the model relying solely on clinical data. This advantage may be attributable to the tongue’s capacity to reflect microvascular alterations that are not directly measured by conventional clinical variables.
The Tongue_all model, which combines deep learning and radiomic features, achieved the highest test accuracy (81.0%) and F1-score (0.643), with an AUC of 0.786. The Fused model showed a marginally higher AUC (0.787) but lower accuracy (74.5%) and specificity (75.0%), indicating a trade-off with a higher rate of false positives. The clinical-only model performed worst, with an AUC of 0.682. DCA and Integrated Discrimination Improvement (IDI) analyses supported the clinical utility and incremental value of tongue-derived features (Table 3; Figures 2–5).
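The IDI compares two models by how much the new model widens the separation between the mean predicted risk of patients with and without events: IDI = (mean p_new − mean p_old) among events − (mean p_new − mean p_old) among non-events. A minimal sketch is given below; the predicted probabilities are hypothetical, not study data, and the function name is illustrative.

```python
from statistics import mean


def idi(y_true, p_old, p_new):
    """Integrated Discrimination Improvement of p_new over p_old."""
    events = [i for i, y in enumerate(y_true) if y == 1]
    nonevents = [i for i, y in enumerate(y_true) if y == 0]
    # Change in mean predicted risk among patients with an event.
    gain_events = mean(p_new[i] for i in events) - mean(p_old[i] for i in events)
    # Change in mean predicted risk among patients without an event.
    gain_nonevents = mean(p_new[i] for i in nonevents) - mean(p_old[i] for i in nonevents)
    return gain_events - gain_nonevents


# Hypothetical predicted risks from a reference model and a candidate model.
y = [1, 1, 0, 0]
p_reference = [0.6, 0.4, 0.5, 0.3]
p_candidate = [0.8, 0.6, 0.4, 0.2]
print(round(idi(y, p_reference, p_candidate), 3))
```

In this toy case the candidate raises the mean risk of events by 0.2 and lowers that of non-events by 0.1, giving an IDI of 0.3. A positive IDI indicates better average separation, but it says nothing about calibration.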
TABLE 3
| Model | Accuracy | AUC | 95% CI | Sensitivity | Specificity | Precision | Recall | F1 | Cohort |
|---|---|---|---|---|---|---|---|---|---|
| Clinic | 0.733 | 0.783 | 0.7203–0.8449 | 0.702 | 0.742 | 0.455 | 0.702 | 0.552 | Train |
| Tongue_DL | 0.770 | 0.861 | 0.8105–0.9121 | 0.895 | 0.731 | 0.505 | 0.895 | 0.646 | Train |
| Tongue_RAD | 0.840 | 0.878 | 0.8297–0.9273 | 0.789 | 0.855 | 0.625 | 0.789 | 0.698 | Train |
| Tongue_all | 0.778 | 0.876 | 0.8281–0.9248 | 0.895 | 0.742 | 0.515 | 0.895 | 0.654 | Train |
| Fused | 0.864 | 0.900 | 0.8506–0.9496 | 0.825 | 0.876 | 0.671 | 0.825 | 0.740 | Train |
| Clinic | 0.696 | 0.682 | 0.6088–0.7560 | 0.524 | 0.750 | 0.398 | 0.524 | 0.452 | Test |
| Tongue_DL | 0.776 | 0.766 | 0.6930–0.8397 | 0.746 | 0.785 | 0.522 | 0.746 | 0.614 | Test |
| Tongue_RAD | 0.791 | 0.781 | 0.7120–0.8506 | 0.698 | 0.820 | 0.550 | 0.698 | 0.615 | Test |
| Tongue_all | 0.810 | 0.786 | 0.7170–0.8557 | 0.714 | 0.840 | 0.584 | 0.714 | 0.643 | Test |
| Fused | 0.745 | 0.787 | 0.7216–0.8530 | 0.730 | 0.750 | 0.479 | 0.730 | 0.579 | Test |
Performance comparison of clinic-only, tongue-only, and fused models across training and test cohorts.
FIGURE 3
FIGURE 4
FIGURE 5
3.4 Important feature analysis and clinical interpretation
SHAP analysis showed that both tongue texture features (such as wavelet-derived metrics) and clinical biomarkers (especially PT%) were key determinants of the predictions. Higher values of specific texture features were associated with increased predicted risk, possibly reflecting underlying inflammation or microvascular alterations. In contrast, a higher PT% (indicating better coagulation function) was associated with lower predicted risk. The substantial influence of texture metrics is consistent with the TCM notion that tongue morphology serves as an indicator of systemic health. This interpretability narrows the gap between model output and clinical understanding.
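SHAP attributions are estimates of Shapley values, which split the difference between a patient's prediction and a reference prediction across the input features. To illustrate the idea only, the sketch below computes exact Shapley values by enumerating feature subsets for a hypothetical two-feature linear score. The score, its weights, and the feature values are invented for illustration; this is not the explainer or the model used in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's values; the rest stay at baseline.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical score: higher texture raises risk, higher PT% lowers it.
def toy_score(features):
    texture, pt_percent = features
    return 0.8 * texture - 0.5 * pt_percent + 0.1


patient = [1.5, 0.4]    # standardized values, invented for illustration
reference = [1.0, 1.0]  # reference patient, also invented
phi = shapley_values(toy_score, patient, reference)
print([round(v, 3) for v in phi])
```

Here the texture feature contributes 0.4 and the below-reference PT% contributes 0.3, and the two contributions sum to the 0.7 difference between the patient's score and the reference score. Exhaustive enumeration is only feasible for a handful of features, which is why approximations are needed for models with hundreds of inputs.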
4 Discussion
In this prospective, multicenter study, we developed a machine learning framework for predicting cardiovascular events in maintenance hemodialysis patients by integrating quantitative tongue image features with conventional physicochemical indicators. Based on the TCM principle that “the heart opens into the tongue” (Huang and Yuan, 2020), this research combines traditional diagnostic practice with modern data science, transforming a clinical observation into a testable predictive model. The study serves as a “translational bridge”, not by directly resolving theoretical discrepancies, but by using artificial intelligence (AI) to quantify TCM signs. To our knowledge, this is the first prospective study to demonstrate that objective, quantifiable features of the tongue, assessed through digital imaging, can independently predict adjudicated cardiovascular outcomes in a high-risk renal population with clinically significant performance.
The superior performance of models incorporating tongue features compared to those based solely on clinical variables highlights a crucial gap in current risk stratification: conventional biomarkers, although informative, often fail to capture the complex, non-traditional pathophysiology prevalent in end-stage renal disease. The limited discriminatory ability of clinical-only models is consistent with the well-documented limitations of traditional risk scores such as Framingham (Dufouil et al., 2017) and ASCVD (Grundy et al., 2019) in dialysis populations, where inflammation, autonomic dysfunction, and microvascular disease play disproportionately large roles. The retention of established predictors such as age, ferritin, and creatinine by LASSO selection supports the biological plausibility of the baseline model (Figure 9), yet also emphasizes the need for complementary data sources to improve precision.
FIGURE 6
Our approach treats the tongue not merely as an anatomical structure, but as a theory-driven, organ-specific biomarker reflecting systemic cardiovascular health (Wang et al., 2018). In TCM, the tongue is regarded as a window to internal organ function, particularly that of the heart and circulation. The strong predictive power of tongue-derived features, including those extracted by deep learning, suggests that subtle morphological patterns, potentially indicative of microcirculatory impairment or chronic inflammation, are encoded in its visual appearance. While prior studies have associated facial or oral phenotypes with vascular aging (Mizukoshi et al., 2024; Mekić et al., 2021), this research uniquely validates the tongue as a non-invasive, theory-based indicator against hard clinical endpoints in a vulnerable population.
We further examined the value of combining two complementary sources of tongue information: radiomic features, which capture engineered descriptors of texture and heterogeneity, and deep learning features, which automatically extract hierarchical representations from raw images. The slightly higher test AUC of the combined “Tongue_all” model (0.786, vs. 0.781 for radiomics alone and 0.766 for deep learning alone) suggests that these feature types are complementary rather than redundant. Notably, the strong sensitivity and competitive performance of deep learning features challenge the assumption that handcrafted radiomics are inherently more interpretable or effective, suggesting that data-driven models can detect subtle, clinically relevant patterns beyond the scope of predefined metrics.
The integration of tongue imaging with clinical data led to incremental improvements in discrimination and clinical net benefit, as assessed by decision curve analysis and measures of risk reclassification (IDI, NRI). This synergy likely results from the orthogonality of the data sources: laboratory values reflect systemic biochemical states, while tongue images capture localized, morphological manifestations that may represent early-stage microvascular or inflammatory changes not yet detectable in blood tests. This multimodal strategy enhances risk stratification by providing a more comprehensive view of the patient’s physiological state.
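Risk reclassification can be summarized with the net reclassification improvement (NRI). The sketch below uses the category-free form, NRI = [P(up | event) − P(down | event)] + [P(down | non-event) − P(up | non-event)], where "up" and "down" mean that the new model assigns a higher or lower predicted risk than the old one. Which NRI variant the study used is not stated in this section, so the category-free form, the function name, and the toy probabilities are all assumptions.

```python
def continuous_nri(y_true, p_old, p_new):
    """Category-free net reclassification improvement of p_new over p_old."""

    def up_minus_down(indices):
        # Share of patients moved up in risk minus share moved down.
        up = sum(1 for i in indices if p_new[i] > p_old[i])
        down = sum(1 for i in indices if p_new[i] < p_old[i])
        return (up - down) / len(indices)

    events = [i for i, y in enumerate(y_true) if y == 1]
    nonevents = [i for i, y in enumerate(y_true) if y == 0]
    # Upward moves are desirable for events, downward moves for non-events.
    return up_minus_down(events) - up_minus_down(nonevents)


# Hypothetical predicted risks (not study data).
y = [1, 1, 1, 0, 0, 0, 0]
p_old = [0.5, 0.4, 0.6, 0.5, 0.3, 0.2, 0.4]
p_new = [0.7, 0.6, 0.5, 0.4, 0.2, 0.3, 0.1]
print(round(continuous_nri(y, p_old, p_new), 3))
```

The category-free NRI ranges from −2 to 2 and counts only the direction of each change, not its size, so it is best read alongside the IDI and the decision curves rather than on its own.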
Decision curve analysis (Figures 4A,B) further indicated that the tongue-based model yields net clinical benefit across risk thresholds ranging from 5% to 30%, which supports its use in early risk stratification. Consequently, we propose integrating standardized tongue imaging into routine dialysis evaluations, in conjunction with laboratory markers such as BNP, to facilitate personalized and proactive interventions. SHAP analysis (Figures 7–9) revealed that tongue texture features, such as wavelet-based metrics, are correlated with the risk of cardiovascular events, potentially reflecting the microcirculatory dysfunction and chronic inflammation frequently observed in hemodialysis patients (Sheng et al., 2020). For example, higher texture values correspond to irregularities on the tongue surface, which have been linked to myocardial fibrosis. This provides quantitative support for the TCM theory that “the heart opens into the tongue” and showcases the potential of tongue imaging as a quantifiable biomarker.
FIGURE 7
FIGURE 8
FIGURE 9
Several limitations deserve consideration. Although the sample size (n = 506) is reasonable for a multicenter study, the absence of an a priori power analysis and the high dimensionality of tongue imaging features, evidenced by the notable decline in MLP performance on the test set (AUC: 0.683), raise concerns about overfitting and limit generalizability, highlighting the need for validation in larger, more diverse cohorts. Moreover, the analysis did not stratify patients by TCM syndrome patterns (e.g., Qi deficiency with blood stasis), potentially obscuring important subgroup differences. This study relied solely on single-time-point tongue images due to device availability, whereas longitudinal imaging capturing dynamic changes across the dialysis cycle or in response to treatment could enable real-time monitoring and early intervention. Importantly, tongue imaging is not intended to replace established diagnostics such as cardiac imaging but to complement them; future research should explore multimodal integration, combining tongue features with laboratory data, cardiac imaging, and other TCM diagnostic modalities (e.g., pulse diagnosis, complexion assessment, and symptom patterns), to further enhance predictive accuracy and clinical utility.
5 Conclusion
In summary, this study introduces a novel artificial intelligence (AI)-driven multimodal framework that integrates quantitative tongue image features with clinical data to predict cardiovascular events in hemodialysis patients. This approach clearly outperforms the clinical-only model, matches the tongue-only model in discrimination, and demonstrates clinical utility for risk stratification. Based on the TCM principle that “the heart opens into the tongue,” this research supports tongue morphology as a non-invasive biomarker that offers supplementary pathophysiological insights beyond conventional laboratory measurements. Combining radiomic and deep learning-derived tongue features yielded better performance than models using only one type of tongue feature. Future research will concentrate on enhancing model robustness and promoting clinical translation through larger multicenter validation and prospective studies, thereby laying the foundation for AI-powered, non-invasive precision medicine in nephrology.
Statements
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by The Second Affiliated Hospital of Guangzhou University of Chinese Medicine. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
KZ: Validation, Visualization, Data curation, Methodology, Software, Writing – review and editing, Writing – original draft, Investigation. FX: Writing – review and editing, Methodology, Writing – original draft, Software. SC: Data curation, Writing – review and editing, Investigation. QW: Investigation, Resources, Writing – review and editing. XH: Writing – review and editing, Resources, Data curation. JW: Project administration, Writing – review and editing, Supervision. LD: Supervision, Writing – review and editing, Resources. KB: Resources, Writing – review and editing, Supervision, Conceptualization, Project administration. WZ: Project administration, Data curation, Supervision, Conceptualization, Writing – review and editing. DZ: Project administration, Writing – review and editing, Data curation, Resources, Validation, Conceptualization, Supervision, Investigation.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This research was funded by the Special Project of the “Peak Building and Apex Creation” Action Plan and the First-level Discipline Capacity Enhancement Project of the “Strengthening the Foundation” Program of Guangzhou University of Chinese Medicine (Grant No. GZY2025GB0930).
Acknowledgments
We express our sincere gratitude to all researchers, research teams, and participants for their contributions to this study. In particular, we extend our special thanks to the physicians at the Guangdong Provincial Hospital of Traditional Chinese Medicine for their invaluable clinical assistance, Yawen Deng for her rigorous verification of the statistical methodology, and the OnekeyAI platform for providing essential technical backing.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was used in the creation of this manuscript. During the preparation of this work, the authors used Qwen3 Max for language editing. Following the use of this tool, the authors carefully reviewed and revised the content as necessary and take full responsibility for the final published manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Bär S., Nabeta T., Maaniitty T., Saraste A., Bax J. J., Earls J. P., et al. (2024). Prognostic value of a novel artificial intelligence-based coronary computed tomography angiography-derived ischaemia algorithm for patients with suspected coronary artery disease. Eur. Heart J. Cardiovasc Imaging 25 (5), 657–667. 10.1093/ehjci/jead339
2. Cardiovascular Disease Group SoP, Chinese Medical Association (2022). Chinese expert consensus on autopsy and molecular diagnosis of sudden cardiac death. Chin. Circulation J. 37 (09), 865–875.
3. Duan M., Mao B., Li Z., Wang C., Hu Z., Guan J., et al. (2024). Feasibility of tongue image detection for coronary artery disease: based on deep learning. Front. Cardiovasc Med. 11, 1384977. 10.3389/fcvm.2024.1384977
4. Dufouil C., Beiser A., McLure L. A., Wolf P. A., Tzourio C., Howard V. J., et al. (2017). Revised Framingham stroke risk profile to reflect temporal trends. Circulation 135 (12), 1145–1159. 10.1161/CIRCULATIONAHA.115.021275
5. Fei L., Li H., Zhang B., Li C., Zou R. (2025). Effect of hybrid blood purification on nutritional status, inflammation, and cardiovascular events in patients with end-stage renal disease. Pak J. Med. Sci. 41 (1), 113–118. 10.12669/pjms.41.1.10556
6. Gan T., Guan H., Li P., Huang X., Li Y., Zhang R., et al. (2024). Risk prediction models for cardiovascular events in hemodialysis patients: a systematic review. Seminars Dialysis 37 (2), 101–109. 10.1111/sdi.13181
7. Grundy S. M., Stone N. J., Bailey A. L., Beam C., Birtcher K. K., Blumenthal R. S., et al. (2019). 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol: a report of the American college of cardiology/American heart association task force on clinical practice guidelines. J. Am. Coll. Cardiol. 73 (24), e285–e350. 10.1016/j.jacc.2018.11.003
8. Huang D. S., Yuan T. H. J. L. (2020). The origin and development of the theory the heart opens into the tongue. Glob. Tradit. Chin. Med. 13 (11), 1873–1877.
9. Juan W., Ping L., Lu Z. (2024). A perspective on medication updates in the '2024 Chinese guideline for the diagnosis and management of heart failure'. Her. Med. 43 (11), 1718–1722.
10. Knorr M. S., Neyazi M., Bremer J. P., Brederecke J., Ojeda F. M., Ohm F., et al. (2022). Predicting cardiovascular risk factors from facial and full body photography using deep learning. Eur. Heart J. - Digital Health 3 (4), ztac076.2780. 10.1093/ehjdh/ztac076.2780
11. Li F., Chen A., Li Z., Gu L., Pan Q., Wang P., et al. (2023). Machine learning-based prediction of cerebral hemorrhage in patients with hemodialysis: a multicenter, retrospective study. Front. Neurol. 14, 1139096. 10.3389/fneur.2023.1139096
12. Luo J., Rui Z., He Y., Li H., Yuan Y., Li W. (2025). Establishment of a nomogram that predicts the risk of heart failure in hemodialysis patients. Am. Heart J. Plus Cardiol. Res. Pract. 49, 100487. 10.1016/j.ahjo.2024.100487
13. Matsubara Y., Kimachi M., Fukuma S., Onishi Y., Fukuhara S. (2017). Development of a new risk model for predicting cardiovascular events among hemodialysis patients: population-based hemodialysis patients from the Japan dialysis outcome and practice patterns study (J-DOPPS). PLoS ONE 12 (3), e0173468. 10.1371/journal.pone.0173468
14. Mei Y. Y., Yu X. L., Kong M. L., Ma C. Y., YC. (2020). Development of a nomogram model for predicting the risk of cardiovascular events in maintenance hemodialysis patients. Chin. J. Blood Purif. 19 (2), 5.
15. Mekić S., Wigmann C., Gunn D. A., Jacobs L. C., Kayser M., Schikowski T., et al. (2021). Genetics of facial telangiectasia in the Rotterdam Study: a genome-wide association study and candidate gene approach. J. Eur. Acad. Dermatol Venereol. 35 (3), 749–754. 10.1111/jdv.17014
16. Merchant R. M., Topjian A. A., Panchal A. R., Cheng A., Aziz K., Berg K. M., et al. (2020). Part 1: executive summary: 2020 American heart association guidelines for cardiopulmonary resuscitation and emergency cardiovascular care. Circulation 142 (16_Suppl_2), S337–S357. 10.1161/cir.0000000000000918
17. Mizukoshi K., Iwazaki H., Ida T. (2024). Quantitative analysis of age-related changes in vascular structure, oxygen saturation, and epidermal melanin structure using photoacoustic methods. Skin. Res. Technol. 30 (1), e13537. 10.1111/srt.13537
18. National Expert Committee on Rational Drug Use NHaFPC (2019). Chinese pharmacist association. Guidelines for rational medication use in thrombolytic therapy of acute ST-segment elevation myocardial infarction. 2 ed. Beijing: People's Medical Publishing House, 74.
19. Qin M., Yang Y., Dai L., Ding J., Zha Y., Yuan J. (2024). Development and validation of a model for predicting the risk of cardiovascular events in maintenance hemodialysis patients. Sci. Rep. 14 (1), 6760. 10.1038/s41598-024-55161-y
20. Sheng K., Zhang P., Yao X., Li J., He Y., Chen J. (2020). Prognostic machine learning models for first-year mortality in incident hemodialysis patients: development and validation study. JMIR Med. Inf. 8 (10), e20578. 10.2196/20578
21. Sterne J. A., White I. R., Carlin J. B., Spratt M., Royston P., Kenward M. G., et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393. 10.1136/bmj.b2393
22. Wang X. L., Xiao Y. H., CyR. (2018). Correlation analysis between tongue manifestation patterns and certain heart failure risk factors in dialysis patients. Chin. J. Med. Guid. 15 (10), 110–113.
23. Wang F., Chi S., Li X., Zhang H., Li J. (2024). A hemodialysis mortality prediction model based on active contrastive learning. Stud. Health Technol. Inf. 310, 720–724. 10.3233/shti231059
24. Yoshida K., Saito T., Kato T., Takezaki T., Kato N., Mizobuchi M., et al. (2025). Phosphate binder pill burden and cardiovascular events in patients undergoing hemodialysis. Ther. Apher. Dial. 29 (3), 333–344. 10.1111/1744-9987.70004
25. Yunhu C., Moqing Y., Lihua F., Xuechun J., Tao Z., Xingyu Z., et al. (2023). Mirror-like tongue is an important predictor of acute heart failure: a cohort study of acute heart failure in Chinese patients. J. Traditional Chin. Medicine = Chung I Tsa Chih Ying Wen pan 43 (6), 1243–1251. 10.19852/j.cnki.jtcm.20230904.004
26. Zhang A., Qi L., Zhang Y., Ren Z., Zhao C., Wang Q., et al. (2022). Development of a prediction model to estimate the 5-year risk of cardiovascular events and all-cause mortality in haemodialysis patients: a retrospective study. PeerJ 10, e14316. 10.7717/peerj.14316
27. Zhao J., Wang Q., Wang Q. (2020). 2020 Chinese expert consensus on ventricular arrhythmias (2016 consensus upgrade edition). Chin. J. Cardiac Pacing Electrophysiol. 34 (03), 189–253. 10.13333/j.cnki.cjcpe.2020.03.001
Summary
Keywords
cardiovascular events, feature fusion, hemodialysis patients, machine learning, tongue images
Citation
Zou K, Xiao F, Cheng S, Wang Q, He X, Wang J, Dong L, Bao K, Zhou W and Zhao D (2026) Predicting cardiovascular events in hemodialysis patients based on the fusion of physicochemical indicators and tongue images: a prospective and multicenter study. Front. Physiol. 17:1782190. doi: 10.3389/fphys.2026.1782190
Received
06 January 2026
Revised
12 February 2026
Accepted
18 February 2026
Published
04 March 2026
Volume
17 - 2026
Edited by
Feng Gao, The Sixth Affiliated Hospital of Sun Yat-sen University, China
Reviewed by
Song Wen, Shanghai Pudong Hospital, China
Zahraa Tarek, Prince Sattam Bin Abdulaziz University, Saudi Arabia
Copyright
© 2026 Zou, Xiao, Cheng, Wang, He, Wang, Dong, Bao, Zhou and Zhao.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Daixin Zhao, daixin@gzucm.edu.cn; Kun Bao, baokun@aliyun.com; Wu Zhou, zhouwu@gzucm.edu.cn
†These authors have contributed equally to this work