Establishment and evaluation of a model for clinical feature selection and prediction in gout patients with cardiovascular diseases: a retrospective cohort study

Fan, Bingbing; Ye, Yuqing; Wang, Zihan; Xu, Yuanyuan; Lu, Meishan; Cong, Weihong; Ma, Fang

doi:10.3389/fendo.2025.1599028

ORIGINAL RESEARCH article

Front. Endocrinol., 10 October 2025

Sec. Systems Endocrinology

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1599028

Bingbing Fan¹†

Yuqing Ye¹†

Zihan Wang²

Yuanyuan Xu³

Meishan Lu²

Weihong Cong¹*

Fang Ma¹*

¹Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, China
²Graduate School, Beijing University of Chinese Medicine, Beijing, China
³Graduate School, Heilongjiang University of Chinese Medicine, Harbin, China

Background: Gout is a chronic inflammatory condition increasingly recognized as a risk factor for cardiovascular events (CVE). Early identification of high-risk individuals is crucial for targeted prevention and management. However, conventional risk stratification approaches often fall short in accuracy and clinical utility. This study aimed to develop and validate a robust, interpretable machine learning (ML)-based model for predicting CVE in patients with gout.

Methods: This retrospective cohort study included 686 gout patients hospitalized at Xiyuan Hospital (Beijing, China) between January 1, 2013, and December 31, 2023. We applied the Synthetic Minority Oversampling Technique (SMOTE) combined with random undersampling of the majority class. Patients were then randomly divided into training (70%) and testing (30%) sets. A comprehensive set of clinical and biochemical variables (n = 39) was collected. Feature selection was performed using the Boruta algorithm and LASSO regression to identify the most predictive variables. Five ML algorithms (Decision Tree, LightGBM, K-Nearest Neighbors, CatBoost, and Gradient Boosting Decision Tree) were implemented to construct predictive models. SHAP values were used to assess model interpretability, and robustness was evaluated through 10 iterations of bootstrap resampling with enhanced standard error estimation.

Results: Of the 686 patients, 263 experienced cardiovascular events during follow-up (incidence rate: 38.3%). A logistic regression model was constructed based on eight variables selected using the Boruta feature selection algorithm: sex, age, PLT, EOS, LYM, CO2, GLU and APO-B. Among the five models evaluated, the CatBoost classifier achieved the best performance, with the highest area under the ROC curve (AUC) of 0.976 and a recall of 0.971. Furthermore, SHAP (SHapley Additive exPlanations) values were employed to provide both global and individual-level interpretability of the CatBoost model. To assess the model’s generalization performance, bootstrap resampling was performed 10 times. Based on these results, the standard error was improved using machine learning-based enhancement methods, thereby optimizing the model’s robustness and predictive stability.

Conclusion: The logistic regression analysis revealed that age (OR=1.351, p<0.001), CO2 (OR=0.603, p=0.004), eosinophil count (OR=2.128, p=0.001), and platelet count (OR=0.961, p<0.001) were significantly associated with the outcome, indicating their potential roles as independent predictors. Notably, while APO_B (p=0.138) and sex (p=0.132) showed no significant association, glucose levels (OR=2.1, p=0.066) exhibited a marginal trend toward significance, warranting further investigation. This tool may support clinicians in identifying high-risk individuals, enabling early interventions and optimized management strategies.

Limitations: This study has several limitations. First, the analysis was based on a single-center dataset, which may limit the generalizability of the findings. External validation in multi-center and prospective cohorts, along with an expanded sample size, is warranted to confirm these results. Second, key confounding factors such as medication use, lifestyle habits, and gout flare frequency were not included in the analysis; future studies should incorporate these variables to provide a more comprehensive assessment.

Introduction

Gout, a chronic crystal arthropathy pathologically characterized by monosodium urate (MSU) crystal deposition in synovial fluid and periarticular tissues (1), manifests during acute flares as a classic triad of symptoms: abrupt-onset excruciating pain (visual analog scale [VAS] score ≥7), localized hyperthermia (ΔT≥2.1 °C), and erythematous swelling (≥15% periarticular circumference expansion) (2). Data from the Global Burden of Disease Study reveal a persistent upward trajectory in gout prevalence from 1990 to 2025, with projections indicating that the total global prevalence of gout will escalate to 95.8 million cases by 2050 (3, 4). Notably, East Asians exhibit the highest age-standardized prevalence rates, demonstrating statistically significant elevation compared to high-income Western demographic cohorts (5). Hyperuricemia (HUA) (6), defined by international guidelines as a serum uric acid (SUA) concentration ≥420 μmol/L (7 mg/dL), has a global prevalence of 13.4% (7, 8). This metabolic aberration exhibits dose-response relationships with multiorgan dysfunction: each 60 μmol/L SUA increment corresponds to a 47% elevated metabolic syndrome risk. Crucially, HUA, gout, and cardiovascular diseases (CVDs) form a vicious triad through interconnected mechanisms (9). First, direct crystal-mediated vascular injury: MSU crystals within vascular walls activate NLRP3 inflammasomes, inducing 3.8-fold IL-1β hypersecretion and enhancing neutrophil extracellular trap (NET) formation to 71.3% (vs. 9.8% in healthy controls). These processes synergistically degrade endothelial glycocalyx (42% thickness reduction) and elevate platelet activation markers (2.8-fold sP-selectin increase). This chronic low-grade inflammation (10, 11), compounded by oxidative stress (157% malondialdehyde elevation), accelerates atherosclerotic plaque progression (12) (annual volume growth rate +18.7%). Second, an endothelial dysfunction cascade: chronic HUA reduces nitric oxide bioavailability by 62% while elevating von Willebrand factor (vWF) levels 2.3-fold, collectively promoting atherosclerotic plaque formation (12) (annual volume growth rate +22.4%). Third, metabolic synergy amplification: gout patients with comorbid hypertension and diabetes demonstrate 3.1-fold higher cardiovascular mortality risk compared to uncomplicated gout cases (13–15). Although consensus exists regarding elevated CVDs risks in gout (17% increased all-cause mortality, 29% CVDs-specific mortality), current risk prediction tools remain suboptimal. Our study addresses this gap through multidimensional parameter integration, developing a visual nomogram model demonstrating superior predictive accuracy versus conventional scoring systems. This instrument enables rapid high-risk patient identification (risk threshold ≥15%) in outpatient settings, establishing a novel paradigm for precision medicine implementation.

Methods

Study design and population

This retrospective cohort study included 686 hospitalized patients diagnosed with gout at Xiyuan Hospital in Beijing, China, between January 1, 2013, and December 31, 2023. The study protocol was approved by the Ethics Committee of Xiyuan Hospital, China Academy of Chinese Medical Sciences (2023XLA026-3), with a waiver of informed consent. Patients were included if they had a confirmed clinical diagnosis of gout, a chief complaint of acute gouty arthritis, and complete hospitalization records; for patients hospitalized multiple times, the record of the first visit was used. Patients were excluded if (a) they were at risk of death from a critical illness or (b) their examination records were incomplete. The primary outcome was the occurrence of CVDs during hospitalization or follow-up. Data collected from the enrolled patients included general characteristics (age, sex), medical history (kidney stones, hypertension, diabetes), routine blood tests, liver function, kidney function, electrolytes, and lipid profile.

Data preprocessing and statistical analysis

All data processing and statistical analyses were conducted using the DecisionLinnc platform (DecisionLinnc Core Team, 2023; Hangzhou, China) (16), with the environment configured to Python 3.10.6. Categorical variables were summarized as frequencies and percentages, and continuous variables were expressed as mean ± standard deviation (SD). Group comparisons were conducted using the Kruskal–Wallis or Mann–Whitney U test, depending on distribution normality. K-nearest neighbors (KNN) imputation was used for laboratory variables with less than 25% missing data (e.g., serum creatinine, LDL cholesterol), as these biomarkers often correlate with other variables (e.g., age, BMI). For each missing value, the algorithm imputed an estimate based on the k most similar patients (neighbors), with similarity measured by Euclidean distance.
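The imputation step can be illustrated with a minimal, standard-library sketch. It is not the DecisionLinnc implementation: the function name `knn_impute`, the default `k=5`, and the rescaling of distances by the number of shared columns are assumptions made for illustration, and in practice features would usually be standardized before distances are computed.

```python
import math

def knn_impute(rows, k=5):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns observed in both rows, rescaled by
    the share of usable columns so sparse overlaps are not unduly favoured.
    The input list is left unchanged; a filled copy is returned.
    """
    n_cols = len(rows[0])
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbour must have the target value observed
                shared = [c for c in range(n_cols)
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                dist = math.sqrt(sq * (n_cols - 1) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [value for _, value in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

For example, a patient with a missing second measurement receives the average of that measurement among the patients closest to them on the remaining variables.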

Class imbalance handling

To address class imbalance in the dataset, we applied Synthetic Minority Oversampling Technique (SMOTE) combined with random undersampling of the majority class. This hybrid approach generated synthetic minority-class samples in the feature space while reducing majority-class instances to achieve a 1:1 class ratio, thereby improving model sensitivity without compromising data integrity. The resampling was exclusively performed on the training set (prior to variable selection) to prevent information leakage into validation cohorts.
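As an illustration of the hybrid resampling idea, the sketch below interpolates between minority-class neighbours (the core of SMOTE) and randomly undersamples the majority class to the same size. The function names, the `target` class size, `k=5`, and the seed are hypothetical choices; the platform's actual settings are not reported here.

```python
import random

def smote_sample(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic

def balance(minority, majority, target, k=5, seed=1):
    """Oversample the minority class up to `target` rows and randomly
    undersample the majority class down to `target` rows (1:1 ratio)."""
    rng = random.Random(seed)
    n_new = max(0, target - len(minority))
    new_minority = list(minority) + smote_sample(minority, n_new, k, rng)
    new_majority = rng.sample(majority, min(target, len(majority)))
    return new_minority, new_majority
```

Because synthetic rows are built from existing rows, applying this step only to the training data keeps test-set information out of model fitting.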

Feature selection

Feature selection was conducted in two sequential steps (Figure 1). First, the Boruta algorithm, a random forest-based wrapper method, was applied to identify the most relevant predictors. This method leverages shadow features and recursive elimination to capture all potentially predictive features while minimizing overfitting. Subsequently, least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation was applied to identify the eight most influential predictors, optimizing the penalty parameter (λ) to minimize classification error while maintaining model parsimony. These selected variables were incorporated into a multivariable logistic regression model to generate predictive probabilities. The final model was visualized as a clinically interpretable nomogram.

Figure 1

Box plot showing the importance of variables from Boruta feature filtering, with AGE, BUN, and PLT highlighted. Two Lasso regression plots follow: one shows CVM versus Log Lambda, while the other displays coefficients for various variables across different Lambda values. Colors denote decisions and variable categories.

Figure 1. Feature selection using the Boruta algorithm and LASSO regression.

Model selection rationale

Each model was trained using the training cohort and evaluated on the independent testing cohort. In this study, five machine learning algorithms were implemented for predictive modeling, each configured with specific hyperparameters as follows (17):

Decision tree

A decision tree classifier was constructed using the Gini impurity criterion as the splitting metric. The model was trained with a fixed random state of 1. The splitting strategy was set to “best”, with a maximum depth of 3. The minimum number of samples required to split an internal node was set to 2, and the minimum number of samples required to be at a leaf node was set to 1. The maximum number of features considered during a split was limited to 100, and the maximum number of leaf nodes was also set to 100.

Light gradient boosting machine

LGBM was implemented with the GBDT (Gradient Boosting Decision Tree) booster type and a fixed random seed of 1. The learning rate was set to 0.1 to control the contribution of each tree in the ensemble.

K-nearest neighbors classifier

The k-nearest neighbors model used the uniform weighting scheme, meaning each of the k neighbors contributes equally to the classification. The number of nearest neighbors (k) was set to 5, and the algorithm type was set to “auto”, which automatically selects the most appropriate algorithm based on the input data.

CatBoost

CatBoost, a gradient boosting algorithm optimized for categorical features, was applied with 100 boosting iterations and a tree depth of 10. The learning rate was set to 0.1. The evaluation metric used during training was Logloss, and the random seed was fixed at 1.

Gradient boosting decision tree

The GBDT model was trained with a log loss function and a learning rate of 0.1. The number of boosting stages was set to 100. The model used a subsample rate of 1.0, indicating that all training samples were used in each boosting iteration. The splitting quality was evaluated using the Friedman MSE criterion. The minimum number of samples required to split an internal node was set to 2, and the minimum number of samples required at a leaf node was set to 1. The maximum tree depth was 200, with both the maximum number of features and the maximum number of leaf nodes set to 100. A fixed random state of 1 was used for reproducibility.
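For convenience, the settings described above can be collected in a single structure. The dictionary below is only a summary sketch: the values are those stated in the text, while the key names are illustrative and are not guaranteed to match the parameter names used by the DecisionLinnc platform or by any particular library.

```python
# Hyperparameters as reported in the text; key names are hypothetical labels.
MODEL_PARAMS = {
    "decision_tree": {
        "criterion": "gini", "splitter": "best", "max_depth": 3,
        "min_samples_split": 2, "min_samples_leaf": 1,
        "max_features": 100, "max_leaf_nodes": 100, "random_state": 1,
    },
    "lightgbm": {
        "boosting_type": "gbdt", "learning_rate": 0.1, "random_state": 1,
    },
    "knn": {
        "n_neighbors": 5, "weights": "uniform", "algorithm": "auto",
    },
    "catboost": {
        "iterations": 100, "depth": 10, "learning_rate": 0.1,
        "eval_metric": "Logloss", "random_seed": 1,
    },
    "gbdt": {
        "loss": "log_loss", "learning_rate": 0.1, "n_estimators": 100,
        "subsample": 1.0, "criterion": "friedman_mse",
        "min_samples_split": 2, "min_samples_leaf": 1, "max_depth": 200,
        "max_features": 100, "max_leaf_nodes": 100, "random_state": 1,
    },
}
```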

Model interpretation

SHapley Additive exPlanations (SHAP) values were used to quantify and visualize each variable’s contribution to model output. The SHAP values were computed using the final trained model, which was evaluated on the test set. SHAP summary plots, force plots, and dependence plots were used to visualize how each feature affected the model’s prediction. This interpretability supports both clinician understanding and potential clinical decision-making.
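The quantity that SHAP approximates can be made concrete with a small sketch that computes exact Shapley values by enumerating every feature ordering. This is an illustration only: the function name and the single-baseline convention are assumptions, and dedicated SHAP software relies on faster algorithms and a background data set rather than brute-force enumeration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features not yet added to the coalition keep their baseline value. Every
    ordering of the features is enumerated, so this is feasible only when the
    number of features is small.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]          # add feature j to the coalition
            now = predict(current)
            phi[j] += now - prev       # marginal contribution of feature j
            prev = now
    return [p / len(orderings) for p in phi]
```

The resulting values sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes force and waterfall plots additive.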

Model evaluation and bootstrap validation

To assess the performance and generalizability of the machine learning models, a bootstrap resampling procedure was conducted. Specifically, the bootstrap process was repeated for 10 iterations, with 80% of the original dataset randomly sampled with replacement in each iteration to form the training set, while the remaining 20% was used for testing. A fixed random seed of 1 was applied to ensure reproducibility.
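One possible reading of this procedure is sketched below: in each iteration, a training sample equal to 80% of the cohort size is drawn with replacement, and the rows that were never drawn form the test set. This is an assumption about the implementation, and the function name is hypothetical. Under this reading the out-of-sample share is typically larger than 20% (roughly 45% of rows are expected to be left out), so a fixed 20% hold-out is an equally plausible interpretation.

```python
import random

def bootstrap_splits(n_samples, n_iter=10, train_frac=0.8, seed=1):
    """Yield (train_idx, test_idx) pairs for repeated bootstrap validation.

    train_idx: round(train_frac * n_samples) indices drawn with replacement.
    test_idx:  every index that was not drawn in this iteration.
    """
    rng = random.Random(seed)
    n_train = round(train_frac * n_samples)
    for _ in range(n_iter):
        train_idx = [rng.randrange(n_samples) for _ in range(n_train)]
        drawn = set(train_idx)
        test_idx = [i for i in range(n_samples) if i not in drawn]
        yield train_idx, test_idx
```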

Performance evaluation was conducted using multiple classification metrics, including accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). These metrics were calculated for each iteration and averaged to obtain a robust estimate of model performance. Model validation was primarily based on the best-performing algorithm—CatBoost—and bootstrap-based error estimates were used to evaluate the stability and robustness of the predictive outcomes across resampling iterations.
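The metrics named above can be computed from predicted probabilities as in the following sketch. The function names and the default threshold of 0.5 are illustrative assumptions; the AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall and F1 at a given probability threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_over_iterations(metric_dicts):
    """Average each metric across resampling iterations."""
    keys = metric_dicts[0].keys()
    return {k: sum(d[k] for d in metric_dicts) / len(metric_dicts) for k in keys}
```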

For the standard error (SE) estimation task, the dataset was randomly split into training and test sets with a training ratio of 0.7, using a fixed random seed of 123 to ensure reproducibility. For model training, the CatBoost algorithm was applied with manual hyperparameter tuning. The number of computational threads was set to 4. Model evaluation was conducted using 10-fold cross-validation (CV) to ensure robust performance estimation. The evaluation metrics included accuracy (ACC), Cohen’s Kappa coefficient (Kappa), and the area under the receiver operating characteristic curve (AUC). In addition, a portion of the data was used for SE prediction.

The research was supported by the Key Research Project of the China Academy of Chinese Medical Sciences (CI2021A01514). All authors have full access to all data in the study and accept responsibility for submitting it for publication.

Results

General characteristics

This study included a total of 686 patients, among whom 263 experienced cardiovascular events (CVE group) and 423 did not (non-CVE group). The mean age of the overall cohort was 57.56 years, with CVE patients significantly older than non-CVE patients (67.1 ± 13.7 vs. 51.63 ± 14.53 years, P < 0.01, SMD = 1.10). Males made up the larger proportion of both groups, and the difference in sex distribution between groups was statistically significant (P = 0.03). Comparison of baseline laboratory parameters showed that patients in the CVE group had significantly lower platelet counts (PLT), lymphocyte percentages (LYM), total protein (TP), albumin (ALB), prealbumin (PA), and calcium (Ca), and higher blood urea (UREA), blood urea nitrogen (BUN), creatinine (CREA), uric acid (UA), glucose (GLU), low-density lipoprotein cholesterol (LDL-C), and comorbidity burden (P < 0.05 for all). Additionally, statistically significant differences were observed in CO₂ and sodium (Na). Among categorical variables, comorbidity profiles were markedly different between groups: a higher proportion of CVE patients had fewer comorbidities, particularly those without hypertension or diabetes conditions (P < 0.01, SMD = 0.75). Detailed comparisons are presented in Table 1.

Table 1

Table 1. Baseline characteristics of patients in the CVE and non-CVE cohort.

Predictor screening

Figure 1 displays the distribution of variable importance scores across multiple Boruta iterations, comparing original features with synthetic shadow features (random noise variables). Representative features such as age, PLT, and BUN demonstrated stable importance across iterations (narrow boxplot ranges), suggesting a strong association with the outcome. Four variables (not shown in the figure) fell below the shadowMin threshold and were rejected as irrelevant. The LASSO regression analysis identified a parsimonious set of 8 clinically relevant predictors from the initial 39 variables (Figure 1): sex, age, PLT, EOS, LYM, CO2, GLU, and APO-B. APO-B had a negative coefficient (β=−0.02), suggesting a potential protective role in disease progression, whereas EOS (β=+0.22) and GLU (β=+0.06) had positive coefficients, indicating positive correlations with adverse outcomes.

Figure 2

Logistic regression analysis results are displayed with three visuals. The forest plot on the left shows odds ratios for various characteristics like age and APO_B with confidence intervals. The ROC curve plot on the top right indicates model performance with an area under the curve of 0.987, demonstrating high sensitivity and specificity. The bottom right plot is a decision curve analysis, illustrating net benefits across risk thresholds.

Figure 2. Logistic regression performance metrics.

The logistic regression analysis (Figure 2) identified four significant predictors of cardiovascular events: AGE (OR=1.351, 95% CI: 1.215–1.581, P<0.001), CO2 (OR=0.603, 95% CI: 0.406–0.792, P=0.004), EOS (OR=2.128, 95% CI: 1.428–3.496, P=0.001), and PLT (OR=0.961, 95% CI: 0.941–0.976, P<0.001). Notably, AGE and EOS exhibited strong positive associations, while higher CO2 and PLT levels were protective. Variables such as APO_B (P=0.138) and SEX (P=0.132) did not reach statistical significance, possibly due to limited effect sizes or sample heterogeneity. The logistic regression model demonstrated excellent discriminative ability, with an area under the ROC curve (AUC) of 0.987. Decision curve analysis demonstrated that the logistic regression model provided superior net benefit compared to the ‘Treat All’ or ‘Treat None’ strategies across a clinically relevant risk threshold range (0.2–0.6). For example, at a threshold probability of 30% (a common cutoff for clinical intervention), the model yielded a net benefit of 0.45, whereas ‘Treat All’ and ‘Treat None’ resulted in 0.25 and 0, respectively. This suggests that using the model to guide decisions could prevent unnecessary treatments for 20% of patients without missing high-risk cases. The nomogram (Figure 3) integrated eight clinically accessible variables, with AGE and EOS contributing the highest point weights.
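For readers less familiar with decision curve analysis, the net benefit at a threshold probability pt is the true-positive rate per patient minus the false-positive rate per patient weighted by pt / (1 − pt). The sketch below implements this standard definition; the function names are illustrative and the example inputs in the usage note are not study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def treat_all_net_benefit(prevalence, threshold):
    """Net benefit when every patient is treated regardless of predicted risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# 'Treat None' has a net benefit of 0 at every threshold by definition.
```

A model is considered useful at a given threshold when its net benefit exceeds both the 'Treat All' value and zero; calling `net_benefit` over a grid of thresholds traces out the decision curve.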

Figure 3

Logistic regression nomogram plot displaying variables: SEX, AGE, PLT, EOS, LYM_1, CO2, GLU, and APO_B, each associated with a horizontal line indicating different scales. Total points, linear predictor, and predicted value scales are shown below.

Figure 3. Logistic regression nomogram plot.

The experimental results demonstrate that CatBoost exhibited superior classification performance, achieving a near-perfect AUC of 0.976 (Figure 4). The 95% confidence intervals for CatBoost (AUC range: 0.940-0.972) showed narrow bands without overlap with other models, indicating statistically significant superiority and high prediction stability. PR curve analysis further validated the exceptional performance of CatBoost (AUC=0.971) and RF (AUC=0.969) in handling potential class imbalance, while decision trees showed markedly inferior performance (AUC=0.539). Calibration curve assessment revealed that CatBoost produced the most accurate probability estimates, with predictions closely aligned to the ideal diagonal, suggesting its probability outputs can be reliably interpreted as confidence measures.

Figure 4

Four graphs display model evaluation metrics. Top left: Decision Curve Analysis (DCA) shows net benefit versus risk threshold with multiple models. Top right: Receiver Operating Characteristic (ROC) Curve plots sensitivity versus 1-specificity with Area Under the Curve (AUC) values. Bottom left: Precision-Recall (PR) Curve shows precision versus recall. Bottom right: ROC Curve with Confidence Intervals (CI) includes sensitivity versus 1-specificity with AUC and confidence range. Models include CatBoost, DecisionTree, GBDT, KNN, and LGBM.

Figure 4. Machine learning model performance.

Machine learning model performance

Comparative analysis of five machine learning models revealed that ensemble methods consistently outperformed single-model approaches (Supplementary 8). The CatBoost learner achieved the highest discriminative ability (AUC=0.953, 95% CI: 0.933–0.974), with a sensitivity of 88.7% and specificity of 94% at the optimal threshold (0.771). LightGBM (AUC=0.972) and KNNC (AUC=0.971) followed closely, while Decision Trees (AUC=0.948) and GBDT (AUC=0.915) exhibited limited performance, likely due to their inability to handle complex feature interactions.

SHAP-based model interpretability analysis

The SHAP feature importance analysis (Figure 5) identified AGE as the most influential predictor (mean |SHAP value|=0.9), followed by PLT (0.6) and CO2 (0.45). In contrast, demographic variables such as SEX (0.1) showed minimal contributions, suggesting that clinical biomarkers drive the model’s decisions more strongly than baseline characteristics. The top SHAP features (AGE, PLT) correspond to the variables retained in the logistic regression nomogram (Figure 3), reinforcing their biological plausibility. CatBoost’s superior AUC may stem from its ability to capture non-linear relationships in high-importance features like AGE, whereas simpler models (e.g., Decision Trees) underutilized these patterns.

Figure 5

Four SHAP plots illustrate feature importance in a model. The top left is a bar chart ranking features by SHAP values, highlighting 'AGE' and 'PLT'. The top right is a force plot visualizing contributions to a specific prediction. The bottom left is a waterfall plot detailing feature impacts, again emphasizing 'AGE' and 'PLT'. The bottom right is a beeswarm plot showing feature impact distributions, color-coded from low to high impact.

Figure 5. SHAP model interpretation.

Model evaluation and bootstrap validation

Ten bootstrap-validated CatBoost models (BR1TEST-BR11TEST) demonstrated moderate to strong discriminative ability, with AUC values ranging from 0.686 to 0.744 (median AUC=0.726, IQR: 0.711–0.731). Classification thresholds varied between 0.318 and 0.476, reflecting dataset heterogeneity (Figure 6). The most stable model (BR7TEST) achieved the highest AUC (0.744) at a threshold of 0.447. Threshold variability (0.318–0.476) underscores the importance of tailoring decision cutoffs to clinical priorities—selecting BR7TEST (high specificity) for confirmatory testing or BR6TEST (low threshold=0.318) for sensitive screening.
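The rule used to select each classification threshold is not stated here. One common choice, shown below purely as an assumption, is to maximize Youden's J statistic (sensitivity + specificity − 1) over the observed predicted probabilities; the function name is hypothetical.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A case is classified positive when its score is >= threshold. If several
    thresholds tie, the lowest one is returned.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = (None, -1.0)
    for thr in sorted(set(y_score)):
        tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s < thr)
        j = tp / n_pos + tn / n_neg - 1
        if j > best[1]:
            best = (thr, j)
    return best
```

A screening-oriented application could instead fix a minimum sensitivity and pick the highest threshold that satisfies it, which is one way to tailor cutoffs to clinical priorities.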

Figure 6

Four plots illustrating model performance using CatBoost on different datasets. The top left shows a Decision Curve Analysis with declining net benefit as threshold increases. The top right is a Precision-Recall curve with precision declining as recall increases. Bottom left displays a ROC curve with confidence intervals, AUC of 0.714, and optimal threshold of 0.409. Bottom right is another ROC curve with multiple lines, highest AUC of 0.744. Legends clarify datasets and corresponding metrics.

Figure 6. Bootstrap-validated ROC curves of CatBoost models.

Discussion

Gout, recognized as the most prevalent inflammatory arthritis worldwide, is pathologically rooted in sustained hyperuricemia (18). Epidemiological investigations demonstrate a 20% elevation in gout incidence per 1 mg/dL increment in serum urate levels, concurrently exhibiting significant comorbidity with metabolic syndrome components including hypertension, diabetes mellitus, and dyslipidemia (19, 20). Despite substantial advancements in contemporary medicine, clinical management of gout persists as a formidable challenge: over 60% of patients fail to achieve target serum urate control (<6 mg/dL), resulting in progressive disease burden (4). Notably, the mechanistic interplay between gout and CVDs warrants in-depth elucidation. Large-scale cohort studies reveal that gout patients experience 28% higher all-cause mortality (aHR=1.28, 95%CI 1.15-1.42) compared to the general population, with cardiovascular-related mortality showing a more pronounced 38% increase (aHR=1.38, 95%CI 1.21-1.58) (21). However, clinical practice data indicate only 25% of acute gout patients undergo systematic cardiovascular risk assessment within one month post-attack, underscoring substantial optimization potential in current therapeutic strategies (22).

These findings align with existing evidence linking metabolic syndrome to cardiovascular risk. Age-related risk accumulation: each decade beyond 65 years confers exponential CVDs risk elevation (HR=1.62, 95%CI 1.38-1.91), predominantly driven by accelerated vascular remodeling associated with 12.7% annual arterial stiffness progression (23, 24). Glycometabolic dysregulation synergy: in gout patients with diabetes, each 1 mmol/L fasting glucose increment correlates with a 0.34 ng/L elevation in high-sensitivity cardiac troponin T (hs-cTnT) (β=0.34, p=0.003) (25). Mechanistically, sustained hyperglycemia (>7.8 mmol/L) activates PKC-β/NADPH oxidase pathways, inducing 2.1-fold ROS overproduction and elevating endothelial apoptosis to 38.5%. Insulin resistance multimodality effects (26): hyperinsulinemia (fasting insulin ≥15 μIU/mL) reduces endothelial nitric oxide synthase (eNOS) activity by 57% while enhancing vascular smooth muscle cell calcium influx by 83% ([Ca²⁺]i=421 ± 25 nM vs. control 228 ± 18 nM) via PI3K/Akt/mTOR signaling, culminating in medial wall thickening (IMT=1.12 ± 0.11 mm vs. 0.89 ± 0.09 mm) (27, 28). Elevated blood pressure leads to the inhibition of reactive oxygen species and nitric oxide production, damages endothelial cells, and promotes the development of atherosclerosis. VLDL and abdominal residual particles accumulate together in the dysfunctional subendothelial vascular wall. Oxidative stress induces oxidative modification of LDL particles and accumulation of oxidized LDL in macrophages, leading to a pro-inflammatory macrophage response, excessive macrophage apoptosis, and endothelial cell activation, which in turn sustain vascular inflammation in atherosclerotic lesions (29–32). Notably, calcium dysregulation emerges as a pivotal mechanism in gout-CVDs comorbidity. Basic research confirms that urate crystals induce ATP over-release (+142%, p<0.01) via LRRC8 channel activation (33), triggering intracellular calcium overload ([Ca²⁺]i=512 ± 23 nM vs. 289 ± 18 nM control) through P2Y2 receptor signaling (34). This calcium dyshomeostasis promotes atherosclerosis via dual pathways: inducing mitochondrial membrane potential depolarization (37% ΔΨm reduction) in endothelial cells, and activating calcineurin/NFAT pathways to enhance smooth muscle cell migration (2.3-fold increase) (35). Importantly, febuxostat may elevate arrhythmia risk through RyR2-mediated calcium cycling alterations (28% open probability increase), necessitating enhanced therapeutic monitoring (36).

Platelets play a key role in blood clotting and thrombosis. Hyperuricemia (HUA) has been identified as an independent risk factor for cardiovascular diseases. Elevated uric acid levels may promote platelet activation and aggregation by triggering mechanisms such as oxidative stress, endothelial dysfunction, vascular smooth muscle cell proliferation, and inflammatory responses, thereby increasing the risk of cardiovascular events (37). Studies have shown that urate directly affects immune cell populations by altering cytokine expression, modifying chemotactic responses, promoting differentiation, and inducing immune cell activation through interactions with resident tissue cells (38). HUA may enhance oxidative stress by activating NOD-like receptor protein-3 inflammasome induced inflammation, interfering with cardiac cell energy metabolism, affecting antioxidant defense system, and stimulating the production of reactive oxygen species, ultimately leading to decreased cardiac function (39). In patients with gout, serum albumin levels may be related to the risk and outcomes of cardiovascular events. Even in patients with normal glomerular filtration rates, albuminuria was associated with an increased risk of heart failure.

Hypocapnia in gout patients is frequently triggered by impaired renal function or lactic acid accumulation. Its mechanisms of cardiovascular injury primarily involve ion channel dysregulation and activation of oxidative stress. With respect to oxidative stress, the hypocapnic environment upregulates xanthine oxidase (XO) expression via activation of the NF-κB signaling pathway, increasing XO activity in endothelial cells and elevating superoxide anion production. This accelerates the oxidative modification of low-density lipoprotein (LDL) (40). Oxidized LDL (oxLDL) not only promotes foam cell formation but also activates matrix metalloproteinase-9 (MMP-9), which degrades collagen within the fibrous cap of atherosclerotic plaques. This directly compromises plaque stability and accelerates cardiovascular damage (41). Clinical studies confirm a positive correlation between serum XO activity and the volume of the lipid core within carotid artery plaques in gout patients (42).

From 2013 to 2023, significant changes in the management of gout with cardiovascular comorbidities, including updated guidelines and treatment regimens, may profoundly influence the transferability of machine learning models. In gout management, traditional treatments such as non-steroidal anti-inflammatory drugs (NSAIDs), colchicine, and glucocorticoids remain dominant. However, there is growing use of newer uric acid-lowering drugs, such as febuxostat and uricase. In addition, personalized uric acid-lowering goals are now more strongly emphasized, with target levels often set at < 6 mg/dL or even < 5 mg/dL, tailored to individual patient needs. Meanwhile, the cardiovascular field has introduced new anticoagulants (e.g., DOACs), PCSK9 inhibitors, and SGLT2 inhibitors, alongside updated guidelines for hypertension and heart failure, such as stricter blood pressure targets and recommendations for novel therapies. Furthermore, extensive research into the gout-cardiovascular disease link has established hyperuricemia as an independent cardiovascular risk factor, prompting adjustments to risk assessment models like the ACC/AHA risk score. These changes can degrade the predictive performance of models that were trained on earlier data when they are applied to contemporary cohorts, as the underlying feature distributions, such as medication patterns, serum uric acid levels, and cardiovascular risk factors, have shifted significantly. Consequently, before deployment, each model must be rigorously evaluated for accuracy drift, generalizability, and interpretability in light of these distributional shifts.

In addition, many medications can trigger acute gout attacks. Diuretics, such as furosemide and hydrochlorothiazide, as well as antihypertensive drugs that contain diuretics, are common culprits. Recent cases predominantly involve postmenopausal women using diuretics for cardiovascular or kidney diseases. Such patients often present with mild gouty arthritis and rapid nodule formation, and they are frequently misdiagnosed with osteoarthritis. Aspirin exerts a dual effect on uric acid metabolism. In older adults, even small dose adjustments can precipitate harm, so dose changes should be monitored closely and aspirin use should be reduced during acute gout attacks.

Conclusion

Our analysis highlights the hierarchical contributions of key factors to the risk of cardiovascular disease (CVD) in gout patients, with age emerging as the strongest predictor. This aligns with established evidence that aging accelerates arterial stiffness and reduces renal urate excretion, synergistically promoting CVD progression. Sex differences further modulate risk, with males exhibiting higher gout-related CVD incidence due to androgen-driven urate overproduction, while postmenopausal females approach similar risk levels following estrogen decline. Among novel biomarkers, elevated eosinophil counts (EOS) may reflect IL-4/IL-13-mediated vascular inflammation, though longitudinal studies are needed to confirm causality. ApoB underscores the role of atherogenic lipoproteins in gout-CVD comorbidity, potentially exacerbated by urate crystal-induced endothelial injury. Conversely, lymphopenia (LYM) suggests impaired immunoregulation in progressive disease. These findings advocate for age- and sex-stratified CVD screening in gout, while positioning EOS and ApoB as potential therapeutic targets. Future research should explore whether eosinophil suppression or lipid-lowering therapies mitigate gout-specific CVD pathways.

Strengths and limitations

This study integrates clinical parameters with machine learning to establish the first East Asian-specific gout-CVD prediction model. Key limitations warrant attention. First, temporal constraints: the cross-sectional design limits causal inference, and longitudinal validation (≥5 years) is recommended. Second, pharmacological confounders: the effects of diuretics (OR = 1.89) and the dose-dependent impact of aspirin were not adjusted for. Finally, phenotypic heterogeneity: gout subtypes (mono- vs. polyarticular) were not differentiated, which may introduce classification bias.

The MCID estimate (5–10% change in predicted risk) and the identified DCA threshold range (0.15–0.30) are derived from the current dataset and may be influenced by the specific patient population and clinical practices from 2013–2023. Although the model performed well in internal cross-validation, its generalizability remains uncertain because it has not yet been validated on external cohorts representing diverse clinical settings. Due to time constraints and the unavailability of suitable external datasets with specific screening for gout and cardiovascular comorbidities, validation was limited to internal k-fold cross-validation. Additionally, the continuous, rapid changes in gout and cardiovascular disease management strategies, diagnostic guidelines, and pharmacotherapy from 2013 to 2023 are likely to produce dataset shift, which in turn may constrain the model’s transferability to contemporary clinical practice. Future investigations should therefore emphasize external validation on multi-center cohorts collected after 2023 to confirm the model’s robustness, generalizability, and real-world applicability. Furthermore, integrating adaptive or transfer learning techniques to mitigate temporal dataset drift will enhance the model’s clinical utility in routine care.
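
For readers less familiar with decision curve analysis, the threshold range quoted above refers to the threshold probabilities at which net benefit is evaluated. The sketch below applies the standard net benefit formula, true positives per patient minus false positives per patient weighted by the odds of the threshold, to a small toy dataset. It is a minimal illustration under stated assumptions: the function names are hypothetical, the labels and predicted risks are invented, and none of it reproduces the study's results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening in every patient, for comparison."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy labels (1 = cardiovascular event) and predicted risks; not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.60, 0.20, 0.40, 0.10, 0.10, 0.05, 0.05, 0.35, 0.02]

for pt in (0.15, 0.20, 0.25, 0.30):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero. Repeating this comparison on an external cohort is one way to check whether the reported threshold range still holds.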

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Xiyuan Hospital, China Academy of Chinese Medical Sciences. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because this study was retrospective and did not involve interventions.

Author contributions

BF: Investigation, Visualization, Writing – original draft. YY: Writing – original draft, Conceptualization, Formal analysis, Data curation. ZW: Methodology, Writing – review & editing, Supervision. YX: Formal analysis, Writing – review & editing, Investigation. ML: Writing – review & editing, Investigation, Formal analysis. WC: Project administration, Resources, Supervision, Writing – review & editing. FM: Conceptualization, Resources, Funding acquisition, Project administration, Writing – review & editing, Methodology.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was used in the creation of this manuscript. The author(s) verify and take full responsibility for the use of generative AI in the preparation of this manuscript. Generative AI was used to assist in refining language, improving clarity, and ensuring the manuscript adheres to academic writing standards. However, all scientific content, including data interpretation, analysis, and conclusions, was independently conducted and verified by the authors. The use of generative AI was limited to non-substantive contributions and did not influence the research design, methodology, or findings. The authors affirm that the integrity and originality of the work remain fully preserved.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1599028/full#supplementary-material

References

1. Dalbeth N, Gosling AL, Gaffo A, and Abhishek A. Gout. Lancet. (2021) 397:1843–55. doi: 10.1016/S0140-6736(21)00569-9

2. Wang Y, Deng X, Zhang X, Geng Y, Ji L, Song Z, et al. Presence of tophi and carotid plaque were risk factors of MACE in subclinical artherosclerosis patients with gout: a longitudinal cohort study. Front Immunol. (2023) 14. doi: 10.3389/fimmu.2023.1151782

3. Global burden of 87 risk factors in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. (2020) 396:1223–49. doi: 10.1016/S0140-6736(20)30752-2

4. Dehlin M, Jacobsson L, and Roddy E. Global epidemiology of gout: prevalence, incidence, treatment patterns and risk factors. Nat Rev Rheumatol. (2020) 16:380–90. doi: 10.1038/s41584-020-0441-1

5. Global, regional, and national burden of gout, 1990-2020, and projections to 2050: a systematic analysis of the Global Burden of Disease Study 2021. Lancet Rheumatol. (2024) 6:e507–e17. doi: 10.1016/S2665-9913(24)00117-6

6. Han Y, Di H, Wang Y, Yi J, Cao Y, Han X, et al. Exploration of the association between new “Life’s Essential 8” with hyperuricemia and gout among US adults. Qual Life Res. (2024) 33:3351–62. doi: 10.1007/s11136-024-03777-y

7. Zeng L, Ma P, Li Z, Liang S, Wu C, Hong C, et al. Multimodal machine learning-based marker enables early detection and prognosis prediction for hyperuricemia. Advanced Sci. (2024). doi: 10.1002/advs.202404047

8. Gérard B, Leask M, Merriman TR, Bardin T, Oehler E, Lawrence A, et al. Hyperuricaemia and gout in the Pacific. Nat Rev Rheumatol. (2025) 21(4):197–210. doi: 10.1038/s41584-025-01228-7

9. Chen-Xu M, Yokose C, Rai SK, Pillinger MH, and Choi HK. Contemporary prevalence of gout and hyperuricemia in the United States and decadal trends: the national health and nutrition examination survey, 2007–2016. Arthritis Rheumatol. (2019) 71:991–9. doi: 10.1002/art.40807

10. Yu W and Cheng J-D. Uric acid and cardiovascular disease: an update from molecular mechanism to clinical perspective. Front Pharmacol. (2020) 11. doi: 10.3389/fphar.2020.582680

11. Sacks D, Baxter B, Campbell BCV, Carpenter JS, Cognard C, Dippel D, et al. Multisociety consensus quality improvement revised consensus statement for endovascular therapy of acute ischemic stroke. Int J Stroke. (2018) 13:612–32. doi: 10.1016/j.jvir.2017.11.026

12. Cipolletta E, Tata LJ, Nakafero G, Avery AJ, Mamas MA, and Abhishek A. Association between gout flare and subsequent cardiovascular events among patients with gout. JAMA. (2022) 328:440–50. doi: 10.1001/jama.2022.11390

13. Singh JA and Gaffo A. Gout epidemiology and comorbidities. Semin Arthritis Rheumatism. (2020) 50:S11–S6. doi: 10.1016/j.semarthrit.2020.04.008

14. Andrés M. Gout and cardiovascular disease: mechanisms, risk estimations, and the impact of therapies. Gout Urate Crystal Deposition Disease. (2023) 1:152–66. doi: 10.3390/gucdd1030014

15. Vargas-Santos AB, Neogi T, da Rocha Castelar-Pinheiro G, Kapetanovic MC, and Turkiewicz A. Cause-specific mortality in gout: novel findings of elevated risk of non-cardiovascular-related deaths. Arthritis Rheumatol. (2019) 71:1935–42. doi: 10.1002/art.41008

16. Team DC. DecisionLinnc is a platform that integrates multiple programming language environments and enables data processing, data analysis, and machine learning through a visual interface. HangZhou, CHN: Statsape Co.Ltd (2023).

17. Fan M, Zhang X, Hu J, Gu N, and Tao D. Adaptive data structure regularized multiclass discriminative feature selection. IEEE Trans Neural Netw Learn Syst. (2022) 33:5859–72. doi: 10.1109/TNNLS.2021.3071603

18. Helget LN, England BR, Roul P, Sayles H, Petro AD, Neogi T, et al. Cause-specific mortality in patients with gout in the US veterans health administration: A matched cohort study. Arthritis Care Res. (2022) 75:808–16. doi: 10.1002/acr.24881

19. Cappuccio FP and Miller MA. Cardiovascular disease and hypertension in sub-Saharan Africa: burden, risk and interventions. Internal Emergency Med. (2016) 11:299–305. doi: 10.1007/s11739-016-1423-9

20. Richette P, Perez-Ruiz F, Doherty M, Jansen TL, Nuki G, Pascual E, et al. Improving cardiovascular and renal outcomes in gout: what should we target? Nat Rev Rheumatol. (2014) 10:654–61. doi: 10.1038/nrrheum.2014.124

21. Choi HK and Curhan G. Independent impact of gout on mortality and risk for coronary heart disease. Circulation. (2007) 116:894–900. doi: 10.1161/CIRCULATIONAHA.107.703389

22. Clarson LE, Hider SL, Belcher J, Heneghan C, Roddy E, and Mallen CD. Increased risk of vascular disease associated with gout: a retrospective, matched cohort study in the UK Clinical Practice Research Datalink. Ann Rheumatic Diseases. (2015) 74:642–7. doi: 10.1136/annrheumdis-2014-205252

23. Moon KW, Kim MJ, Choi IA, and Shin K. Cardiovascular risks in Korean patients with gout: analysis using a national health insurance service database. J Clin Med. (2022) 11. doi: 10.3390/jcm11082124

24. Ferguson LD, Molenberghs G, Verbeke G, Rahimi K, Rao S, McInnes IB, et al. Gout and incidence of 12 cardiovascular diseases: a case–control study including 152 663 individuals with gout and 709 981 matched controls. Lancet Rheumatol. (2024) 6:e156–e67. doi: 10.1016/S2665-9913(23)00338-7

25. Kolb H, Kempf K, Röhling M, and Martin S. Insulin: too much of a good thing is bad. BMC Med. (2020) 18:224. doi: 10.1186/s12916-020-01688-6

26. Carvalho-Filho MA, Ueno M, Hirabara SM, Seabra AB, Carvalheira JBC, de Oliveira MG, et al. S-nitrosation of the insulin receptor, insulin receptor substrate 1, and protein kinase B/Akt: A novel mechanism of insulin resistance. Diabetes. (2005) 54:959–67. doi: 10.2337/diabetes.54.4.959

27. Yang J, Zhou Y, Zhang T, Lin X, Ma X, Wang Z, et al. Fasting blood glucose and HbA1c correlate with severity of coronary artery disease in elective PCI patients with HbA1c 5.7% to 6.4%. Angiology. (2019) 71:167–74. doi: 10.1177/0003319719887655

28. Zheng J, Ye P, Luo L, Xiao W, Xu R, and Wu H. Association between blood glucose levels and high-sensitivity cardiac troponin T in an overt cardiovascular disease-free community-based study. Diabetes Res Clin Practice. (2012) 97:139–45. doi: 10.1016/j.diabres.2012.04.021

29. Mulder J, Kranenburg LW, Treling WJ, Hovingh GK, Rutten JHW, Busschbach JJ, et al. Quality of life and coping in Dutch homozygous familial hypercholesterolemia patients: A qualitative study. Atherosclerosis. (2022) 348:75–81. doi: 10.1016/j.atherosclerosis.2022.03.015

30. Ference BA, Ginsberg HN, Graham I, Ray KK, Packard CJ, Bruckert E, et al. Low-density lipoproteins cause atherosclerotic cardiovascular disease. 1. Evidence from genetic, epidemiologic, and clinical studies. A consensus statement from the European Atherosclerosis Society Consensus Panel. Eur Heart J. (2017) 38:2459–72. doi: 10.1093/eurheartj/ehx144

31. Soppert J, Lehrke M, Marx N, Jankowski J, and Noels H. Lipoproteins and lipids in cardiovascular disease: from mechanistic insights to therapeutic targeting. Advanced Drug Delivery Rev. (2020) 159:4–33. doi: 10.1016/j.addr.2020.07.019

32. Zhang S, Li L, Chen W, Xu S, Feng X, and Zhang L. Natural products: The role and mechanism in low-density lipoprotein oxidation and atherosclerosis. Phytotherapy Res. (2020) 35:2945–67. doi: 10.1002/ptr.7002

33. Murakami T, Ockinger J, Yu J, Byles V, McColl A, Hofer AM, et al. Critical role for calcium mobilization in activation of the NLRP3 inflammasome. Proc Natl Acad Sci U.S.A. (2012) 109:11282–7. doi: 10.1073/pnas.1117765109

34. Chirayath TW, Ollivier M, Kayatekin M, Rubera I, Pham CN, Friard J, et al. Activation of osmo-sensitive LRRC8 anion channels in macrophages is important for micro-crystallin joint inflammation. Nat Commun. (2024) 15:8179. doi: 10.1038/s41467-024-52543-8

35. Kutikhin AG, Feenstra L, Kostyunin AE, Yuzhalin AE, Hillebrands JL, and Krenning G. Calciprotein particles: balancing mineral homeostasis and vascular pathology. Arterioscler Thromb Vasc Biol. (2021) 41:1607–24. doi: 10.1161/ATVBAHA.120.315697

36. Zhu Y, Zheng B, Cai C, Lin Z, Qin H, Liu H, et al. Febuxostat increases ventricular arrhythmogenesis through calcium handling dysregulation in human-induced pluripotent stem cell-derived cardiomyocytes. Toxicol Sci. (2022) 189:216–24. doi: 10.1093/toxsci/kfac073

37. Wei X, Zhang M, Huang S, Lan X, Zheng J, Luo H, et al. Hyperuricemia: A key contributor to endothelial dysfunction in cardiovascular diseases. FASEB J. (2023) 37:e23012. doi: 10.1096/fj.202300393R

38. Li D, Yuan S, Deng Y, Wang X, Wu S, Chen X, et al. The dysregulation of immune cells induced by uric acid: mechanisms of inflammation associated with hyperuricemia and its complications. Front Immunol. (2023) 14:1282890. doi: 10.3389/fimmu.2023.1282890

39. Zheng Y, Chen Z, Yang J, Zheng J, Shui X, Yan Y, et al. The role of hyperuricemia in cardiac diseases: evidence, controversies, and therapeutic strategies. Biomolecules. (2024) 14:753. doi: 10.3390/biom14070753

40. Li Y, Ma C, Sheng Y, Huang S, Sun H, Ti Y, et al. TRIB3 mediates vascular calcification by facilitating self-ubiquitination and dissociation of Smurf1 in chronic kidney disease. J Clin Invest. (2025) 135:e175972. doi: 10.1172/JCI175972

41. Chen A, Yuan P, Lu Y, Ma C, Xue F, Yang J, et al. Smooth muscle-specific HuR knockout attenuates vascular calcification. J Mol Cell Cardiol. (2025) 205:117–28. doi: 10.1016/j.yjmcc.2025.07.002

42. Ganji M, Nardi V, Prasad M, Jordan KL, Bois MC, Franchi F, et al. Carotid plaques from symptomatic patients are characterized by local increase in xanthine oxidase expression. Stroke. (2021) 52:2792–801. doi: 10.1161/STROKEAHA.120.032964

Keywords: gout, cardiovascular events, prediction nomogram, machine learning (ML), nomogram

Citation: Fan B, Ye Y, Wang Z, Xu Y, Lu M, Cong W and Ma F (2025) Establishment and evaluation of a model for clinical feature selection and prediction in gout patients with cardiovascular diseases: a retrospective cohort study. Front. Endocrinol. 16:1599028. doi: 10.3389/fendo.2025.1599028

Received: 24 March 2025; Accepted: 15 September 2025;
Published: 10 October 2025.

Edited by:

Ahsan H. Khandoker, Khalifa University, United Arab Emirates

Reviewed by:

Jan Kubicek, VSB-Technical University of Ostrava, Czechia
Muhammad Shoaib Arif, Air University, Pakistan

Copyright © 2025 Fan, Ye, Wang, Xu, Lu, Cong and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Fang Ma, anjanette49@126.com; Weihong Cong, congcao@188.com

†These authors have contributed equally to this work
