Associations between metabolic-inflammatory biomarkers and Helicobacter pylori infection: an interpretable machine learning prediction approach

Zhang, Yue; Duan, Ruifeng; Chen, Xin; Wei, Lijuan

doi:10.3389/fnut.2025.1674585

ORIGINAL RESEARCH article

Front. Nutr., 19 November 2025

Sec. Nutritional Epidemiology

Volume 12 - 2025 | https://doi.org/10.3389/fnut.2025.1674585

Yue Zhang

Ruifeng Duan

Xin Chen

Lijuan Wei^*

Department of Gastroenterology and Digestive Endoscopy Center, The Second Hospital of Jilin University, Changchun, China

Background: This study investigated the association between metabolic-inflammatory markers and Helicobacter pylori (HP) infection using interpretable machine learning models, with a focus on the triglyceride-glucose (TyG) index, TyG/HDL-C ratio, and systemic inflammatory biomarkers.

Methods: Data from 2,924 NHANES participants and 1,021 patients from the Second Hospital of Jilin University were analyzed. Associations between metabolic-inflammatory markers and HP were assessed using multivariable regression. Eleven machine learning models were compared for predictive performance, evaluated by AUC, accuracy, sensitivity, specificity, precision, F1 score, and Kappa statistic. Interpretability was assessed via SHAP values, calibration plots, confusion matrices, and decision curve analysis.

Results: In NHANES, the TyG index was independently associated with HP infection (OR = 1.25, 95% CI 1.06–1.48, P = 0.009), and the TyG/HDL-C ratio remained significant after full adjustment (OR = 1.16, 95% CI 1.07–1.25, P < 0.001), while SIRI, IBI, and CRP lost significance. In the external Chinese cohort, the TyG association attenuated (P = 0.057), but higher TyG/HDL-C quartiles remained significant. Among 11 algorithms, Random Forest (RF) and Gaussian Process (GP) achieved the highest AUCs on the training set (both 0.97) but dropped markedly on the validation set (both 0.75), indicating overfitting. In contrast, XGBoost (XGB) and MLP maintained more consistent AUCs between training (0.77) and validation (0.77), reflecting better generalization. DeLong’s test indicated that both RF and XGB significantly outperformed baseline models (P < 0.001), while XGB demonstrated more stable validation performance. Decision curve and SHAP analyses supported the clinical relevance of XGB, highlighting Race and Age as dominant contributors.

Conclusion: The TyG index and TyG/HDL-C ratio were independently associated with HP infection. Among machine learning models, XGBoost demonstrated the most stable and generalizable performance (AUC 0.77 in both training and validation), whereas RF and GP (AUC 0.97 → 0.75) exhibited overfitting. These results suggest that XGB provides a more reliable framework for infection risk prediction, though the cross-sectional design precludes causal inference.

Background

Insulin resistance (IR), defined as a reduced sensitivity and responsiveness to insulin, is closely associated with cardiovascular diseases through mechanisms involving excessive sympathetic nervous system activation, endothelial dysfunction, and chronic inflammation (1–3). Increasing evidence highlights IR as a central driver of these pathophysiological changes (4). Recent studies have further identified the triglyceride-glucose (TyG) index and its derivatives (e.g., the triglyceride to HDL-C ratio) as reliable surrogate markers for IR (5). Calculated using fasting triglyceride and glucose levels, the TyG index provides a simple and reliable estimate of insulin sensitivity (6).

Helicobacter pylori (HP), a Gram-negative bacterium with a global seroprevalence exceeding 50% and particularly prevalent in Asian populations (7), is implicated not only in gastrointestinal diseases (e.g., gastric cancer, chronic gastritis, peptic ulcers) but also in a variety of extra-gastric disorders affecting the nervous and cardiovascular systems (8–10). Growing evidence suggests that HP infection can exacerbate IR by promoting the release of pro-inflammatory mediators (e.g., IL-6, TNF-α, CRP), activating inflammatory signaling pathways, and inducing systemic chronic inflammatory responses (11–13). Moreover, HP-specific virulence factors, such as cytotoxin-associated gene A (CagA), play critical roles in triggering inflammatory cascades and disrupting host metabolic homeostasis (12).

Currently, limited research has investigated the relationship between metabolic-inflammatory markers and HP infection. While elevated TyG levels have been associated with HP seropositivity and increased mortality risk in cross-sectional studies, the role of inflammation as a potential mediator between metabolic dysfunction and HP infection has not yet been explored (14). Moreover, there remains a lack of effective tools for self-assessment of HP infection risk in the general population.

In this study, we hypothesize that metabolic-inflammatory dysregulation may be associated with increased susceptibility to HP infection. To test this hypothesis, we conducted a comprehensive analysis that integrates data from both the NHANES cohort and a Chinese population-based cohort, aiming to (1) examine the associations between metabolic-inflammatory indices and HP infection status; (2) develop a machine learning-based prediction model to facilitate individual-level risk assessment of HP infection; and (3) enable early prevention and intervention strategies through personalized risk profiling. By combining clinical and demographic data with advanced machine learning techniques, this study provides novel insights into the pathophysiological links between metabolic-inflammatory imbalance and HP infection, while offering a practical tool for risk stratification.

Materials and methods

Data collection and definitions: This cross-sectional study utilized data from two independent cohorts. The first included 2,924 adult participants from the 1999–2000 cycle of the National Health and Nutrition Examination Survey (NHANES), with data files merged on the unique respondent sequence number (“SEQN”) assigned to each participant. Exclusion criteria are detailed in Figure 1. All NHANES analyses accounted for the complex survey design using the survey package in R. Sample weights, primary sampling units (PSUs), and strata were incorporated to ensure national representativeness consistent with U.S. Census Bureau population estimates. The second cohort comprised 1,021 patients from the Second Hospital of Jilin University. Participants were stratified into quartiles based on their TyG index for subgroup analysis. Continuous variables are presented as weighted means (95% confidence intervals) calculated using Taylor series linearization, and categorical variables are reported as weighted proportions. Missing data were addressed through imputation, as described in the following subsection.

FIGURE 1

Flowchart showing participant selection process. Starts with 9965 participants. Exclusion criteria include missing information on HP, TyG index, inflammatory cells, and follow-up information. Final incorporated studies total 2924.

Figure 1. Flowchart of study participants.

The analysis included participants who underwent serological testing for HP and provided fasting blood samples. The NHANES study protocol was approved by the National Center for Health Statistics (NCHS) Research Ethics Review Board, and all participants provided written informed consent.

Handling of missing and imputed values

Missing data were addressed separately for the two cohorts to reflect differences in sampling design and data completeness:

NHANES cohort: Variables with <10% missingness were imputed using multiple imputation by chained equations (MICE), preserving the complex survey structure.

Chinese cohort: Given the smaller sample size and minimal missingness, single imputation was performed using median substitution for continuous variables and mode imputation for categorical variables. Outliers and implausible laboratory values were identified using interquartile range (IQR) inspection and excluded prior to imputation to prevent bias propagation.

This two-tiered strategy ensured data comparability while minimizing distortion of statistical inferences.
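As an illustration only (not the authors' code), the single-imputation step described for the Chinese cohort can be sketched in a few lines. The 1.5 × IQR fence, the function names, and the toy values are assumptions; the text specifies only "IQR inspection", median substitution, and mode imputation.

```python
from statistics import median, mode, quantiles

def iqr_outlier_flags(values, k=1.5):
    """Flag observed values outside Tukey fences (k=1.5 is an assumed convention)."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # Missing entries are never flagged; they are handled by imputation.
    return [v is not None and not (low <= v <= high) for v in values]

def impute_median(values):
    """Replace missing (None) entries of a continuous variable with the observed median."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def impute_mode(values):
    """Replace missing (None) entries of a categorical variable with the observed mode."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

# Hypothetical toy data
flags = iqr_outlier_flags([1, 2, 3, 4, 100])
filled = impute_median([5.0, None, 7.0])
```

Median and mode substitution preserve the sample size but shrink variance, which is one reason the authors restrict this approach to the cohort with minimal missingness.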

Variables and measurements

Laboratory measurements: Fasting serum glucose and triglycerides were measured at the Mobile Examination Center (MEC). Baseline demographic and clinical characteristics were collected, including age, gender, race, obesity, diabetes, smoking, alcohol consumption, and high cholesterol status. The TyG index was calculated as follows (15):

$$\mathrm{TyG} = \ln\left[\frac{\text{Triglycerides (mg/dL)} \times \text{Glucose (mg/dL)}}{2}\right]$$

The triglyceride-to-HDL-C ratio (TG/HDL-C) was computed as:

$$\mathrm{TG/HDL\text{-}C} = \frac{\text{Triglycerides (mg/dL)}}{\text{HDL-C (mg/dL)}}$$

Inflammatory indices were derived at the MEC using a Beckman Coulter HMX hematology analyzer as follows (16):

$$\mathrm{SIRI} = \frac{\text{Monocytes} \times \text{Neutrophils}}{\text{Lymphocytes}}$$

$$\mathrm{IBI} = \frac{\text{CRP} \times \text{Neutrophils}}{\text{Lymphocytes}}$$

Serum CRP levels were quantified using latex-enhanced nephelometry (BN ProSpec, Siemens Healthcare). In the Chinese dataset, additional markers were included: gallbladder polyps, renal cysts, hepatic cysts, and MASLD.
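The index definitions above translate directly into code. The sketch below is illustrative: the function names are hypothetical, and the units assumed for cell counts and CRP are not stated in the text, so they are marked as assumptions.

```python
import math

def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """TyG = ln[triglycerides (mg/dL) x fasting glucose (mg/dL) / 2]."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)

def tg_hdl_ratio(triglycerides_mg_dl, hdl_c_mg_dl):
    """Triglyceride-to-HDL-C ratio, both in mg/dL."""
    return triglycerides_mg_dl / hdl_c_mg_dl

def siri(monocytes, neutrophils, lymphocytes):
    """SIRI = monocytes x neutrophils / lymphocytes (same count units assumed for all three)."""
    return monocytes * neutrophils / lymphocytes

def ibi(crp, neutrophils, lymphocytes):
    """IBI = CRP x neutrophils / lymphocytes (CRP unit is an assumption; not given in the text)."""
    return crp * neutrophils / lymphocytes

# Hypothetical participant: TG 150 mg/dL, fasting glucose 100 mg/dL
example_tyg = tyg_index(150, 100)
```

For this hypothetical participant the TyG index is ln(7500), approximately 8.92.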

Outcome definition

The primary outcome was HP infection, assessed using different diagnostic methods in the two cohorts:

In the NHANES cohort, HP infection status was determined by seropositivity to HP-specific IgG antibodies using standardized laboratory protocols. In the Chinese cohort, HP infection was assessed by the carbon-14 (¹⁴C) urea breath test (UBT), which detects active HP colonization by measuring labeled CO₂ in the exhaled breath. This test is considered a non-invasive and highly specific clinical gold standard for active infection.

Statistical analysis

All statistical analyses were conducted in accordance with guidelines provided by the CDC¹. Given the complex, multistage, stratified probability sampling design of NHANES, analyses incorporated sample weights, clustering, and stratification. Means were used to describe continuous participant characteristics, and proportions were used for categorical variables. Baseline characteristics were described according to TyG quartiles. Homogeneity of variance was tested, followed by Bonferroni post hoc comparisons as appropriate.

Weighted binary logistic regression models were employed to assess the associations of TyG, TyG/HDL-C, SIRI, IBI, and CRP indices with HP seropositivity. Regression results were reported as β coefficients and 95% confidence intervals (CI) for both continuous and quartile-based predictor variables. A P-value < 0.05 was considered statistically significant.
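Because the models are specified in terms of β coefficients while the Results quote odds ratios (ORs), the conversion between the two may help. The sketch below assumes a Wald interval on the log-odds scale with a normal reference distribution; a survey-weighted fit may instead use design-based degrees of freedom, so this is an approximation, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal

def odds_ratio_ci(beta, se, z_crit=Z_95):
    """Convert a log-odds coefficient and its SE into (OR, lower, upper)."""
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))

def wald_from_or(odds_ratio, lower, upper, z_crit=Z_95):
    """Back out beta, SE, z, and a two-sided P-value from a reported OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return beta, se, z, p_value

# Model 3 TyG estimate reported for NHANES: OR = 1.25, 95% CI 1.06-1.48
beta, se, z, p_value = wald_from_or(1.25, 1.06, 1.48)
```

Under these assumptions the reported interval implies a P-value of roughly 0.009, in line with the value quoted for that estimate.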

NHANES cohort: Model 1 was unadjusted. Model 2 was adjusted for age, gender, and race/ethnicity. Model 3 was further adjusted for lifestyle factors (smoking and alcohol use) and cardiometabolic comorbidities (obesity, diabetes, hypertension, and hypercholesterolemia) to evaluate independent associations.

Chinese cohort: Model 1 was unadjusted. Model 2 was adjusted for age and gender. Model 3 was further adjusted for diabetes, cholesterol level, gallbladder polyps, renal cysts, hepatic cysts, and MASLD.

Machine learning models and evaluation strategy

All continuous features were standardized to have zero mean and unit variance, while categorical variables were transformed using one-hot encoding. Missing values were initially handled using median imputation, but we acknowledge that this method may introduce bias. To verify robustness, sensitivity analyses using MICE were performed, yielding consistent model rankings. Features with near-zero variance and highly collinear predictors (|r| > 0.9) were excluded prior to modeling. A unified preprocessing pipeline was consistently applied across all models to ensure fair comparability.
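Two of the preprocessing steps, standardization and the |r| > 0.9 collinearity filter, can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say which member of a collinear pair was dropped (here the later column is dropped), and the function names and toy columns are hypothetical.

```python
from statistics import mean, pstdev

def standardize(column):
    """Scale a continuous feature to zero mean and unit variance."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd for v in column]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def drop_collinear(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson_r(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical features: "b" is almost a multiple of "a"
features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
kept = drop_collinear(features)
```

In practice the scaling parameters would be estimated on the training split only and then applied to the test split, so that no test-set information leaks into preprocessing.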

To identify the most informative predictors, we employed Recursive Feature Elimination (RFE) with Random Forest (RF) as the base learner. RFE was performed exclusively within the training set to avoid information leakage and was nested inside a 10-fold cross-validation framework, ensuring that feature ranking was recalculated in each resampling fold. Candidate subset sizes of 5, 10, and all available predictors were evaluated. Model performance, measured by the area under the ROC curve (AUC), guided the feature elimination process. Across folds, a stable set of five variables—Race, Age, TyG, IBI, and CRP—consistently emerged as the most discriminative predictors and was retained for subsequent model development.

To optimize predictive performance, we implemented 11 supervised machine learning algorithms, including RF, Extreme Gradient Boosting (XGB), Neural Networks (NN), Multilayer Perceptron (MLP), Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), C5.0, Gradient Boosting Machine (GBM), K-Nearest Neighbors (KNN), and Gaussian Process (GP). These models were chosen for their complementary strengths in handling non-linear relationships, imbalanced data, and high-dimensional feature spaces.

The dataset was divided into 70% training and 30% testing subsets using stratified sampling to preserve outcome distribution. Within the training data, nested cross-validation was employed to tune hyperparameters and evaluate model robustness. Specifically, a 5-fold inner loop combined grid and random search to optimize hyperparameters based on cross-validated AUC, while an outer 10-fold cross-validation assessed model consistency and minimized overfitting.
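The data-partitioning part of this scheme can be sketched as below: a stratified 70/30 split followed by stratified fold assignment within the training set. Hyperparameter search and model fitting are omitted, and the seed, function names, and toy labels are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.30, seed=0):
    """Split indices into train/test while preserving the outcome distribution."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

def stratified_folds(labels, indices, k=10, seed=0):
    """Assign the given indices to k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for position, idx in enumerate(idxs):
            folds[position % k].append(idx)
    return folds

# Hypothetical outcome vector with 30% positives
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
outer_folds = stratified_folds(labels, train_idx, k=10)
```

In a nested design, each outer fold would serve once as the evaluation set, and the remaining outer-training indices would be passed again to `stratified_folds` with k = 5 for tuning.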

Finally, pairwise DeLong tests were applied to ROC curves to evaluate statistical differences in model performance, with AUC serving as the primary performance criterion (17, 18).
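For readers unfamiliar with the procedure, a compact sketch of the DeLong test for two correlated ROC curves (two models scored on the same subjects) is given below. It is an illustrative pure-Python implementation, not the software used in the study; it assumes binary labels coded 0/1, at least two subjects per class, and a normal reference distribution for the test statistic.

```python
import math

def _placements(pos, neg):
    """Structural components: V10 per positive subject, V01 per negative subject."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, scores_a, scores_b):
    """Return (AUC_a, AUC_b, z, two-sided p) for paired ROC curves."""
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    v10_a, v01_a = _placements([scores_a[i] for i in pos_idx],
                               [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _placements([scores_b[i] for i in pos_idx],
                               [scores_b[i] for i in neg_idx])
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of the AUC difference, including the covariance between models
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(pos_idx)
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(neg_idx))
    if var == 0:
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical scores from two models on four subjects
result = delong_test([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1], [0.9, 0.2, 0.3, 0.1])
```

With many pairwise comparisons among 11 models, some adjustment for multiplicity may be worth considering; the text does not state whether one was applied.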

Model interpretation and clinical utility

Although RF achieved the highest AUC on the training set, it demonstrated a substantial performance drop on validation, indicating potential overfitting. In contrast, XGB maintained consistent AUC and F1-score across both datasets, supporting its superior generalization and clinical applicability. Therefore, XGB was selected as the final model for interpretation and deployment.

For interpretability, confusion matrices and calibration plots evaluated model reliability, while Decision Curve Analysis (DCA) quantified clinical net benefit against “treat-all” and “treat-none” strategies. SHAP values further elucidated feature contributions, identifying Race and Age as dominant predictors.

Results

Table 1 presents the baseline characteristics of participants stratified by TyG quartiles (weighted analysis). Table 2 presents the baseline characteristics of the Second Hospital of Jilin University participants stratified by TyG quartile. Several potential confounders associated with HP seropositivity showed significant variation across TyG quartiles, including age, sex, race, diabetes, hypertension, obesity, smoking, and alcohol consumption (P < 0.05).

TABLE 1

Table 1. Baseline characteristics according to triglyceride-glucose (TyG) quartiles, weighted.

TABLE 2

Table 2. Baseline characteristics according to triglyceride-glucose (TyG) quartiles.

Association between metabolic-inflammatory markers and HP IgG seropositivity

Table 3 presents the results of the multivariable regression analysis examining the association between metabolic-inflammatory markers and H. pylori (HP) seropositivity (IgG antibodies). In the unadjusted model (Model 1), the TyG index demonstrated a significant positive association with HP infection (OR = 1.66, 95% CI: 1.48–1.86; P < 0.001). This association remained significant but was attenuated in Model 2, which was adjusted for age, gender, smoking, alcohol consumption, and race (OR = 1.17, 95% CI: 1.01–1.36; P = 0.039). The association persisted in Model 3, which included additional adjustments for high cholesterol, obesity, diabetes, and hypertension; the estimate was slightly higher than in Model 2 but remained well below the crude estimate (OR = 1.25, 95% CI: 1.06–1.48; P = 0.009).

TABLE 3

Table 3. Multivariable logistic regression analysis for Helicobacter pylori (HP) infection risk (β, 95% CI) National Health and Nutrition Examination Survey (NHANES).

The TyG/HDL-C ratio, treated as a continuous variable, consistently showed a robust positive association with HP seropositivity across all models (Model 3: OR = 1.16, 95% CI: 1.07–1.25; P < 0.001). When the TyG/HDL-C ratio was categorized into quartiles, the highest quartile (Q4) demonstrated a strong positive association in the crude model (OR = 1.94, P < 0.001), but the association weakened in the adjusted models (Model 2: OR = 1.32, P = 0.035; Model 3: OR = 1.42, P = 0.011). No significant associations were observed for Q2 in any of the models.

In contrast, SIRI showed a significant inverse association with HP infection in the crude model (OR = 0.91, P = 0.046), but the association was no longer significant in the adjusted models or when categorized into quartiles. IBI and CRP exhibited similar patterns, with significant associations in the crude model but a lack of significance after adjustment and in quartile-based analysis.

Table 4 presents the associations between HP infection and metabolic-inflammatory markers in patients from the Second Hospital of Jilin University. In the crude model (Model 1), the TyG index (continuous) showed a significant positive association with HP infection (OR = 2.35, 95% CI: 1.85–2.98; P < 0.001). The association was progressively attenuated with adjustment. In Model 2, adjusted for age and gender, the odds ratio decreased but remained significant (OR = 1.90, 95% CI: 1.48–2.42; P < 0.001); in Model 3, further adjusted for diabetes, cholesterol levels, gallbladder polyps, renal cysts, hepatic cysts, and MASLD, the association was no longer significant (OR = 1.32, 95% CI: 0.99–1.75; P = 0.057).

TABLE 4

Table 4. Multivariable logistic regression analysis for Helicobacter pylori (HP) infection risk (β, 95% CI) (China).

For TyG quartiles, the highest quartile (Q4) exhibited a strong positive association in Model 1 (OR = 3.82, P < 0.001), but this association weakened in Model 2 (OR = 2.71, P < 0.001) and became non-significant in Model 3 (OR = 1.53, P = 0.066). The Q3 quartile showed a significant positive association in all models (Model 1: OR = 2.25, P < 0.001; Model 2: OR = 1.81, P < 0.001; Model 3: OR = 1.72, P = 0.005), while Q2 did not show a significant association in any of the models (P-values ranging from 0.056 to 0.491).

The TyG/HDL-C ratio (continuous) also demonstrated a positive association with HP infection, with a strong effect in the crude model (OR = 1.61, 95% CI: 1.39–1.86; P < 0.001) and Model 2 (OR = 1.42, 95% CI: 1.23–1.65; P < 0.001). However, this association was not significant in Model 3 (OR = 1.15, 95% CI: 0.98–1.35; P = 0.098). For TyG/HDL-C quartiles, the highest quartile (Q4) showed a strong positive association in the crude model (OR = 3.86, P < 0.001), which decreased but remained significant in Model 2 (OR = 2.86, P < 0.001), and was still significant in Model 3 (OR = 1.78, P = 0.010). The second (Q2) and third (Q3) quartiles showed varying results, with Q3 showing a significant association in Model 2 (OR = 2.44, P < 0.001) and Model 3 (OR = 2.25, P < 0.001), while Q2 had a weaker or non-significant association across models.

For SIRI, the continuous variable showed a significant positive association with HP infection in the crude model (OR = 1.45, 95% CI: 1.03–2.03; P = 0.031), but this association became weaker and non-significant in the adjusted models (Model 2: OR = 1.33, P = 0.087; Model 3: OR = 1.18, P = 0.305). In SIRI quartiles, the second quartile (Q2) showed a significant positive association in all models (Model 1: OR = 1.86, P < 0.001; Model 2: OR = 1.64, P = 0.008; Model 3: OR = 2.14, P < 0.001). The third quartile (Q3) was significant in Model 1 (OR = 1.65, P = 0.005) and Model 2 (OR = 1.48, P = 0.034), but not in Model 3 (OR = 1.24, P = 0.294). The fourth quartile (Q4) also showed a significant association in Model 1 (OR = 1.92, P < 0.001) and Model 2 (OR = 1.64, P = 0.008), but the association did not reach significance in Model 3 (OR = 1.47, P = 0.065).

Machine learning

RF and GP performed the best on the training dataset, with RF achieving an AUC of 0.97, Recall of 0.71, and Precision of 0.95, while GP showed similar performance (Table 5 and Figure 2). However, both models exhibited marked drops in performance on the test dataset (AUC: 0.75), indicating a potential overfitting risk due to capturing noise in the training data (Supplementary Table 1). In contrast, XGB demonstrated more consistent performance across both datasets, with a Precision of 0.66 and an F1 score of 0.56, making it a more reliable model for generalization. To assess statistical significance between model performances, pairwise DeLong tests were conducted on ROC curves. These comparisons (Supplementary Table 2) showed that both RF and XGB significantly outperformed baseline models (P < 0.001). Notably, RF and XGB differed significantly from each other (P < 0.001), while differences between other top models (e.g., RF vs. GP, XGB vs. MLP) were not statistically significant.

TABLE 5

Table 5. Model performance table for 11 models.

FIGURE 2

Two side-by-side ROC curves compare various machine learning models. Chart A shows higher AUC values, with GP and RF reaching 0.97, while C5.0 has the lowest at 0.67. Chart B shows slightly lower AUC values, with GP at 0.75 and C5.0 at 0.69. The curves depict sensitivity vs. 1-specificity.

Figure 2. Receiver operating characteristic (ROC) curves for 11 machine-learning models on the (A) training and (B) testing datasets. Models include NN, SVM, MLP, GBM, LR, NB, XGB, C5.0, GP, KNN, and RF. The horizontal axis is 1-Specificity, and the vertical axis is Sensitivity. The closer the curve is to the upper left corner, the better the model’s classification performance.

Figure 3 illustrates the performance of the XGB model through confusion matrices (Figures 3A, B) and calibration plots (Figures 3C, D) for the training and validation datasets, respectively.

FIGURE 3

Confusion matrices and calibration plots. Panel A shows a confusion matrix with high values for true negatives and false positives. Panel B displays a matrix with varied distribution between categories. Panels C and D present calibration plots; both illustrate the relationship between predicted values on the x-axis and observed values on the y-axis, with data points and confidence intervals straddling the diagonal line.

Figure 3. (A,B) Confusion matrices of the extreme gradient boosting (XGB) model on the training and testing datasets, respectively. (C,D) Calibration plots of the XGB model on the training and testing datasets, respectively. The horizontal axis (X-axis) represents the probability of HP infection predicted by the model, and the vertical axis (Y-axis) represents the observed proportion of infection.

Figures 3A, B present the confusion matrices for the training and validation datasets, respectively. In the training set (Figure 3A), the model correctly identified 1,299 true negatives and 395 true positives, with 194 false positives and 451 false negatives. In the validation set (Figure 3B), the model correctly classified 301 true negatives and 92 true positives, while misclassifying 64 false positives and 128 false negatives. Compared to the training set, performance in the validation set shows a decrease in sensitivity (from 0.47 to 0.42) and in overall classification accuracy (from 0.72 to 0.67), indicating some degree of overfitting to the training data.
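The threshold-dependent metrics used throughout this section follow directly from the four confusion-matrix cells. The sketch below recomputes them from the counts reported for Figures 3A, B; the function name is hypothetical, and the Kappa formula is the standard Cohen's kappa for a 2 × 2 table.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision, accuracy, F1, and Cohen's kappa."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column margins
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": accuracy,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "kappa": (accuracy - expected) / (1 - expected),
    }

# Counts reported for the XGB model
train_metrics = classification_metrics(tp=395, fp=194, fn=451, tn=1299)
valid_metrics = classification_metrics(tp=92, fp=64, fn=128, tn=301)
```

These counts give a training sensitivity of about 0.47 and accuracy of about 0.72, against about 0.42 and 0.67 in the validation set.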

Figures 3C, D show the calibration plots for the training and validation sets, respectively. The X-axis represents the predicted probability of disease, and the Y-axis reflects the observed event frequency. In the training set (Figure 3C), the calibration curve closely aligns with the ideal diagonal line, indicating good agreement between predicted probabilities and actual outcomes. In contrast, the calibration plot for the validation set (Figure 3D) shows increased deviation from the diagonal line and broader confidence intervals, suggesting reduced calibration performance and increased predictive uncertainty when applied to held-out data.
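The points on a calibration plot are typically obtained by grouping subjects by predicted probability and comparing the mean prediction with the observed event rate in each group. The sketch below uses equal-width bins, which is an assumption; the binning scheme used for Figure 3 is not stated, and the function name and toy data are hypothetical.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points

# Hypothetical predictions for four subjects, grouped into two bins
points = calibration_bins([0, 0, 1, 1], [0.1, 0.15, 0.8, 0.85], n_bins=2)
```

A well-calibrated model yields points close to the diagonal, where the mean predicted probability equals the observed event rate.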

We used DCA to assess clinical utility of the XGB model. In DCA, the threshold probability (p_t) is the minimum predicted risk at which a clinician would act; it encodes the trade-off between false negatives and false positives [relative harm ≈ p_t/(1−p_t)]. Thus, the range 0.10–0.50 corresponds to clinically plausible scenarios where the harm of an unnecessary intervention is between 1:9 and 1:1 compared with missing a true case—i.e., typical “moderate-risk” decisions such as initiating preventive therapy or ordering confirmatory testing.

As shown in Figure 4, the XGB curve lies above both reference strategies—Treat-None and Treat-All—through 0.10–0.50 on the training set (Figure 4A), and remains superior through approximately 0.10–0.60 on the validation set (Figure 4B), though with a smaller margin. Practically, if a clinician’s action threshold falls in these ranges, using XGB would yield greater net benefit than existing default strategies, implying fewer unnecessary interventions for a similar or higher number of detected true cases.
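The net-benefit quantity plotted in a decision curve can be computed as TP/n − (FP/n) × p_t/(1 − p_t), with Treat-All and Treat-None as reference strategies. The sketch below is a minimal illustration with hypothetical function names and toy data, not the code used to produce Figure 4.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every subject with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on everyone; Treat-None is 0 by definition."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes and predicted risks
y_true = [1, 0, 1, 0]
y_prob = [0.8, 0.6, 0.4, 0.2]
model_nb = net_benefit(y_true, y_prob, threshold=0.3)
all_nb = net_benefit_treat_all(y_true, threshold=0.3)
```

In this toy example the model's net benefit at p_t = 0.3 (about 0.39) exceeds that of Treat-All (about 0.29); evaluating both functions over a grid of thresholds yields the curves compared in a DCA plot.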

FIGURE 4

Two line graphs labeled A and B compare net benefit against threshold probability. Each graph has three lines: “None” (black), “All” (gray), and “pred” (dashed). Graph A shows “pred” declining between the “None” and “All” lines, while Graph B shows “pred” intersecting “All” around 0.6 threshold probability.

Figure 4. Decision curve analysis (DCA) of the extreme gradient boosting (XGB) model. (A) Displays the decision curve for the training set, while (B) shows the decision curve for the testing set. The horizontal axis (X-axis) represents the threshold probability, i.e., the predicted probability above which a subject is classified as positive and considered for intervention. The vertical axis (Y-axis) denotes the net benefit, which accounts for the trade-off between true positives and false positives at each threshold. The “All” line represents a strategy in which all patients are assumed to receive treatment, regardless of risk, while the “None” line reflects a strategy where no patients receive treatment. A model is considered clinically useful if it yields a higher net benefit than both the “All” and “None” strategies across a reasonable range of threshold probabilities.

In this study, we used two interpretability methods, LIME and SHAP, to analyze the feature importance and their impact on the model’s prediction for a given instance (Figure 5).

FIGURE 5

Two panels comparing feature importance. Panel A: Bar chart showing LIME explanations with race and age as significant features, having positive weights, while IBI, TyG, and CRP have smaller weights. Panel B: Violin plot displaying SHAP values for the same features, with color gradients indicating feature value magnitudes from low (purple) to high (orange).

Figure 5. Feature importance and SHapley additive exPlanations (SHAP) values. (A) LIME explanation showing the feature weights for each attribute that contributed to the model’s prediction. (B) SHAP summary plot, displaying the distribution of SHAP values for each feature across all instances. The color bar represents feature values from low (purple) to high (yellow).

Figure 5A revealed that, for the explained instance, the most influential features were Race and Age. Race had a substantial positive impact on the prediction with a feature weight of approximately +0.3, indicating that this feature significantly contributed to the positive prediction. Age also played an important role, contributing positively but with a smaller weight compared to Race. Other features such as IBI, TyG, and CRP showed minimal contributions, with TyG and CRP having near-zero or slightly negative weights. These results suggest that for this specific instance, Race and Age were the primary driving factors influencing the model’s prediction.

Figure 5B provided a broader view of the feature importance across all instances. Race emerged as the most influential feature, with a wide SHAP value distribution, indicating its significant and varying impact on the model’s predictions across different samples. Age was also a critical feature, with higher ages (indicated by yellow color) pushing the prediction toward positive outcomes. IBI and TyG had moderate effects, while CRP showed the least impact. The SHAP values demonstrated that Race and Age are the key features in driving the model’s output, with Race being the dominant factor.
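To make the quantity behind a SHAP plot concrete, the sketch below computes exact Shapley values for a single prediction by enumerating all feature subsets. It is not the SHAP library and not the study's code: it assumes a single reference ("baseline") input stands in for absent features, and it is only practical for a handful of features such as the five used here. The toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest take the baseline value.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        attributions.append(total)
    return attributions

# Hypothetical linear model with three features
def linear_model(features):
    return 2.0 * features[0] - 1.0 * features[1] + 0.5 * features[2]

phi = shapley_values(linear_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model each attribution reduces to the coefficient times the feature's deviation from the baseline, and the attributions always sum to the difference between the prediction and the baseline prediction.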

Discussion

This study examined the associations between metabolic–inflammatory markers and HP infection using data from two distinct populations: the NHANES cohort and a Chinese hospital-based sample. Our findings provide new insights into the metabolic–inflammatory dysregulation underlying HP infection and illustrate how interpretable machine learning can help uncover potential biological and contextual patterns related to infection risk.

Insulin resistance is increasingly recognized as a metabolic component linking chronic inflammation and infection. The TyG index, a validated surrogate for IR, has been associated with several chronic diseases, including non-alcoholic fatty liver disease, diabetes, and kidney dysfunction (19–22). Prior studies have also reported positive correlations between HP antibody levels and the TyG index (23). In our analysis, the TyG/HDL-C ratio was positively associated with HP infection in both cohorts. After full adjustment, the continuous ratio remained significant in NHANES, whereas in the Chinese cohort only the upper quartiles (Q3 and Q4) remained significant, supporting its role as an indicator of metabolic stress and inflammation.

HP infection is known to trigger chronic gastritis and systemic inflammation through the release of proinflammatory cytokines such as IL-1, IL-6, IL-8, and TNF-α (24–26). These inflammatory cascades disrupt metabolic homeostasis, contributing to insulin resistance, lipid abnormalities, and cardiovascular comorbidities. Elevated TyG index values—reflecting higher triglyceride-glucose burden—often coincide with increased inflammatory activity and greater cardiometabolic risk (16). In contrast, SIRI, IBI, and CRP demonstrated weaker and inconsistent associations, suggesting that traditional inflammatory markers may be more influenced by demographic or environmental factors than by direct metabolic pathways related to HP infection.

Our study combines interpretable machine learning with validated biochemical indices, promoting both predictive insight and reproducible inference in metabolic–infectious disease research. Such integrative frameworks could clarify how nutrient patterns and metabolic stress jointly shape systemic inflammation and infection susceptibility, advancing both precision nutrition and infection prevention strategies.

Differences between the NHANES and Chinese cohorts underscore the importance of contextual factors. The attenuation of associations in the Chinese cohort may reflect variations in diet, HP strain distribution, healthcare access, or socioeconomic determinants. Interpretability analyses (SHAP and LIME) identified race and age among the most influential predictors of HP infection. However, the role of race in this context should not be interpreted as a biological determinant. Instead, it likely captures social, dietary, and environmental disparities—including differences in sanitation, nutrition, healthcare access, and socioeconomic status—that co-vary with self-reported racial categories. This perspective aligns with recent findings that HP seroprevalence varies among U.S. racial and ethnic groups largely due to environmental and socioeconomic factors (27). This framing aligns with growing consensus in epidemiologic research that race serves as a contextual proxy for social and environmental exposures, rather than reflecting innate biological susceptibility. Recognizing this distinction is critical for preventing misinterpretation of statistical associations and for guiding public-health interventions that address structural and lifestyle factors influencing infection risk (28, 29).

Feature selection via Recursive Feature Elimination (RFE) identified Age, Race, TyG, IBI, and CRP as key predictors (Supplementary Table 3). Among 11 algorithms, Random Forest (RF) and Gaussian Process (GP) achieved high training performance (AUC = 0.97) but dropped to an AUC of 0.75 in validation, indicating substantial overfitting. XGBoost (XGB), by contrast, maintained a consistent AUC (0.77 in both sets), demonstrating better generalization and potential clinical applicability. DeLong’s test indicated that both RF and XGB significantly outperformed baseline models (P < 0.001) and that RF and XGB differed significantly from each other (P < 0.001); taken together with its stable training and validation AUCs, this suggests that XGB provided the more stable discrimination.
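The overfitting argument rests on the gap between training and validation AUC. The sketch below is a minimal illustration, not the authors' pipeline: it computes AUC from its rank-based definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and then tabulates the gap using the AUC values reported above. The function names and the four-sample toy input are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_gap(train_auc, validation_auc):
    """Training minus validation AUC; a large positive gap suggests overfitting."""
    return round(train_auc - validation_auc, 3)


# AUC values (training, validation) as reported in the text.
reported = {"RF": (0.97, 0.75), "GP": (0.97, 0.75), "XGB": (0.77, 0.77)}
gaps = {name: auc_gap(train, val) for name, (train, val) in reported.items()}

# Hypothetical toy input to exercise roc_auc.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

for name, gap in gaps.items():
    print(f"{name}: train-validation AUC gap = {gap:.2f}")
```

On the reported figures, RF and GP each lose 0.22 AUC between training and validation, whereas XGB loses none, which is the pattern the paragraph describes.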

These findings are consistent with prior work [e.g., Wang et al. (29)] suggesting that integrating interpretable ML approaches with clinical biomarkers can bridge mechanistic understanding and predictive modeling, helping to generate hypotheses about how metabolic–inflammatory pathways and social context interact to influence infection risk.

Given the global burden of HP infection, our findings highlight the translational potential of using TyG-based indices—derived from routine biochemical tests—as cost-effective markers for risk stratification. The XGB model provided meaningful clinical benefit across realistic decision thresholds (0.1–0.5), supporting its utility for targeted screening in primary care or community settings. When embedded in electronic health systems, such models could enable personalized infection risk estimation and guide resource allocation in high-prevalence or low-resource regions.
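Decision curve analysis expresses clinical benefit as net benefit at a given threshold probability. The sketch below uses the standard definition, net benefit = TP/N - (FP/N) × pt/(1 - pt), and compares a model against a "screen everyone" strategy over the 0.1–0.5 range mentioned above. The labels and predicted probabilities are hypothetical toy data, and the function names are illustrative; this is not a reproduction of the study's analysis.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on everyone whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight given to a false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every individual as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = infected, 0 = not infected.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.7, 0.6, 0.2, 0.1, 0.3, 0.55, 0.05, 0.35, 0.15]

curve = {
    t: (net_benefit(toy_labels, toy_probs, t), net_benefit_treat_all(toy_labels, t))
    for t in (0.1, 0.2, 0.3, 0.4, 0.5)
}

for t, (model_nb, all_nb) in curve.items():
    print(f"threshold={t:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is useful at a given threshold only when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).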

Notably, most conventional AI and machine learning models for HP diagnosis have relied on endoscopic or histopathological image analysis to identify mucosal lesions and infection patterns (30, 31). In contrast, our approach uses basic demographic information and metabolic–inflammatory indices to predict infection risk, offering a non-invasive, low-cost, and potentially generalizable alternative suitable for population-level screening. Such models may complement imaging-based AI systems by providing biochemical and physiological insights into host–pathogen interactions. Furthermore, chronic HP infection and its eradication therapy can markedly influence nutritional and metabolic status by altering the gastrointestinal microenvironment. Multiple trials have examined how HP eradication affects body weight, BMI, blood pressure, nutritional markers, and lipid profiles (including total cholesterol, LDL-C, HDL-C, and triglycerides), as well as serological nutrition biomarkers such as apolipoproteins (ApoC-II, ApoC-III) (32). These findings remain inconsistent but collectively suggest that HP infection may affect nutrient absorption and systemic metabolism, potentially influencing long-term health outcomes and even life expectancy.

Our study aligns with current recommendations advocating for rigorous, interpretable AI models combined with validated biochemical indices to enhance reproducibility in metabolic–infectious disease research. Such integrative frameworks could clarify how diet-related metabolic stress and inflammation jointly shape infection susceptibility, advancing precision prevention and individualized metabolic assessment strategies.

Limitations and future directions

This study has several limitations. The cross-sectional design precludes causal inference regarding the directionality between metabolic alterations and HP infection. Despite multivariable adjustment, residual confounding from unmeasured variables (e.g., diet, HP strain, genetics) may persist. Moreover, the external validation was performed in a single-center cohort with limited sample size, which may restrict generalizability. Future longitudinal, multi-ethnic studies enriched with environmental and behavioral data are needed to test the temporal and mechanistic hypotheses proposed here. Additionally, improving model calibration across populations will enhance predictive reliability and clinical applicability.

Conclusion

In summary, our findings suggest that metabolic–inflammatory dysregulation, reflected by the TyG index and TyG/HDL-C ratio, is associated with HP infection. Higher TyG/HDL-C values remained significantly associated with infection in both cohorts, whereas the TyG index association was significant in NHANES and attenuated in the Chinese cohort (P = 0.057). These results advance the hypothesis that metabolic abnormalities and systemic inflammation jointly contribute to infection susceptibility. By integrating interpretable machine learning with epidemiologic analyses, this study offers a data-driven framework for hypothesis generation regarding metabolic–immune interactions in chronic infection. While causality cannot be inferred, the persistence of the TyG/HDL-C association across cohorts supports the potential utility of metabolic–inflammatory markers for personalized risk assessment. Future longitudinal and mechanistic studies are warranted to validate these hypotheses and explore the bidirectional relationship between metabolic dysfunction and HP infection.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

YZ: Writing – original draft. RD: Writing – review & editing. XC: Writing – review & editing. LW: Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We thank all participants and staff from the NHANES database.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnut.2025.1674585/full#supplementary-material

Abbreviations

HP, Helicobacter pylori; TyG, triglyceride-glucose; IR, insulin resistance; SIRI, systemic inflammation response index; IBI, inflammatory burden index; HPN, hypertension; NHANES, National Health and Nutrition Examination Survey; PSUs, primary sampling units; NCHS, National Center for Health Statistics; MEC, Mobile Examination Center; CI, confidence interval; AUC, area under the curve; ROC, receiver operating characteristic; SHAP, SHapley Additive exPlanations; XGB, extreme gradient boosting; GBM, gradient boosting machine.

Footnotes

1. https://wwwn.cdc.gov/nchs/nhanes/tutorials/default.aspx

References

1. Faerch K, Vaag A, Holst JJ, Hansen T, Jørgensen T, Borch-Johnsen K. Natural history of insulin sensitivity and insulin secretion in the progression from normal glucose tolerance to impaired fasting glycemia and impaired glucose tolerance: the Inter99 study. Diabetes Care. (2009) 32:439–44. doi: 10.2337/dc08-1195

2. Louie JZ, Shiffman D, McPhaul MJ, Melander O. Insulin resistance probability score and incident cardiovascular disease. J Intern Med. (2023) 294:531–5. doi: 10.1111/joim.13687

3. Lin Z, Yuan S, Li B, Guan J, He J, Song C, et al. Insulin-based or non-insulin-based insulin resistance indicators and risk of long-term cardiovascular and all-cause mortality in the general population: a 25-year cohort study. Diabetes Metab. (2024) 50:101566. doi: 10.1016/j.diabet.2024.101566

4. Jonk AM, Houben AJ, de Jongh RT, Serné EH, Schaper NC, Stehouwer CD. Microvascular dysfunction in obesity: a potential mechanism in the pathogenesis of obesity-associated insulin resistance and hypertension. Physiology. (2007) 22:252–60. doi: 10.1152/physiol.00012.2007

5. Oliveri A, Rebernick RJ, Kuppa A, Pant A, Chen Y, Du X, et al. Comprehensive genetic study of the insulin resistance marker TG:HDL-C in the UK Biobank. Nat Genet. (2024) 56:212–21. doi: 10.1038/s41588-023-01625-2

6. Su J, Li Z, Huang M, Wang Y, Yang T, Ma M, et al. Triglyceride glucose index for the detection of the severity of coronary artery disease in different glucose metabolic states in patients with coronary heart disease: a RCSCD-TCM study in China. Cardiovasc Diabetol. (2022) 21:96. doi: 10.1186/s12933-022-01523-7

7. Kotilea K, Bontems P, Touati E. Epidemiology, diagnosis and risk factors of Helicobacter pylori infection. Adv Exp Med Biol. (2019) 1149:17–33. doi: 10.1007/5584_2019_357

8. Gravina AG, Zagari RM, De Musis C, Romano L, Loguercio C, Romano M. Helicobacter pylori and extragastric diseases: a review. World J Gastroenterol. (2018) 24:3204–21. doi: 10.3748/wjg.v24.i29.3204

9. Graves KL, Vigerust DJ. Hp: an inflammatory indicator in cardiovascular disease. Future Cardiol. (2016) 12:471–81. doi: 10.2217/fca-2016-0008

10. Tong L, Wang BB, Li FH, Lv SP, Pan FF, Dong XJ. An updated meta-analysis of the relationship between Helicobacter pylori infection and the risk of coronary heart disease. Front Cardiovasc Med. (2022) 9:794445. doi: 10.3389/fcvm.2022.794445

11. Papamichael KX, Papaioannou G, Karga H, Roussos A, Mantzaris GJ. Helicobacter pylori infection and endocrine disorders: Is there a link? World J Gastroenterol. (2009) 15:2701–7. doi: 10.3748/wjg.15.2701

12. Asaoka D, Nagahara A, Hojo M, Sasaki H, Shimada Y, Yoshizawa T, et al. The relationship between H. pylori infection and osteoporosis in Japan. Gastroenterol Res Pract. (2014) 2014:340765. doi: 10.1155/2014/340765

13. Wroblewski LE, Peek RM, Wilson KT. Helicobacter pylori and gastric cancer: factors that modulate disease risk. Clin Microbiol Rev. (2010) 23:713–39. doi: 10.1128/CMR.00011-10

14. Zhu XY, Xiong YJ, Meng XD, Xu HZ, Huo L, Deng W. Association of triglyceride-glucose index with Helicobacter pylori infection and mortality among the US population. Diabetol Metab Syndr. (2024) 16:187. doi: 10.1186/s13098-024-01422-9

15. Dang K, Wang X, Hu J, Zhang Y, Cheng L, Qi X, et al. The association between triglyceride-glucose index and its combination with obesity indicators and cardiovascular disease: NHANES 2003-2018. Cardiovasc Diabetol. (2024) 23:8. doi: 10.1186/s12933-023-02115-9

16. Huang Y, Zhou Y, Xu Y, Wang X, Zhou Z, Wu K, et al. Inflammatory markers link triglyceride-glucose index and obesity indicators with adverse cardiovascular events in patients with hypertension: insights from three cohorts. Cardiovasc Diabetol. (2025) 24:11. doi: 10.1186/s12933-024-02571-x

17. Zhang YZ, Wu HY, Ma RW, Feng B, Yang R, Chen XG, et al. Machine learning-based predictive model for adolescent metabolic syndrome: utilizing data from NHANES 2007-2016. Sci Rep. (2025) 15:3274. doi: 10.1038/s41598-025-88156-4

18. Lundberg SM, Lee S. A unified approach to interpreting model predictions. Proceedings of the Advances in Neural Information Processing Systems (NIPS). Denver: (2017). p. 4765–74.

19. Andrade LJO, Oliveira LM, Bittencourt AMV, Baptista GM, Oliveira GCM. Association between nonalcoholic fatty pancreatic disease and triglyceride/glucose index. Arq Gastroenterol. (2023) 60:345–9. doi: 10.1590/S0004-2803.230302023-44

20. Amzolini AM, Forţofoiu MC, Barău Abu-Alhija A, Vladu IM, Clenciu D, Mitrea A, et al. Triglyceride and glucose index: a useful tool for non-alcoholic liver disease assessed by liver biopsy in patients with metabolic syndrome? Rom J Morphol Embryol. (2021) 62:475–80. doi: 10.47162/RJME.62.2.13

21. Zhang Q, Xiao S, Jiao X, Shen Y. The triglyceride-glucose index is a predictor for cardiovascular and all-cause mortality in CVD patients with diabetes or pre-diabetes: evidence from NHANES 2001-2018. Cardiovasc Diabetol. (2023) 22:279. doi: 10.1186/s12933-023-02030-z

22. Son DH, Lee HS, Lee YJ, Lee JH, Han JH. Comparison of triglyceride-glucose index and HOMA-IR for predicting prevalence and incidence of metabolic syndrome. Nutr Metab Cardiovasc Dis. (2022) 32:596–604. doi: 10.1016/j.numecd.2021.11.017

23. Tang C, Zhang Q, Zhang C, Du X, Zhao Z, Qi W. Relationships among Helicobacter pylori seropositivity, the triglyceride-glucose index, and cardiovascular disease: a cohort study using the NHANES database. Cardiovasc Diabetol. (2024) 23:441. doi: 10.1186/s12933-024-02536-0

24. Crabtree JE, Peichl P, Wyatt JI, Stachl U, Lindley IJ. Gastric interleukin-8 and IgA IL-8 autoantibodies in Helicobacter pylori infection. Scand J Immunol. (1993) 37:65–70. doi: 10.1111/j.1365-3083.1993.tb01666.x

25. Kowalski M, Konturek PC, Pieniazek P, Karczewska E, Kluczka A, Grove R, et al. Prevalence of Helicobacter pylori infection in coronary artery disease and effect of its eradication on coronary lumen reduction after percutaneous coronary angioplasty. Dig Liver Dis. (2001) 33:222–9. doi: 10.1016/s1590-8658(01)80711-8

26. Graham DY, Osato MS, Olson CA, Zhang J, Figura N. Effect of H. pylori infection and CagA status on leukocyte counts and liver function tests: extra-gastric manifestations of H. pylori infection. Helicobacter. (1998) 3:174–8. doi: 10.1046/j.1523-5378.1998.08018.x

27. McMahon MV, Taylor CS, Ward ZJ, Alarid-Escudero F, Camargo MC, Laszkowska M, et al. Helicobacter pylori infection in the United States beyond NHANES: a scoping review of seroprevalence estimates by racial and ethnic groups. Lancet Reg Health Am. (2024) 41:100890. doi: 10.1016/j.lana.2024.100890

28. Kohandel Gargari O, Fathi M, Rajai Firouzabadi S, Mohammadi I, Mahmoudi MH, Sarmadi M, et al. Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset. Sci Rep. (2025) 15:4537. doi: 10.1038/s41598-025-88345-1

29. Wang Q, Liang T, Li Y, Zhou P, Liu X. Machine learning for prediction of Helicobacter pylori infection based on basic health examination data in adults: a retrospective study. Front Med. (2025) 12:1587540. doi: 10.3389/fmed.2025.1587540

30. Hu Y, Xu J, Huang L, Zheng Z, Zhao J, Chen T, et al. Artificial intelligence-assisted endoscopic diagnosis system for diagnosing Helicobacter pylori infection: a multicenter study. BMC Med. (2025) 23:540. doi: 10.1186/s12916-025-04379-2

31. Yan-Dong L, Huo-Gen W, Xue-Hui Y, Xiao-Jin L, Shu-Wen Z, Yue-Wen L, et al. Real-time prediction of Helicobacter pylori infection using a deep learning model during esophagogastroduodenoscopy: a prospective multicenter study. Helicobacter. (2025) 30:e70078. doi: 10.1111/hel.70078

32. Sugimoto M, Murata M. Influence of Helicobacter pylori infection and eradication therapy on nutrition and metabolic parameters. Gut Liver. (2025) 19:297–8. doi: 10.5009/gnl250159

Keywords: Helicobacter pylori, triglyceride glucose, machine learning, inflammatory index, metabolic

Citation: Zhang Y, Duan R, Chen X and Wei L (2025) Associations between metabolic-inflammatory biomarkers and Helicobacter pylori infection: an interpretable machine learning prediction approach. Front. Nutr. 12:1674585. doi: 10.3389/fnut.2025.1674585

Received: 28 July 2025; Accepted: 31 October 2025;
Published: 19 November 2025.

Edited by:

George Grant, Independent Researcher, Aberdeen, United Kingdom

Reviewed by:

Daniel Matias Bustos Guajardo, Universidad Catolica del Maule, Chile
Yueming Hu, Fuyang Normal University, China
Anuradha Singh, University of Hyderabad, India

Copyright © 2025 Zhang, Duan, Chen and Wei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lijuan Wei
