ORIGINAL RESEARCH article

Front. Nutr.

Sec. Nutritional Epidemiology

Associations Between Metabolic-Inflammatory Biomarkers and Helicobacter pylori Infection: An Interpretable Machine Learning Prediction Approach

Provisionally accepted
Yue Zhang1, Ruifeng Duan1, Xin Chen1, Wei Lijuan1,2*
  • 1The Second Hospital of Jilin University, Changchun, China
  • 2Jilin University, Changchun, China

The final, formatted version of the article will be published soon.

Background: This study investigated the association between metabolic-inflammatory markers and Helicobacter pylori (HP) infection using interpretable machine learning models, with a focus on the triglyceride-glucose (TyG) index, the TyG/HDL-C ratio, and systemic inflammatory biomarkers.

Methods: Data from 2,924 NHANES participants and 1,021 patients from the Second Hospital of Jilin University were analyzed. Associations between metabolic-inflammatory markers and HP infection were assessed using multivariable regression. Eleven machine learning models were compared for predictive performance, evaluated by AUC, accuracy, sensitivity, specificity, precision, F1 score, and the Kappa statistic. Interpretability was assessed via SHAP values, calibration plots, confusion matrices, and decision curve analysis.

Results: In NHANES, the TyG index was independently associated with HP infection (OR = 1.25, 95% CI 1.06–1.48, P = 0.009), and the TyG/HDL-C ratio remained significant after full adjustment (OR = 1.16, 95% CI 1.07–1.25, P < 0.001), while SIRI, IBI, and CRP lost significance. In the external Chinese cohort, the TyG association attenuated (P = 0.057), but higher TyG/HDL-C quartiles remained significant. Among algorithms, Random Forest (RF) and Gaussian Process (GP) achieved the highest AUCs on the training set (both 0.97) but dropped markedly on the validation set (both 0.75), indicating overfitting. In contrast, XGBoost (XGB) and MLP maintained consistent AUCs between training (0.77) and validation (0.77), reflecting better generalization. DeLong's test indicated that both RF and XGB significantly outperformed baseline models (P < 0.001), while XGB demonstrated more stable validation performance. Decision curve and SHAP analyses supported the clinical relevance of XGB, highlighting Race and Age as dominant contributors.

Conclusion: The TyG index and TyG/HDL-C ratio were independently associated with HP infection. Among machine learning models, XGBoost demonstrated the most stable and generalizable performance (AUC 0.77 in both training and validation), whereas RF and GP exhibited overfitting (AUC 0.97 → 0.75). These results suggest that XGB provides a more reliable framework for infection risk prediction, though the cross-sectional design precludes causal inference.
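
The abstract does not spell out how the two metabolic indices are computed or how the overfitting gap is read from the reported AUCs, so the following is a minimal illustrative sketch and not the authors' code. It assumes the commonly used definition TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), and it assumes the TyG/HDL-C ratio is the TyG index divided by HDL-C, in whatever unit the study used (the abstract does not state it). The function names are hypothetical; the AUC values are the ones reported above.

```python
import math


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """Commonly used TyG definition: ln(TG [mg/dL] * glucose [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("concentrations must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def tyg_hdl_ratio(triglycerides_mg_dl: float, glucose_mg_dl: float, hdl_c: float) -> float:
    """Assumption: the ratio is TyG divided by HDL-C (HDL-C unit not given in the abstract)."""
    if hdl_c <= 0:
        raise ValueError("HDL-C must be positive")
    return tyg_index(triglycerides_mg_dl, glucose_mg_dl) / hdl_c


def auc_gap(train_auc: float, validation_auc: float) -> float:
    """Training-minus-validation AUC; a large positive gap suggests overfitting."""
    return round(train_auc - validation_auc, 2)


# (training AUC, validation AUC) as reported in the abstract.
reported_aucs = {
    "RF": (0.97, 0.75),
    "GP": (0.97, 0.75),
    "XGB": (0.77, 0.77),
    "MLP": (0.77, 0.77),
}

gaps = {model: auc_gap(train, valid) for model, (train, valid) in reported_aucs.items()}
# RF and GP: 0.22; XGB and MLP: 0.0

if __name__ == "__main__":
    for model, gap in gaps.items():
        print(f"{model}: train-validation AUC gap = {gap:.2f}")
```

Read this way, RF and GP lose 0.22 AUC between training and validation, while XGB and MLP lose none, which is the pattern the abstract describes as overfitting versus stable generalization.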

Keywords: Helicobacter pylori, triglyceride glucose, machine learning, inflammatory index, metabolic

Received: 28 Jul 2025; Accepted: 31 Oct 2025.

Copyright: © 2025 Zhang, Duan, Chen and Lijuan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Wei Lijuan, 1835529876@qq.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.