
ORIGINAL RESEARCH article

Front. Public Health

Sec. Infectious Diseases: Epidemiology and Prevention

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1688708

Nine-Year Risk Stratification and Prediction of Helicobacter pylori Infection Using Group-Based Trajectory Modeling and Machine Learning in 35,206 Adults

Provisionally accepted
  • ¹First Hospital of Shanxi Medical University, Taiyuan, China
  • ²Shanxi Medical University, Taiyuan, China

The final, formatted version of the article will be published soon.

Background. Helicobacter pylori (H. pylori) infection remains prevalent in regions such as Shanxi, China, and contributes to gastrointestinal morbidity. Accurately identifying high-risk individuals is essential for effective screening and early intervention.

Methods. We conducted a retrospective longitudinal cohort study of 35,206 adults who underwent repeated annual health checkups with H. pylori testing at a single center from 2016 to 2024. Group-based trajectory modeling (GBTM) was used to identify infection-risk trajectory subgroups. Multivariable logistic regression identified predictors of membership in the high-risk trajectory, and alcohol consumption was assessed as an effect modifier. Five machine learning models, including Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting, and logistic regression, were trained and internally validated using a 7:3 random split. Temporal validation (training on 2016–2020 data, validation on 2021–2024 data) assessed generalizability. SHapley Additive exPlanations (SHAP) were used to interpret model predictions, and a prediction tool was deployed as an R Shiny web application.

Results. GBTM identified a high-risk group (14.63%) and a low-risk group (85.37%). Female sex (OR = 0.042, 95% CI: 0.039–0.046) and unmarried status (OR = 0.092, 95% CI: 0.085–0.099) were protective, whereas obesity (OR = 1.138, 95% CI: 1.070–1.210), blue-collar occupation (OR = 1.557, 95% CI: 1.454–1.666), and alcohol consumption (OR = 1.277, 95% CI: 1.165–1.401) were risk factors. In subgroup analysis, alcohol consumption interacted with all significant factors (all P < 0.001), with the strongest interaction observed for being married (OR = 8.622, 95% CI: 7.872–9.437). LightGBM achieved AUCs of 0.851 and 0.843 in the random-split training and validation sets, and 0.863 and 0.831 in the temporal training (2016–2020) and validation (2021–2024) sets. SHAP ranked marital status and sex as the top predictors. The tool is available at: https://prediction-model-for-hp.shinyapps.io/hp_shinyapp-/.

Conclusions. We developed an interpretable online risk prediction tool with validated accuracy to support precision screening for H. pylori infection.
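GBTM is a finite mixture model: each person is assumed to belong to one of a few latent groups, each with its own trajectory of outcome probability over time, and group membership is inferred from the repeated measurements. The sketch below is a deliberately simplified, standard-library stand-in, not the authors' model. It fits two groups by expectation-maximization (EM) and gives each group a free positivity probability for every calendar year, instead of the polynomial-in-time trajectories GBTM usually uses. The function name `fit_two_group_trajectories`, the starting values, and the simulated 15%/85% data are all hypothetical.

```python
import math
import random


def fit_two_group_trajectories(sequences, n_iter=200, tol=1e-6):
    """EM for a two-group mixture of year-specific Bernoulli trajectories.

    sequences: one list per person with an entry per calendar year:
    1 (test positive), 0 (test negative) or None (not tested that year).
    Returns (group_shares, year_probs, posteriors).
    """
    n_years = len(sequences[0])
    # Hypothetical starting values: group 0 "low risk", group 1 "high risk".
    shares = [0.5, 0.5]
    probs = [[0.2] * n_years, [0.6] * n_years]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each person belongs to each group.
        posts, ll = [], 0.0
        for seq in sequences:
            logs = []
            for k in range(2):
                lp = math.log(shares[k])
                for t, y in enumerate(seq):
                    if y is not None:
                        p = probs[k][t]
                        lp += math.log(p if y == 1 else 1.0 - p)
                logs.append(lp)
            top = max(logs)  # log-sum-exp for numerical stability
            denom = sum(math.exp(v - top) for v in logs)
            ll += top + math.log(denom)
            posts.append([math.exp(v - top) / denom for v in logs])
        # M-step: update group shares and each group's yearly positivity rate.
        n = len(sequences)
        shares = [max(sum(p[k] for p in posts) / n, 1e-12) for k in range(2)]
        for k in range(2):
            for t in range(n_years):
                num = den = 0.0
                for seq, post in zip(sequences, posts):
                    if seq[t] is not None:
                        num += post[k] * seq[t]
                        den += post[k]
                rate = num / den if den > 0 else 0.5
                # Clamp away from 0 and 1 so the log-likelihood stays finite.
                probs[k][t] = min(max(rate, 1e-6), 1.0 - 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return shares, probs, posts


# Simulated (not study) data: 9 annual tests, 15% of people in a high-risk group.
rng = random.Random(2016)
data = []
for person in range(1000):
    p_pos = 0.8 if person < 150 else 0.05
    data.append([1 if rng.random() < p_pos else 0 for _ in range(9)])

shares, probs, posts = fit_two_group_trajectories(data)
high_risk = [i for i, post in enumerate(posts) if post[1] > 0.5]
print(f"estimated high-risk share: {shares[1]:.3f}; assigned: {len(high_risk)}")
```

Assigning each person to the group with the larger posterior probability is one common way to turn a fitted mixture into group shares like the reported 14.63%. A full GBTM analysis would also compare models with different numbers of groups and trajectory shapes, for example by BIC.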
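In multivariable logistic regression, each reported odds ratio is an exponentiated coefficient, OR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). Assuming the abstract's intervals are Wald intervals (the abstract does not say), the sketch below converts in both directions and uses the alcohol estimate (OR = 1.277, 95% CI: 1.165–1.401) as a consistency check: the midpoint of the log-scale bounds gives back an OR of about 1.278, matching the reported value up to rounding. The helper names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald CI from a logistic-regression coefficient (log-odds)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=Z_95):
    """Recover the coefficient and its standard error from a reported Wald CI,
    which is symmetric around beta on the log-odds scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


# Alcohol consumption as reported in the abstract: OR 1.277, 95% CI 1.165-1.401.
beta, se = beta_se_from_ci(1.165, 1.401)
or_alcohol, ci_lo, ci_hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, SE={se:.4f}, OR={or_alcohol:.3f} ({ci_lo:.3f}-{ci_hi:.3f})")
```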
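The AUCs reported for LightGBM are areas under the ROC curve. An AUC equals the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, which the Mann-Whitney rank statistic computes directly. The standard-library sketch below (function name and toy data hypothetical, not from the study) illustrates that equivalence; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used.

```python
def roc_auc(labels, scores):
    """AUC as P(random positive outranks random negative), ties counted as 1/2,
    computed via the Mann-Whitney U statistic divided by n_pos * n_neg."""
    # Rank all scores (1-based), giving tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The same function applies unchanged to each of the four evaluation sets; only the records passed in differ between the random-split and temporal evaluations.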
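SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory: each feature's value is its average marginal contribution over all orderings of the features. For tree models such as LightGBM, SHAP libraries use fast tree-specific algorithms; the brute-force sketch below only illustrates the definition, enumerating every feature subset and filling "absent" features with baseline values. The function name, the three-feature log-odds model, its coefficients (including an alcohol × marital-status interaction term echoing the effect modification described above), and the baseline are all made up for illustration and are not the study's fitted model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction, replacing absent features
    with baseline values. Needs n * 2**n model calls, so keep n small."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the observed value; others the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical log-odds model; coefficients are illustrative, NOT the study's.
def log_odds(z):
    female, married, alcohol = z
    return -2.0 - 1.5 * female + 1.0 * married + 0.3 * alcohol + 0.8 * married * alcohol


x = [0, 1, 1]                # a married man who drinks alcohol
baseline = [0.5, 0.5, 0.5]   # hypothetical reference point
phi = shapley_values(log_odds, x, baseline)
print(dict(zip(["female", "married", "alcohol"], (round(v, 3) for v in phi))))
```

The attributions always sum to the difference between the prediction and the baseline prediction (the "efficiency" property), which makes a quick sanity check. The interaction term's contribution is split between the married and alcohol features rather than reported separately.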

Keywords: Helicobacter pylori, machine learning, risk prediction, group-based trajectory modeling, Shapley additive explanations

Received: 30 Aug 2025; Accepted: 22 Oct 2025.

Copyright: © 2025 Zhao, Liu, Wei, Wang, Xiao and Yao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Heping Zhao, zhaoheping360@126.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.