AUTHOR=Chen Yan, Wang Miaojuan, Wang Jianfeng TITLE=Retrospective cohort study of Helicobacter pylori infection and risk stratification using 6-year UBT data JOURNAL=Frontiers in Public Health VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1563841 DOI=10.3389/fpubh.2025.1563841 ISSN=2296-2565 ABSTRACT=Background: Helicobacter pylori (H. pylori) infection is a major global health concern, linked to gastric cancer and metabolic disorders. Despite its widespread prevalence, accurate risk stratification remains challenging. This study aims to develop a machine learning (ML)-based risk prediction model using 6-year longitudinal Urea Breath Test (UBT) data to identify metabolic alterations associated with chronic H. pylori infection. Methods: A retrospective cohort study was conducted using health examination data from 3,409 individuals between 2016 and 2021. Participants were stratified into H. pylori-positive and negative groups based on longitudinal UBT results. Key metabolic markers, including HbA1c, LDL-C, BMI, and WBC, were analyzed. Three predictive models (logistic regression, random forest, and XGBoost) were compared to assess their predictive performance. Results: Among the cohort, 20.5% exhibited chronic H. pylori infection. Infected individuals had significantly higher HbA1c (+1.2%, p < 0.01), LDL-C (+15 mg/dL, p < 0.05), and WBC levels, alongside lower albumin (−0.8 g/dL, p < 0.01). The XGBoost model outperformed others (AUC = 0.6809, Accuracy = 81.13%) in predicting infection risk. A subgroup of 4.0% was identified as high-risk, highlighting the potential for early intervention. Conclusion: This study underscores the interplay between chronic H. pylori infection and metabolic dysfunction, offering new perspectives on risk prediction using machine learning. The XGBoost model demonstrated reliable performance in stratifying infection risk based on accessible clinical markers.
Its integration into routine screening protocols could enhance early detection and personalized intervention strategies. Further studies should validate these findings across broader populations and incorporate additional risk factors.