
ORIGINAL RESEARCH article

Front. Microbiol.

Sec. Infectious Agents and Disease

Volume 16 - 2025 | doi: 10.3389/fmicb.2025.1687021

This article is part of the Research Topic: Rapid and Efficient Analytical Technologies for Pathogen Detection.

AI-Powered Three-Category H. pylori Diagnosis via Magnetic Controlled Capsule Endoscopy: A Multicenter Validation of a Vision-Language Model

Provisionally accepted
Shiping Xu1,2*, Xi Sun2, Jing Liu2, Lili Wu2, Xiao Chen2, Xiaona Ma2, Fei Teng2, Ting Zhang3, Hui Su4, Xin Fan4, Jiaxin Li5, Peng Jin4*, Hongmei Jiao5*
  • 1Department of Neurology, First Medical Center, Chinese PLA General Hospital, Beijing, China
  • 2Department of Gastroenterology, Second Medical Center, Chinese PLA General Hospital, Beijing, China
  • 3Chinese PLA General Hospital, Beijing, China
  • 4The Seventh Medical Center of PLA General Hospital, Beijing, China
  • 5Peking University First Hospital, Beijing, China

The final, formatted version of the article will be published soon.

Accurate classification of Helicobacter pylori (H. pylori) infection status (current infection, past infection, or non-infection) is critical for gastric cancer risk stratification. Current methods based on traditional convolutional neural networks (CNNs) are limited by their reliance on fragmented single-image analysis and operator-dependent image selection, impairing diagnostic reliability. To overcome these limitations, we developed MC-CLIP, a contrastive language-image pretraining (CLIP) vision-language foundation model for fully automated three-category H. pylori diagnosis using magnetic-controlled capsule endoscopy (MCCE). The model was pretrained on 2,427,475 MCCE image-text pairs from 123,543 examinations to establish vision-language alignment and fine-tuned on 40,695 expertly annotated images from 864 patients. Validated on multicenter internal (n=220) and external (n=208) cohorts, MC-CLIP autonomously selected 30 representative images per case for end-to-end classification, achieving overall accuracies of 89.6% (95% CI: 85.5-93.6%) and 86.6% (95% CI: 80.8-90.3%), respectively. MC-CLIP was particularly strong in detecting H. pylori infection, with sensitivities of 91.4% for current infection and 83.7% for past infection, significantly surpassing both senior endoscopists (84.3% and 71.4%, respectively) and junior endoscopists (74.3% for current infection). The model maintained high specificity (>90% across all categories) and excelled at identifying subtle post-eradication mucosal changes, reducing misclassification of past infections as non-infections. By integrating multimodal image-text data through end-to-end analysis, MC-CLIP effectively addresses fundamental limitations of CNN-based approaches and shows strong potential for enhancing MCCE-based gastric cancer screening.
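To illustrate the general idea of CLIP-style three-category scoring described above, the following is a minimal sketch, assuming precomputed image embeddings for all frames of one MCCE examination and text embeddings for three class prompts. The frame-selection heuristic, function names, and prompts here are hypothetical illustrations and are not taken from the paper, which does not specify MC-CLIP's internal selection or aggregation mechanism.

```python
import torch
import torch.nn.functional as F

# Hypothetical prompts for the three H. pylori categories used in the study.
CLASS_PROMPTS = ["non-infection", "current infection", "past infection"]

def classify_case(image_embeds: torch.Tensor,
                  text_embeds: torch.Tensor,
                  n_select: int = 30):
    """Sketch of CLIP-style per-case classification over MCCE frames.

    image_embeds: (N, D) embeddings for all frames of one examination.
    text_embeds:  (3, D) embeddings of the three class prompts.
    n_select:     number of representative frames to keep (30 in the paper).
    """
    # Cosine similarity between every frame and every class prompt.
    img = F.normalize(image_embeds, dim=-1)
    txt = F.normalize(text_embeds, dim=-1)
    sims = img @ txt.T                              # (N, 3)

    # Keep the frames whose best class score is highest, as a simple
    # stand-in for autonomous representative-image selection.
    frame_conf, _ = sims.max(dim=-1)                # (N,)
    top_idx = frame_conf.topk(min(n_select, sims.shape[0])).indices

    # Average the selected frames' similarities into case-level class scores.
    case_logits = sims[top_idx].mean(dim=0)         # (3,)
    probs = case_logits.softmax(dim=-1)
    return CLASS_PROMPTS[int(probs.argmax())], probs
```

This sketch only demonstrates similarity-based scoring and frame aggregation under those assumptions; the actual MC-CLIP pipeline, prompts, and weighting are described in the full article.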

Keywords: Helicobacter pylori, artificial intelligence, gastric cancer, large language model (LLM), capsule endoscopy

Received: 16 Aug 2025; Accepted: 25 Sep 2025.

Copyright: © 2025 Xu, Sun, Liu, Wu, Chen, Ma, Teng, Zhang, Su, Fan, Li, Jin and Jiao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Shiping Xu, xushiping@301hospital.com.cn
Peng Jin, jinpeng@301hospital.com.cn
Hongmei Jiao, jiaohm@139.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.