
ORIGINAL RESEARCH article

Front. Microbiol.

Sec. Systems Microbiology

Volume 16 - 2025 | doi: 10.3389/fmicb.2025.1578005

This article is part of the Research Topic: Artificial Intelligence and mNGS in Pathogenic Microorganism Research.

Constructing Inflammatory Bowel Disease Diagnostic Models Based on k-mer and Machine Learning

Provisionally accepted
Liwei Li¹, Zheng Liu¹, Jiamin Qin¹, Guang Xiong², Chongze Yang², Fuqing Cai¹*, Jiean Huang¹*
  • ¹Second Affiliated Hospital of Guangxi Medical University, Nanning, China
  • ²First Affiliated Hospital, Guangxi Medical University, Nanning, Guangxi Zhuang Region, China

The final, formatted version of the article will be published soon.

Background: Inflammatory bowel disease (IBD), encompassing Crohn's disease (CD) and ulcerative colitis (UC), is associated with significant alterations in the gut microbiota. Conventional diagnosis frequently relies on invasive procedures that cause patient discomfort, so non-invasive diagnostic models would offer a valuable clinical alternative.

Methods: Metagenomic and amplicon sequencing data were collected from fecal samples of patients with IBD and healthy individuals across diverse geographic regions. Diagnostic models were developed using Logistic Regression (LR), Support Vector Machine (SVM), Naïve Bayes (NB), and Feedforward Neural Network (FFNN) classifiers, complemented by an ensemble model built with a voting mechanism. The models were trained to distinguish normal controls (NC) from IBD and CD from UC, and were evaluated with five-fold cross-validation.

Results: K-mer-based models built on metagenomic sequencing data showed robust diagnostic performance, with ROC AUCs of 0.966 for IBD vs. NC and 0.955 for CD vs. UC. Models based on amplicon sequencing data achieved ROC AUCs of 0.831 for IBD vs. NC and 0.903 for CD vs. UC. Traditional microbiota-based models yielded lower ROC AUCs than the metagenomic k-mer models: 0.868 for IBD vs. NC and 0.810 for CD vs. UC. Across all machine learning frameworks, the FFNN consistently attained the highest ROC AUC.

Conclusions: Combining k-mer-based feature extraction with machine learning provides a non-invasive, highly accurate approach to IBD diagnosis that outperforms traditional microbiota-based models. The method has considerable potential for clinical use as an alternative to invasive diagnostics that improves patient comfort.
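The workflow described in the Methods and Results can be sketched roughly as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' pipeline: the abstract does not give the k-mer length, the normalization, the voting rule, or whether the folds were stratified, so the choices here (k-mer relative frequencies pooled over all reads of a sample, soft voting by averaging each model's predicted probability, shuffled but unstratified folds, and ROC AUC computed as the pairwise probability that a positive sample outscores a negative one) are hypothetical. The four base classifiers (LR, SVM, NB, FFNN) are deliberately not implemented; any classifier that outputs a probability for the positive class could feed `soft_vote`. All function names are illustrative.

```python
"""Hypothetical sketch: k-mer features, soft-voting ensemble, 5-fold CV, ROC AUC.

Assumptions (not stated in the source abstract): k-mer length, normalization
by total valid k-mers, soft rather than hard voting, and unstratified folds.
"""
import random
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_profile(reads, k=4):
    """Relative frequency of every DNA k-mer across all reads of one sample.

    Windows containing non-ACGT characters (e.g. 'N') are skipped. Returns a
    list of length 4**k in lexicographic k-mer order (AA..A, AA..C, ...).
    """
    valid = set(BASES)
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    total = sum(counts.values())
    vocab = ("".join(p) for p in product(BASES, repeat=k))
    # Normalizing by the total makes samples with different depths comparable.
    return [counts[km] / total if total else 0.0 for km in vocab]


def soft_vote(model_probs):
    """Average P(positive class) per sample across models (soft voting).

    model_probs: one list of per-sample probabilities for each model.
    """
    n_models = len(model_probs)
    return [sum(ps) / n_models for ps in zip(*model_probs)]


def roc_auc(labels, scores):
    """ROC AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx); every sample is held out exactly once.

    Folds are shuffled but not stratified by class (an assumption).
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]
```

For example, `kmer_profile(reads, k=4)` turns one sample's reads into a 256-dimensional feature vector. Within each split from `five_fold_splits`, each model would be fitted on the training indices, its probabilities for the held-out samples averaged with `soft_vote`, and the result scored with `roc_auc`. Hard (majority) voting is an equally plausible reading of "voting mechanism".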

Keywords: inflammatory bowel disease, gut microbiota, non-invasive diagnosis, machine learning, k-mer

Received: 18 Feb 2025; Accepted: 26 May 2025.

Copyright: © 2025 Li, Liu, Qin, Xiong, Yang, Cai and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Fuqing Cai, Second Affiliated Hospital of Guangxi Medical University, Nanning, China
Jiean Huang, Second Affiliated Hospital of Guangxi Medical University, Nanning, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.