AUTHOR=Li Liwei, Liu Zheng, Qin Jiamin, Xiong Guang, Yang Chongze, Cai Fuqing, Huang Jiean TITLE=Constructing inflammatory bowel disease diagnostic models based on k-mer and machine learning JOURNAL=Frontiers in Microbiology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2025.1578005 DOI=10.3389/fmicb.2025.1578005 ISSN=1664-302X ABSTRACT=Background: Inflammatory bowel disease (IBD), encompassing Crohn’s disease (CD) and ulcerative colitis (UC), is linked to significant alterations in gut microbiota. Conventional diagnostic approaches frequently rely on invasive procedures, contributing to patient discomfort; hence, non-invasive diagnostic models present a valuable clinical alternative. Methods: Metagenomic and amplicon sequencing data were collected from fecal samples of patients with IBD and healthy individuals across diverse geographic regions. Diagnostic models were developed using Logistic Regression (LR), Support Vector Machine (SVM), Naïve Bayes (NB), and Feedforward Neural Network (FFNN), complemented by an ensemble model via a voting mechanism. Five-fold cross-validation facilitated the differentiation between normal controls (NC) and IBD, as well as between CD and UC. Results: K-mer-based methods leveraging metagenomic sequencing data demonstrated robust diagnostic performance, yielding ROC AUCs of 0.966 for IBD vs. NC and 0.955 for CD vs. UC. Similarly, models based on amplicon sequencing achieved ROC AUCs of 0.831 for IBD vs. NC and 0.903 for CD vs. UC. In comparison, k-mer-based approaches outperformed traditional microbiota-based models, which produced lower ROC AUCs of 0.868 for IBD vs. NC and 0.810 for CD vs. UC.
Across all machine learning frameworks, the FFNN consistently attained the highest ROC AUC, underscoring its superior diagnostic performance. Conclusion: The integration of k-mer-based feature extraction with machine learning offers a non-invasive, highly accurate approach for IBD diagnosis, surpassing traditional microbiota-based models. This method holds considerable potential for clinical use, offering an effective alternative to invasive diagnostics and enhancing patient comfort.
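
As a rough illustration of the k-mer feature extraction and voting ensemble named in the abstract, the sketch below turns a set of reads into a fixed-length vector of relative k-mer frequencies and averages per-model probabilities. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `kmer_profile` and `soft_vote`, the choice of `k`, the skipping of windows containing ambiguous bases, and the use of probability averaging (soft voting) are all assumptions, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def kmer_profile(reads, k=3):
    """Return relative k-mer frequencies over all A/C/G/T k-mers, in a fixed order.

    Hypothetical helper: the value of k and the normalization are assumptions.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k across the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing N or other ambiguity codes (assumption).
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    # Fixed vocabulary so every sample yields a vector of length 4**k.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in vocab]


def soft_vote(probabilities):
    """Average the positive-class probabilities from several models for one sample.

    Soft voting is an assumption; the abstract only mentions "a voting mechanism".
    """
    return sum(probabilities) / len(probabilities)


# Two toy reads; the second contains ambiguous bases that are skipped.
profile = kmer_profile(["ACGTAC", "NNACG"], k=2)
ensemble_score = soft_vote([0.9, 0.6, 0.3])
```

Each sample's `profile` could then serve as one feature row for classifiers such as LR, SVM, NB, or an FFNN. This sketch does not merge reverse-complement k-mers, which a real pipeline might do.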