ORIGINAL RESEARCH article

Front. Microbiol.

Sec. Microorganisms in Vertebrate Digestive Systems

Machine learning models diagnose oral squamous cell carcinoma based on cross-cohort oral microbial signatures

Provisionally accepted
Mingchao Wang1, Yanfei Sun2, Wen Gong1*
  • 1School of Stomatology, Qingdao University, Qingdao, China
  • 2Qingdao Municipal Hospital Group East Hospital, Qingdao, China

The final, formatted version of the article will be published soon.

The saliva microbiome of oral squamous cell carcinoma (OSCC) patients has gradually been characterized, but cross-cohort studies are lacking, and there is no non-invasive diagnostic model for OSCC that works across cohorts. This study investigated differences in saliva microbial composition between OSCC patients and healthy individuals using cross-cohort saliva microbiome data from 354 healthy individuals and 311 OSCC patients (total n=665). Saliva microbial composition differed significantly between OSCC patients and healthy individuals. Seven microorganisms were significantly reduced and seven were significantly increased in OSCC patients; these could serve as potential biomarkers. Machine learning models, including random forest, extra trees, gradient boosting and XGBoost, were used to build a clinical diagnostic model of OSCC from saliva microorganisms, achieving area under the curve (AUC) values ranging from 63.1% to 96.9% at both genus and species levels under rigorous leave-one-cohort-out cross-validation. Our study provides a robust non-invasive diagnostic model for OSCC and demonstrates that high diagnostic accuracy is achievable at both genus and species levels, suggesting that taxonomic resolution is not the primary limiting factor. Instead, the choice of model construction method is crucial, so greater attention should be paid to model selection in clinical applications.
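The leave-one-cohort-out evaluation described above can be illustrated with a minimal sketch: each cohort is held out in turn, a model is fitted on the remaining cohorts, and the AUC is computed on the held-out cohort. This is not the authors' pipeline. To stay dependency-free it swaps the tree ensembles named in the abstract for a simple mean-difference scorer, and the cohort labels, abundance values, and function names below are all hypothetical.

```python
from collections import defaultdict


def leave_one_cohort_out(cohorts):
    """Yield (held_out, train_idx, test_idx), holding out each cohort once."""
    by_cohort = defaultdict(list)
    for i, cohort in enumerate(cohorts):
        by_cohort[cohort].append(i)
    for held_out in sorted(by_cohort):
        train_idx = sorted(
            i for cohort, idx in by_cohort.items() if cohort != held_out for i in idx
        )
        yield held_out, train_idx, by_cohort[held_out]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Stand-in model (assumption): per-taxon mean abundance in OSCC minus healthy."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def score(weights, x):
    """Higher score means the sample looks more OSCC-like under the stand-in model."""
    return sum(w * v for w, v in zip(weights, x))


def evaluate(X, y, cohorts):
    """Return {held_out_cohort: AUC} for a leave-one-cohort-out evaluation."""
    results = {}
    for held_out, train_idx, test_idx in leave_one_cohort_out(cohorts):
        weights = fit_mean_difference([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [score(weights, X[i]) for i in test_idx]
        results[held_out] = auc([y[i] for i in test_idx], scores)
    return results


# Hypothetical toy data: relative abundance of two made-up taxa per sample.
# Labels: 0 = healthy, 1 = OSCC. Three made-up cohorts A, B, C.
cohorts = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
y = [0, 0, 1, 1] * 3
X = [
    [0.30, 0.10], [0.28, 0.12], [0.10, 0.30], [0.12, 0.33],  # cohort A
    [0.35, 0.08], [0.25, 0.15], [0.15, 0.25], [0.09, 0.31],  # cohort B
    [0.32, 0.11], [0.27, 0.09], [0.11, 0.29], [0.14, 0.27],  # cohort C
]

for cohort, value in evaluate(X, y, cohorts).items():
    print(f"held-out cohort {cohort}: AUC = {value:.3f}")
```

The key design point is that no sample from the held-out cohort is seen during fitting, so each AUC reflects transfer to an unseen cohort rather than within-cohort performance. In practice the same split is available in scikit-learn as `LeaveOneGroupOut`, and the stand-in scorer would be replaced by the classifiers the study names.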

Keywords: Cross-cohort validation, machine learning, Noninvasive diagnosis, oral microbiome, oral squamous cell carcinoma

Received: 25 Aug 2025; Accepted: 02 Dec 2025.

Copyright: © 2025 Wang, Sun and Gong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Wen Gong

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.