ORIGINAL RESEARCH article

Front. Microbiol.

Sec. Ancient DNA and Forensic Microbiology

Volume 16 - 2025 | doi: 10.3389/fmicb.2025.1586195

Machine Learning Integrates Region-Specific Microbial Signatures to Distinguish Geographically Adjacent Populations within a Province

Provisionally accepted
Li Luo1, Bangwei Chen2, Shengyin Zeng3, Yaxin Li3, Xiaolin Chen4, Jianguo Zhang5, Xiangjie Guo1, Shujin Li6, Lei Ruan7, Shida Zhu5, Cairong Gao1*, Cuntai Zhang7*, TAO LI5*
  • 1Shanxi Medical University, Taiyuan, China
  • 2South China University of Technology, Guangzhou, Guangdong Province, China
  • 3University of Chinese Academy of Sciences, Beijing, Beijing, China
  • 4South China Normal University, Guangzhou, Guangdong, China
  • 5Beijing Genomics Institute (BGI), Shenzhen, China
  • 6Hebei Medical University, Shijiazhuang, Hebei Province, China
  • 7Huazhong University of Science and Technology, Wuhan, Hubei Province, China

The final, formatted version of the article will be published soon.

The individual specificity and temporal stability of the human gut microbiota have revealed significant compositional differences across geographical provenances. However, the gut microbiota variations among people residing in different regions within a province remain enigmatic.

Methods: Shotgun metagenomic sequencing was performed to analyze the gut microbiota of 381 unrelated Chinese Han individuals living in two cities (Wuhan and Shiyan) of Hubei Province. To obtain the optimal model that can distinguish geographically close populations, three machine learning (ML) algorithms based on microbiota or functions were employed.

Results: Significant differences in microbial α diversity and β diversity were observed. Flavonifractor plautii and Bacteroides stercoris were region-specific markers that presented higher relative abundances in Wuhan individuals. By utilizing the genus-level index commonly used for 16S rRNA as the base model, the prediction accuracy was greatly improved when species and functional data were added. Among the three ML algorithms, the random forest algorithm achieved the best performance, with an AUC of 0.943.

The gut microbiota of individuals residing in the same province is significantly similar; however, pronounced differences in bacterial composition were noted between individuals. Integrating the gut microbiota and functions using machine learning algorithms can distinguish people from geographically close environments, offering a foundation for determining geographical origin through the gut microbiota. Moreover, a deeper understanding of host-specific associations may offer valuable forensic and clinical assistance.
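The quantities named in the abstract (α diversity, β diversity, and AUC) can be illustrated with a minimal sketch. The abstract does not say which diversity indices the authors used, so the Shannon index and Bray-Curtis dissimilarity below are assumptions chosen because they are common in microbiome studies. The function names and the toy abundance profiles are hypothetical and are not data from the study. The classifier itself is not reproduced here; `roc_auc` only shows how an AUC would be computed from any model's predicted scores.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon alpha diversity, H' = -sum(p_i * ln p_i), over non-zero taxa."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two profiles (0 means identical)."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same taxa")
    denom = sum(x) + sum(y)
    if denom <= 0:
        raise ValueError("profiles must not both be empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample outscores a
    random negative sample, with ties counted as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical relative-abundance profiles over four taxa (not study data).
sample_a = [0.5, 0.3, 0.2, 0.0]
sample_b = [0.25, 0.25, 0.25, 0.25]

alpha_a = shannon_index(sample_a)
alpha_b = shannon_index(sample_b)
beta_ab = bray_curtis(sample_a, sample_b)

# Hypothetical classifier scores: label 1 for one city, 0 for the other.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)

print(f"alpha A={alpha_a:.3f}, alpha B={alpha_b:.3f}, "
      f"beta={beta_ab:.3f}, AUC={toy_auc:.3f}")
```

In this toy setup the evenly distributed profile has the higher Shannon index, and the AUC counts how often a sample from one group is scored above a sample from the other. An AUC near 1, such as the 0.943 reported for the random forest model, means the two groups are separated almost perfectly by the model's scores.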

Keywords: Intestinal bacteria, machine learning, Geographic Locations, Metagenomics, Forensic microbiology

Received: 02 Mar 2025; Accepted: 26 Jun 2025.

Copyright: © 2025 Luo, Chen, Zeng, Li, Chen, Zhang, Guo, Li, Ruan, Zhu, Gao, Zhang and LI. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Cairong Gao, Shanxi Medical University, Taiyuan, China
Cuntai Zhang, Huazhong University of Science and Technology, Wuhan, 430074, Hubei Province, China
TAO LI, Beijing Genomics Institute (BGI), Shenzhen, 518083, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.