ORIGINAL RESEARCH article
Front. Microbiol.
Sec. Ancient DNA and Forensic Microbiology
Volume 16 - 2025 | doi: 10.3389/fmicb.2025.1586195
Machine Learning Integrates Region-Specific Microbial Signatures to Distinguish Geographically Adjacent Populations within a Province
Provisionally accepted
- 1 Shanxi Medical University, Taiyuan, China
- 2 South China University of Technology, Guangzhou, Guangdong Province, China
- 3 University of Chinese Academy of Sciences, Beijing, China
- 4 South China Normal University, Guangzhou, Guangdong Province, China
- 5 Beijing Genomics Institute (BGI), Shenzhen, China
- 6 Hebei Medical University, Shijiazhuang, Hebei Province, China
- 7 Huazhong University of Science and Technology, Wuhan, Hubei Province, China
The human gut microbiota is individually specific and temporally stable, and its composition differs significantly across geographical origins. However, gut microbiota variation among people residing in different regions within the same province remains poorly understood.
Methods: Shotgun metagenomic sequencing was performed to analyze the gut microbiota of 381 unrelated Han Chinese individuals living in two cities (Wuhan and Shiyan) in Hubei Province. To identify the model that best distinguishes these geographically close populations, three machine learning (ML) algorithms were trained on microbial taxonomic or functional profiles.
Results: Significant differences in microbial α and β diversity were observed between the two cities. Flavonifractor plautii and Bacteroides stercoris were region-specific markers, with higher relative abundances in Wuhan individuals. Starting from a base model built on genus-level features, the resolution commonly used in 16S rRNA gene studies, prediction accuracy improved substantially when species-level and functional data were added. Among the three ML algorithms, random forest performed best, with an AUC of 0.943.
The gut microbiota of individuals residing in the same province is highly similar; however, pronounced differences in bacterial composition were noted between individuals. Integrating gut microbial taxonomic and functional profiles with machine learning algorithms can distinguish people from geographically close environments, offering a foundation for inferring geographical origin from the gut microbiota. Moreover, a deeper understanding of host-specific associations may offer valuable forensic and clinical assistance.
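The abstract reports α diversity, β diversity, and the AUC used to rank the classifiers, but does not name the specific diversity indices. The following is a minimal sketch, assuming the Shannon index for α diversity and Bray-Curtis dissimilarity for β diversity (both common choices in metagenomics, not confirmed by the abstract). The AUC function uses the standard rank-based definition and would take, for example, the predicted positive-class probabilities from a classifier such as the random forest. The function names and the label encoding (1 = Wuhan, 0 = Shiyan) are hypothetical, and the random forest itself is not reproduced here.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon alpha diversity H' = -sum(p_i * ln(p_i)).

    Accepts raw counts or relative abundances; values are normalized to
    proportions first, and taxa with zero abundance are skipped.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundance vector must have a positive sum")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i).

    0 means identical profiles; 1 means no shared taxa.
    """
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa in the same order")
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    if den == 0:
        raise ValueError("both profiles are empty")
    return num / den


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    AUC = P(score of a random positive > score of a random negative),
    with ties counted as half a win. Hypothetical encoding:
    1 = positive class (e.g. Wuhan), 0 = negative class (e.g. Shiyan).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy profiles over four taxa; illustrative numbers only, not study data.
    sample_a = [0.5, 0.3, 0.2, 0.0]
    sample_b = [0.1, 0.3, 0.4, 0.2]
    print("Shannon (a):", round(shannon_index(sample_a), 4))
    print("Bray-Curtis (a, b):", round(bray_curtis(sample_a, sample_b), 4))
    # Toy classifier outputs, e.g. predicted probability of the positive class.
    print("AUC:", roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise AUC loop costs O(n_pos × n_neg) comparisons, which is negligible for a cohort of a few hundred samples; for much larger datasets one would typically sort the scores once and use rank sums instead.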
Keywords: Intestinal bacteria, Machine learning, Geographic locations, Metagenomics, Forensic microbiology
Received: 02 Mar 2025; Accepted: 26 Jun 2025.
Copyright: © 2025 Luo, Chen, Zeng, Li, Chen, Zhang, Guo, Li, Ruan, Zhu, Gao, Zhang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Cairong Gao, Shanxi Medical University, Taiyuan, China
Cuntai Zhang, Huazhong University of Science and Technology, Wuhan, 430074, Hubei Province, China
Tao Li, Beijing Genomics Institute (BGI), Shenzhen, 518083, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.