AUTHOR=Luo Li, Chen Bangwei, Zeng Shengyin, Li Yaxin, Chen Xiaolin, Zhang Jianguo, Guo Xiangjie, Li Shujin, Ruan Lei, Zhu Shida, Gao Cairong, Zhang Cuntai, Li Tao
TITLE=Machine learning integrates region-specific microbial signatures to distinguish geographically adjacent populations within a province
JOURNAL=Frontiers in Microbiology
VOLUME=Volume 16 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2025.1586195
DOI=10.3389/fmicb.2025.1586195
ISSN=1664-302X
ABSTRACT=Background: The individual specificity and temporal stability of the human gut microbiota have revealed significant compositional differences across geographical provenances. However, the gut microbiota variations among people residing in different regions within a province remain enigmatic. Methods: Shotgun metagenomic sequencing was performed to analyze the gut microbiota of 381 unrelated Chinese Han individuals living in two cities (Wuhan and Shiyan) of Hubei Province. To obtain the optimal model for distinguishing geographically close populations, three machine learning (ML) algorithms based on microbiota or functions were employed. Results: Significant differences in microbial α diversity and β diversity were observed. Flavonifractor plautii and Bacteroides stercoris were region-specific markers that presented higher relative abundances in Wuhan individuals. With the genus-level index commonly used for 16S rRNA as the base model, the prediction accuracy was greatly improved when species and functional data were added. Among the three ML algorithms, the random forest algorithm achieved the best performance, with an AUC of 0.943. Conclusion: The gut microbiota of individuals residing in the same province is significantly similar; however, pronounced differences in bacterial composition were noted between individuals.
Integrating the gut microbiota and functions using machine learning algorithms can distinguish people from geographically close environments, offering a foundation for determining geographical origin through the gut microbiota. Moreover, a deeper understanding of host-specific associations may offer valuable forensic and clinical assistance.