ORIGINAL RESEARCH article
Front. Microbiol.
Sec. Microorganisms in Vertebrate Digestive Systems
This article is part of the Research Topic: New and advanced mechanistic insights into the influences of the infant gut microbiota on human health and disease, Volume II
Neonatal Gut Microbiota Stratification and Identification of SCFA-Associated Microbial Subgroups Using Unsupervised Clustering and Machine Learning Classification
Provisionally accepted
- 1Kangwon National University School of Medicine, Chuncheon-si, Republic of Korea
- 2Kangwon National University Hospital, Chuncheon-si, Republic of Korea
- 3Seoul National University, Gwanak-gu, Republic of Korea
- 4CHA University School of Medicine, Pocheon-si, Republic of Korea
Background: The neonatal gut microbiome plays a critical role in infant health through the production of short-chain fatty acids (SCFAs). However, the organization of SCFA-producing microbial communities in neonates remains poorly characterized. This study applied unsupervised clustering and machine learning to classify microbial subgroups associated with SCFA production, providing insight into their composition and metabolic potential.

Methods: This study recruited 71 mother-infant pairs from Kangwon National University Hospital and Bundang CHA Hospital and collected meconium samples within five days postpartum. Microbial diversity was analyzed at the genus level by 16S rRNA gene sequencing (V3–V4 region), and SCFAs were quantified from the same samples. To identify functionally distinct microbial subgroups, K-Means, Agglomerative, Spectral, and Gaussian Mixture Model clustering were applied. Clustering validity was assessed using the Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index, and Prediction Strength Validation, with t-distributed Stochastic Neighbor Embedding (t-SNE) visualization used to evaluate cluster separation. SCFA distributions were compared across clusters, and random forest (RF) and logistic regression models were used to classify SCFA-associated microbial clusters, with performance evaluated using receiver operating characteristic (ROC) curves.

Results: The clustering analysis identified distinct microbial subgroups linked to SCFA production, with Agglomerative clustering outperforming K-Means in capturing functionally relevant structures. Cluster 1 had higher SCFA levels and was enriched in Bacteroides, Prevotella, and Enterococcus, while Cluster 2 exhibited lower SCFA concentrations and a more heterogeneous composition. Introducing a third cluster in the multi-class analysis revealed an intermediate metabolic profile, suggesting a continuum in microbial metabolic function. In the classification analysis, the RF model was superior, achieving ROC scores of 91.05% (Agglomerative) and 87.74% (K-Means) in binary classification and 92.98% (Agglomerative) and 89.84% (K-Means) in multi-class classification, demonstrating its strong predictive ability for SCFA-based clusters.

Conclusion: Unsupervised clustering combined with classification analysis effectively predicts SCFA-associated microbial subgroups, paving the way for future research on longitudinal tracking and functional genomic integration in early-life metabolic health.
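The Methods rank competing clustering solutions with internal validity indices such as the Silhouette Score. As a rough illustration of what that index measures, the sketch below computes the mean silhouette coefficient in plain Python for two candidate labelings of the same samples. Everything in it is an assumption made for illustration: the function names, the Euclidean distance, and the toy two-feature "abundance" profiles are hypothetical and are not taken from the study, which does not state its implementation here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples.

    For each sample: a = mean distance to the other members of its own
    cluster, b = lowest mean distance to the members of any other cluster,
    and s = (b - a) / max(a, b). Values near 1 indicate compact,
    well-separated clusters; values near or below 0 indicate overlap.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: six samples, two relative-abundance features each.
# These numbers are invented for illustration, not measured values.
toy_profiles = [
    (0.60, 0.10), (0.55, 0.15), (0.65, 0.05),
    (0.10, 0.50), (0.15, 0.55), (0.05, 0.60),
]
separated_labels = [0, 0, 0, 1, 1, 1]  # labeling that follows the two groups
mixed_labels = [0, 1, 0, 1, 0, 1]      # labeling that cuts across the groups

if __name__ == "__main__":
    print("separated:", round(silhouette_score(toy_profiles, separated_labels), 3))
    print("mixed:", round(silhouette_score(toy_profiles, mixed_labels), 3))
```

A labeling that follows the visible groups scores higher than one that cuts across them, which is the property that lets such an index be used to compare the outputs of different clustering algorithms on the same data. In practice a library implementation would normally be preferred over this hand-written version.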
Keywords: neonatal microbiota, microbial clustering, short-chain fatty acids, machine learning classification, unsupervised learning
Received: 21 Jul 2025; Accepted: 14 Nov 2025.
Copyright: © 2025 Cho, YUN, Hosseinzadeh Kasani and Jeong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Payam Hosseinzadeh Kasani, payam.kassani@kangwon.ac.kr
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
