ORIGINAL RESEARCH article
Front. Endocrinol.
Sec. Clinical Diabetes
Volume 16 - 2025 | doi: 10.3389/fendo.2025.1687146
This article is part of the Research Topic: Harnessing Machine Learning for Enhanced Biomedical Diagnosis and Early Disease Detection: Bridging Data Science and Healthcare
Early Prediction of Gestational Diabetes Mellitus Using Machine Learning-Integrated Metabolomic and Clinical features
Provisionally accepted
- 1Department of Endocrinology, Hainan General Hospital, Haikou, China
- 2Department of Endocrinology and Health Management Center, Hainan General Hospital, Haikou, China
- 3Department of Medical Record Management, Hainan General Hospital, Haikou, China
Abstract

Background: Gestational diabetes mellitus (GDM) is a prevalent metabolic disorder associated with pregnancy, and intervention is often postponed until after metabolic complications have developed. This study seeks to develop an integrated predictive model that combines first-trimester metabolomic signatures with established clinical risk factors to enable early detection of high-risk pregnancies before the onset of irreversible metabolic damage.

Methods: A total of 89 pregnant women [45 with GDM, 44 with normal glucose tolerance (NGT)] were recruited at Hainan Provincial People's Hospital. Serum and urine samples underwent untargeted metabolomic profiling by UPLC-MS/MS. Metabolites were identified using the Human Metabolome Database and the METLIN database. Bioinformatics analyses were performed on the differential metabolites. Lasso regression was used to select the metabolites and clinical features for model construction. The dataset was divided into a training set and a validation set in a 7:3 ratio. Six machine learning (ML) models were trained to identify patients with GDM. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. Shapley Additive exPlanations (SHAP) analysis was used to interpret feature contributions in the optimal model.

Results: Participants with GDM showed metabolic profiles distinct from those of participants with NGT. A total of 528 differential metabolites were identified, and KEGG pathway analysis mapped these metabolites to 20 pathways related to metabolism and human diseases. Lasso regression selected 11 differential metabolites and 3 clinical features for training the ML models. The multilayer perceptron achieved the highest classification performance, with an AUC of 0.984 (95% CI: 0.866-1.000) in the validation set. SHAP analysis identified GlcCer(d18:1/16:0) and triglycerides as the most important predictors, both positively associated with the risk of GDM.

Conclusion: Participants with GDM and NGT differ markedly in the levels of many metabolites. The ML model based on first-trimester metabolites and clinical features demonstrates high
Keywords: gestational diabetes mellitus, metabolites, machine learning, early prediction, metabolomic profiling
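The evaluation steps named in the Methods (a 7:3 split followed by AUC, precision, recall, and F1-score) can be illustrated with a minimal sketch. The abstract does not state which software or splitting procedure the authors used, so everything below is an assumption: the function names `stratified_split`, `roc_auc`, and `precision_recall_f1` are hypothetical, the split is stratified by class, the 0.5 decision threshold is arbitrary, and the scores are synthetic toy values rather than study data. Only the group sizes (45 GDM, 44 NGT) come from the abstract.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/validation sets, class by class (assumed 7:3)."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(len(idx) * train_frac)  # truncates, so train gets at most 70%
        train += idx[:cut]
        valid += idx[cut:]
    return sorted(train), sorted(valid)


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Group sizes from the abstract: 45 GDM (label 1), 44 NGT (label 0).
    labels = [1] * 45 + [0] * 44
    train_idx, valid_idx = stratified_split(labels)

    # Synthetic scores standing in for a trained model's predicted risk.
    rng = random.Random(1)
    y_valid = [labels[i] for i in valid_idx]
    scores = [min(1.0, max(0.0, 0.35 + 0.3 * y + rng.uniform(-0.3, 0.3)))
              for y in y_valid]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # arbitrary threshold

    print("train/valid sizes:", len(train_idx), len(valid_idx))
    print("AUC:", roc_auc(y_valid, scores))
    print("precision, recall, F1:", precision_recall_f1(y_valid, y_pred))
```

With a validation set of this size, each misranked pair moves the AUC noticeably, which is consistent with the wide confidence interval reported in the Results.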
Received: 16 Aug 2025; Accepted: 13 Oct 2025.
Copyright: © 2025 Ji, Gao, Liu, Chen, Fu, Lin and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Fei Wang, libby.626@hainmc.edu.cn
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.