ORIGINAL RESEARCH article
Front. Endocrinol.
Sec. Clinical Diabetes
Volume 16 - 2025 | doi: 10.3389/fendo.2025.1665935
This article is part of the Research Topic: AI in Healthcare: Transforming Clinical Risk Prediction, Medical Large Language Models, and Beyond
An Explainable Machine Learning Model for Predicting Preterm Birth in Pregnant Women with Gestational Diabetes Mellitus and Hypertensive Disorders of Pregnancy: Development and External Validation
Provisionally accepted
- 1 School of Medicine, University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China
- 2 School of Medicine, Southwest Medical University, Luzhou 646000, Sichuan, China
- 3 School of Medicine, Southwest Medical University, Luzhou 646000, Sichuan, China
- 4 Department of Obstetrics and Gynaecology, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China
- 5 Department of Nursing, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China
Background: Gestational diabetes mellitus (GDM) and hypertensive disorders of pregnancy (HDP) often coexist and share pathophysiological features such as insulin resistance and endothelial dysfunction, increasing the risk of preterm birth. However, few predictive models have focused specifically on women with both conditions. This study aimed to develop and externally validate a machine learning model for this high-risk population and to assess its clinical utility and interpretability.

Methods: This retrospective dual-center study used electronic medical records of pregnant women with comorbid GDM and HDP collected at two clinical centers: 121 women formed the development cohort and 136 the external validation cohort. Multiple machine learning algorithms, including least absolute shrinkage and selection operator (LASSO) regression, random forest, and Naive Bayes (NB), were applied to construct predictive models. To address class imbalance and enhance model robustness, the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic minority-class samples to balance the dataset, was employed. Model interpretability was assessed using Shapley Additive Explanations (SHAP).

Results: Thirteen variables with univariate significance were entered into Elastic Net regression, yielding five key predictors: alanine transaminase (ALT), aspartate transaminase (AST), albumin, lactate dehydrogenase (LDH), and systolic blood pressure at 32–36 weeks (SBP_32_36). Although the LASSO model achieved the highest AUC (0.802), the Naive Bayes model demonstrated greater clinical net benefit, better reclassification performance as measured by the Net Reclassification Improvement (NRI, which evaluates whether patients are more accurately assigned to higher- or lower-risk groups) and the Integrated Discrimination Improvement (IDI, which reflects the average improvement in distinguishing high-risk from low-risk patients), and greater robustness in SMOTE-based sensitivity analyses. In the external validation cohort (n = 136), the Naive Bayes model generalized well, with an AUC of 0.777 (95% CI: 0.645–0.887), accuracy of 0.801 (95% CI: 0.735–0.860), sensitivity of 0.792, and specificity of 0.804, supporting its selection as the optimal model for this high-risk population.

Conclusions: The Naive Bayes model exhibited robust predictive ability and interpretability for identifying preterm birth risk in pregnancies with comorbid GDM and HDP, and may serve as a transparent, clinically applicable tool for individualized obstetric risk management.
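The two reclassification metrics named in the Results can be illustrated with a short calculation. The following is a minimal sketch, not the authors' analysis code: it assumes the category-free (continuous) form of the NRI, since the abstract does not state which variant was used, and the function names and toy probabilities are hypothetical and chosen only for illustration.

```python
from statistics import mean


def continuous_nri(y_true, p_old, p_new):
    """Category-free NRI (assumed variant): net share of events whose
    predicted risk rises under the new model, minus the net share of
    non-events whose predicted risk rises."""
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]

    def net_up(pairs):
        up = sum(n > o for o, n in pairs)
        down = sum(n < o for o, n in pairs)
        return (up - down) / len(pairs)

    return net_up(events) - net_up(nonevents)


def idi(y_true, p_old, p_new):
    """IDI: change in mean predicted risk among events minus the
    change in mean predicted risk among non-events."""
    ev_old = mean(o for y, o in zip(y_true, p_old) if y == 1)
    ev_new = mean(n for y, n in zip(y_true, p_new) if y == 1)
    ne_old = mean(o for y, o in zip(y_true, p_old) if y == 0)
    ne_new = mean(n for y, n in zip(y_true, p_new) if y == 0)
    return (ev_new - ev_old) - (ne_new - ne_old)


# Hypothetical toy data: 1 = preterm birth, 0 = term birth.
y = [1, 1, 1, 0, 0, 0, 0]
p_reference = [0.6, 0.4, 0.5, 0.3, 0.5, 0.2, 0.4]  # e.g. a reference model
p_candidate = [0.8, 0.5, 0.4, 0.2, 0.3, 0.3, 0.4]  # e.g. a candidate model

print(f"NRI = {continuous_nri(y, p_reference, p_candidate):.3f}")  # NRI = 0.583
print(f"IDI = {idi(y, p_reference, p_candidate):.3f}")             # IDI = 0.117
```

Positive values of both metrics indicate that the candidate model assigns higher risk to women who delivered preterm and lower risk to those who did not, relative to the reference model.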
Keywords: preterm birth, gestational diabetes mellitus, hypertensive disorders of pregnancy, SHapley Additive exPlanations, elastic net regression, risk prediction model
Received: 14 Jul 2025; Accepted: 01 Sep 2025.
Copyright: © 2025 Kang, Luo, WENCHI, Luo, Mei and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Jie Mei, Department of Obstetrics and Gynaecology, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China
Jing He, Department of Nursing, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.