ORIGINAL RESEARCH article
Front. Endocrinol.
Sec. Reproduction
Development and Validation of an Explainable Machine Learning and Nomogram Model for Early Detection and Risk Stratification of Polycystic Ovary Syndrome: A Multicenter Study
Provisionally accepted
- 1The First People’s Hospital of Jiashan affiliated to Jiaxing University, Jiaxing, China
- 2Zhejiang Sci-Tech University, Hangzhou, China
- 3Wenzhou Medical University, Wenzhou, China
- 4Jiaxing Hospital of Traditional Chinese Medicine, Jiaxing, China
Abstract
Background: Polycystic ovary syndrome (PCOS) is a common endocrine-metabolic condition in reproductive-aged women, linked to infertility and long-term cardiometabolic risk. Early identification remains challenging because current diagnosis relies on hormone testing and imaging. This study aimed to develop and evaluate an interpretable machine learning (ML) model and a simplified nomogram for the early detection of PCOS.
Methods: Data from 1,600 women at the First People's Hospital of Jiashan were used for model training, and 283 external cases from Jiaxing Hospital of Traditional Chinese Medicine were used for validation. Twenty-three routine laboratory indicators were analyzed. After LASSO feature selection, seven ML algorithms were compared. The best-performing model, XGBoost, was interpreted using Shapley Additive exPlanations (SHAP). A logistic regression-based nomogram was developed from the key predictors.
Results: The XGBoost model showed excellent discrimination (AUC = 0.919 internal; 0.923 external). SHAP identified dehydroepiandrosterone sulfate (DHEAS), anti-Müllerian hormone (AMH), triglycerides (TG), and age as key contributors. The nomogram also performed well (AUC = 0.901 in the training set; 0.887 in the test set).
Conclusions: This interpretable "XGBoost + SHAP" and nomogram framework provides an accurate, transparent, and practical tool for early PCOS screening and individualized management.
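As an illustration of the two quantities the abstract relies on, the following minimal Python sketch shows how a logistic regression-based nomogram converts coefficients into points and how an AUC can be computed from predicted scores. It is not the authors' implementation: the intercept, coefficients, value ranges, and the function names (`auc`, `nomogram_points`, `predicted_probability`) are hypothetical placeholders chosen only to make the example runnable, and none of the numbers come from the study.

```python
import math

# HYPOTHETICAL model: intercept, coefficients and plotting ranges are
# illustrative assumptions, not values reported in the article.
INTERCEPT = -4.0
PREDICTORS = {
    # name: (coefficient, axis minimum, axis maximum)
    "DHEAS": (0.010, 50.0, 450.0),
    "AMH": (0.30, 0.5, 15.0),
    "TG": (0.50, 0.3, 4.0),
    "age": (-0.05, 18.0, 45.0),
}


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _reference(coef, lo, hi):
    """Axis value that contributes 0 points (the lowest-risk end)."""
    return lo if coef >= 0 else hi


def _points_per_logit():
    """Scale so the most influential predictor spans 0 to 100 points."""
    max_effect = max(abs(b) * (hi - lo) for b, lo, hi in PREDICTORS.values())
    return 100.0 / max_effect


def nomogram_points(name, value):
    """Points for one predictor value on the nomogram axis."""
    coef, lo, hi = PREDICTORS[name]
    return coef * (value - _reference(coef, lo, hi)) * _points_per_logit()


def predicted_probability(patient):
    """Map total nomogram points back to a predicted probability."""
    total = sum(nomogram_points(k, v) for k, v in patient.items())
    # Linear predictor when every variable sits at its 0-point reference.
    baseline = INTERCEPT + sum(
        b * _reference(b, lo, hi) for b, lo, hi in PREDICTORS.values()
    )
    logit = baseline + total / _points_per_logit()
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    patient = {"DHEAS": 300.0, "AMH": 8.0, "TG": 1.5, "age": 26.0}
    for name, value in patient.items():
        print(f"{name}: {nomogram_points(name, value):.1f} points")
    print(f"predicted probability: {predicted_probability(patient):.3f}")
```

The point scale is only a linear rescaling of the logistic model's linear predictor, so reading a probability off the total-points axis gives the same result as evaluating the regression equation directly; the nomogram trades no accuracy for its simplicity.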
Keywords: early screening, machine learning, nomogram, polycystic ovary syndrome, SHAP, XGBoost
Received: 06 Oct 2025; Accepted: 01 Dec 2025.
Copyright: © 2025 Yao, Yu, Zhang, Chen, Zhu, Jijun and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Tong Jijun, Cheng Zhang
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
