ORIGINAL RESEARCH article

Front. Endocrinol.

Sec. Reproduction

Development and Validation of an Explainable Machine Learning and Nomogram Model for Early Detection and Risk Stratification of Polycystic Ovary Syndrome: A Multicenter Study

Provisionally accepted
Bihua Yao1, Xingyu Yu2, Yunyan Zhang1, Jiayan Chen3, Xiaotong Zhu3, Tong Jijun2*, Cheng Zhang4*
  • 1The First People’s Hospital of Jiashan affiliated to Jiaxing University, Jiaxing, China
  • 2Zhejiang Sci-Tech University, Hangzhou, China
  • 3Wenzhou Medical University, Wenzhou, China
  • 4Jiaxing Hospital of Traditional Chinese Medicine, Jiaxing, China

The final, formatted version of the article will be published soon.

Abstract

Background: Polycystic ovary syndrome (PCOS) is a common endocrine-metabolic condition in women of reproductive age and is linked to infertility and long-term cardiometabolic risk. Early identification remains challenging because current diagnosis relies on hormone testing and imaging. This study aimed to develop and evaluate an interpretable machine learning (ML) model and a simplified nomogram for the early detection of PCOS.

Methods: Data from 1,600 women at the First People's Hospital of Jiashan were used for model training, and 283 external cases from Jiaxing Hospital of Traditional Chinese Medicine were used for validation. Twenty-three routine laboratory indicators were analyzed. After LASSO feature selection, seven ML algorithms were compared. The best-performing model, XGBoost, was interpreted using SHapley Additive exPlanations (SHAP). A logistic regression-based nomogram was developed from the key predictors.

Results: The XGBoost model showed excellent discrimination (AUC = 0.919 on internal validation; 0.923 on external validation). SHAP identified DHEAS, AMH, TG, and age as key contributors. The nomogram also performed well (AUC = 0.901 on the training set; 0.887 on the test set).

Conclusions: This interpretable "XGBoost + SHAP" and nomogram framework provides an accurate, transparent, and practical tool for early PCOS screening and individualized management.
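The abstract does not describe how the nomogram was built beyond stating that it is based on logistic regression. As a minimal sketch of the general technique, and not the authors' implementation, the Python below shows how a logistic regression fit can be turned into nomogram points and back into a predicted probability. The intercept, coefficients, and axis ranges are invented placeholders chosen only for illustration. They are not estimates from this study, and their signs and units should not be read as findings. The predictor names are borrowed from the SHAP result above.

```python
import math

# Hypothetical logistic-regression fit. The intercept, coefficients, and axis
# ranges are invented placeholders, NOT values reported by the study.
INTERCEPT = -4.0
PREDICTORS = {
    # name: (coefficient, axis minimum, axis maximum)
    "DHEAS": (0.008, 50.0, 500.0),
    "AMH": (0.30, 0.5, 15.0),
    "TG": (0.60, 0.3, 4.0),
    "age": (-0.08, 18.0, 45.0),
}


def _reference(coef, lo, hi):
    """Value that scores 0 points: the end of the axis with the lowest risk."""
    return lo if coef >= 0 else hi


# Largest log-odds swing any single predictor can produce across its axis.
# By convention that swing is mapped to 100 points.
MAX_SPAN = max(abs(c) * (hi - lo) for c, lo, hi in PREDICTORS.values())


def points(name, value):
    """Nomogram points contributed by one predictor at the given value."""
    coef, lo, hi = PREDICTORS[name]
    return 100.0 * coef * (value - _reference(coef, lo, hi)) / MAX_SPAN


def total_points(patient):
    """Sum of the per-predictor points for one patient (dict of values)."""
    return sum(points(name, patient[name]) for name in PREDICTORS)


def probability_from_points(total):
    """Map a total point score back to a predicted probability."""
    baseline = INTERCEPT + sum(
        c * _reference(c, lo, hi) for c, lo, hi in PREDICTORS.values()
    )
    log_odds = baseline + total * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_direct(patient):
    """Probability computed straight from the logistic model, for comparison."""
    log_odds = INTERCEPT + sum(
        c * patient[name] for name, (c, _, _) in PREDICTORS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical patient; the values are illustrative only.
patient = {"DHEAS": 300.0, "AMH": 8.0, "TG": 1.5, "age": 26.0}
score = total_points(patient)
print(f"total points: {score:.1f}")
print(f"predicted probability: {probability_from_points(score):.3f}")
```

The point scale is only a linear rescaling of the model's log-odds, so the probability read from the total score matches the one computed directly from the logistic model. This is why a nomogram can be used at the bedside without software while giving the same prediction as the underlying regression.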

Keywords: early screening, machine learning, nomogram, polycystic ovary syndrome, SHAP, XGBoost

Received: 06 Oct 2025; Accepted: 01 Dec 2025.

Copyright: © 2025 Yao, Yu, Zhang, Chen, Zhu, Jijun and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Tong Jijun
Cheng Zhang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.