ORIGINAL RESEARCH article

Front. Digit. Health

Sec. Health Informatics

This article is part of the Research Topic: Generative AI and Large Language Models in Medicine: Applications, Challenges, and Opportunities

Assessing ChatGPT versus evidence-based online responses for polycystic ovary syndrome (PCOS) self-management and education: an international cross-sectional blinded survey of healthcare professionals

Provisionally accepted
  • 1 University of Wolverhampton, School of Health and Wellbeing, Faculty of Education, Health and Wellbeing, Wolverhampton, United Kingdom
  • 2 School of Health and Wellbeing, Faculty of Education, Health and Wellbeing, University of Wolverhampton, Wolverhampton, United Kingdom
  • 3 Warwickshire Institute for the Study of Diabetes, Endocrinology and Metabolism (WISDEM), University Hospitals Coventry and Warwickshire NHS Trust, Coventry, United Kingdom
  • 4 Warwick Medical School, University of Warwick, Coventry, United Kingdom
  • 5 Centre for Sport, Exercise and Life Sciences, Research Institute for Health & Wellbeing, Coventry University, Coventry, United Kingdom
  • 6 Institute for Cardiometabolic Medicine, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, United Kingdom
  • 7 Division of Public Health, Sport and Wellbeing, Faculty of Health, Medicine and Society, University of Chester, Chester, United Kingdom
  • 8 Aston Medical School, College of Health and Life Sciences, Aston University, Birmingham, United Kingdom
  • 9 College of Health, Psychology and Social Care, University of Derby, Derby, United Kingdom

The final, formatted version of the article will be published soon.

Artificial intelligence (AI)-powered large language models, such as ChatGPT, are increasingly used by the public for health information. The reliability of such novel AI tools in providing credible polycystic ovary syndrome (PCOS) information and advice requires investigation. Healthcare professionals involved in PCOS care (n = 43, from 14 countries) used a 5-point Likert scale to evaluate ChatGPT-generated responses to frequently asked questions about PCOS against the corresponding patient-orientated, evidence-based recommendations/responses from AskPCOS. ChatGPT responses were rated significantly higher than the evidence-based responses for 11 of the 12 study questions, with moderate to large effect sizes (rank-biserial correlation r_rb = -0.46 to -1.00; all p-values <0.05); on average, ChatGPT answers were rated 0.824 units higher. Agreement in scoring ranged from poor to fair, with seven questions showing statistically significant fair agreement (κ = 0.24-0.37, p<0.05). Readability analyses found no statistically significant differences between ChatGPT and evidence-based responses. However, using ChatGPT to simplify the responses significantly improved readability. ChatGPT holds potential as a complementary patient self-education tool in PCOS, capable of interactive engagement and of simplifying medical language. Further research is needed to identify the optimal integration of AI tools and to validate their clinical applicability for PCOS self-education and self-management.
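For readers unfamiliar with the effect size reported above, the following is a minimal Python sketch of how a matched-pairs rank-biserial correlation can be computed from paired Likert ratings. It assumes the common signed-rank formulation, (W+ - W-) / (W+ + W-), which the abstract does not confirm as the authors' exact procedure. The function names and the ratings are hypothetical and are not study data. The sign of the result depends on the order of subtraction: with differences taken as evidence-based minus ChatGPT, a negative value indicates that the ChatGPT response was rated higher.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def matched_pairs_rank_biserial(first, second):
    """Return (W+ - W-) / (W+ + W-) for the paired differences first - second."""
    # Pairs with identical ratings carry no direction and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    if not diffs:
        return 0.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (w_plus - w_minus) / (w_plus + w_minus)


# Hypothetical ratings from eight raters for one question (NOT study data).
evidence_based = [3, 4, 3, 2, 4, 3, 5, 3]
chatgpt = [4, 5, 4, 4, 4, 4, 4, 5]

r_rb = matched_pairs_rank_biserial(evidence_based, chatgpt)
print(round(r_rb, 2))  # -0.79
```

A value of -1.00, the upper end of the reported range, would arise under this formulation when every rater who expressed a preference rated the ChatGPT response higher.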

Keywords: polycystic ovary syndrome, PCOS, ChatGPT, artificial intelligence, AI, large language models

Received: 05 Sep 2025; Accepted: 18 Nov 2025.

Copyright: © 2025 Graca, Dallaway, Alloh, Randeva, Kite and Kyrou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Sandro Graca, s.graca@wlv.ac.uk

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.