ORIGINAL RESEARCH article

Front. Public Health

Sec. Digital Public Health

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1605908

This article is part of the Research Topic "Advancing Healthcare AI: Evaluating Accuracy and Future Directions"

Battle of the artificial intelligence: a comprehensive comparative analysis of DeepSeek and ChatGPT for urinary incontinence-related questions

Provisionally accepted
Huawei Cao1, Changzhen Hao2, Tao Zhang3, Xiang Zheng1, Zihao Gao1, Jiyue Wu1, Lijian Gan1, Yu Liu3*, Xiangjun Zeng4*, Wei Wang1*
  • 1Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
  • 2International Hospital, Peking University, Beijing, Beijing Municipality, China
  • 3Department of Pathology, Basic Medical School, Capital Medical University, Beijing, China
  • 4Department of Physiology and Pathophysiology, Basic Medical School, Capital Medical University, Beijing, China

The final, formatted version of the article will be published soon.

Background: With the rapid advancement and widespread adoption of artificial intelligence (AI), patients increasingly turn to AI for initial medical guidance. Therefore, a comprehensive evaluation of AI-generated responses is warranted. This study aimed to compare the performance of DeepSeek and ChatGPT in answering urinary incontinence-related questions and to delineate their respective strengths and limitations.

Methods: Based on the American Urological Association/Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction (AUA/SUFU) and European Association of Urology (EAU) guidelines, we designed 25 urinary incontinence-related questions. Responses from DeepSeek and ChatGPT-4.0 were evaluated for reliability, quality, and readability. Fleiss' kappa was employed to calculate inter-rater reliability. For clinical case scenarios, we additionally assessed the appropriateness of responses. A comprehensive comparative analysis was performed.

Results: The modified DISCERN (mDISCERN) scores for DeepSeek and ChatGPT-4.0 were 28.24 ± 0.88 and 28.76 ± 1.56, respectively, showing no practically meaningful difference [P = 0.188, Cohen's d = 0.41 (95% CI: -0.15, 0.97)]. Both AI chatbots rarely provided source references. In terms of quality, DeepSeek achieved a higher mean Global Quality Scale (GQS) score than ChatGPT-4.0 (4.76 ± 0.52 vs. 4.32 ± 0.69, P = 0.001). DeepSeek also demonstrated superior readability, as indicated by a higher Flesch Reading Ease (FRE) score (76.43 ± 10.90 vs. 70.95 ± 11.16, P = 0.039) and a lower Simple Measure of Gobbledygook (SMOG) index (12.26 ± 1.39 vs. 14.21 ± 1.88, P < 0.001), suggesting easier comprehension. Regarding guideline adherence, DeepSeek had 11 (73.33%) fully compliant responses, while ChatGPT-4.0 had 13 (86.67%), with no significant difference [P = 0.651, Cohen's w = 0.083 (95% CI: 0.021, 0.232)].

Conclusion: DeepSeek and ChatGPT-4.0 might exhibit comparable reliability in answering urinary incontinence-related questions, though both lacked sufficient references. However, DeepSeek outperformed ChatGPT-4.0 in response quality and readability. While both AI chatbots largely adhered to clinical guidelines, occasional deviations were observed. Further refinements are necessary before the widespread clinical implementation of AI chatbots in urology.
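
As an illustration of how the reported mDISCERN effect size relates to the summary statistics, the following minimal sketch recomputes Cohen's d from the means and standard deviations given in the abstract. It assumes two equally sized groups and the simple pooled standard deviation for equal group sizes; the abstract does not state which formula the authors used, so this is only one plausible reading. The function name `cohens_d_equal_n` is hypothetical.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two groups of equal size (assumed, not stated in the abstract).

    With equal group sizes, the pooled SD reduces to the root mean
    of the two variances.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# mDISCERN summary statistics as reported in the abstract
# (DeepSeek: 28.24 ± 0.88, ChatGPT-4.0: 28.76 ± 1.56)
d_mdiscern = cohens_d_equal_n(28.24, 0.88, 28.76, 1.56)
print(round(d_mdiscern, 2))
```

Under these assumptions the result rounds to the reported d of 0.41, which falls between the conventional thresholds for a small effect (0.2) and a medium effect (0.5).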

Keywords: artificial intelligence, urinary incontinence, ChatGPT, DeepSeek, comparative analysis

Received: 07 Apr 2025; Accepted: 07 Jul 2025.

Copyright: © 2025 Cao, Hao, Zhang, Zheng, Gao, Wu, Gan, Liu, Zeng and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Yu Liu, Department of Pathology, Basic Medical School, Capital Medical University, Beijing, 100069, China
Xiangjun Zeng, Department of Physiology and Pathophysiology, Basic Medical School, Capital Medical University, Beijing, 100069, China
Wei Wang, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.