ORIGINAL RESEARCH article

Front. Med.

Sec. Family Medicine and Primary Care

Volume 12 - 2025 | doi: 10.3389/fmed.2025.1658561

This article is part of the Research Topic: AI in Healthcare: Transforming Clinical Risk Prediction, Medical Large Language Models, and Beyond

Assessing the ability of ChatGPT 4.0 in generating check-up reports

Provisionally accepted
Yikai Chen1, Yuxin Liu2, Yuanchang Huang1, Xiujie Huang1, Zhuoqun Zheng1, Fangjie Yang1, Haiming Lin3,4, Haoyu Lin5, Xinxin Li1, Aosi Xie1*, Huang Yiteng2*
  • 1 Department of Gastroenterological Surgery, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
  • 2 Health Management Center, First Affiliated Hospital of Shantou University Medical College, Shantou, China
  • 3 Department of Orthopaedics, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
  • 4 University of Alberta Faculty of Medicine & Dentistry, Edmonton, Canada
  • 5 Department of Thyroid and Breast Surgery, The First Affiliated Hospital of Shantou University Medical College, Shantou, China

The final, formatted version of the article will be published soon.

Background: ChatGPT (Chat Generative Pre-trained Transformer), a generative language model, has been applied across various clinical domains. Health check-ups, a widely adopted method for comprehensively assessing personal health, are chosen by an increasing number of individuals. This study aimed to evaluate the ability of ChatGPT 4.0 to efficiently provide patients with accurate and personalized health reports.

Methods: A total of 89 check-up reports generated by ChatGPT 4.0 were assessed. The reports were derived from the Check-up Center of the First Affiliated Hospital of Shantou University Medical College. Each report was translated into English by ChatGPT 4.0 and graded independently by three qualified doctors in both English and Chinese. The grading criteria encompassed six aspects: adherence to current treatment guidelines (Guide), diagnostic accuracy (Diagnosis), logical flow of information (Order), systematic presentation (System), internal consistency (Consistency), and appropriateness of recommendations (Suggestion), each scored on a 4-point scale. The complexity of the cases was categorized into three levels (LOW, MEDIUM, HIGH). The Wilcoxon rank-sum test and the Kruskal-Wallis test were used to examine differences in grading across languages and complexity levels.

Results: ChatGPT 4.0 demonstrated strong performance in adhering to clinical guidelines, providing accurate diagnoses, presenting information systematically, and maintaining consistency. However, it struggled with prioritizing high-risk items and providing comprehensive suggestions. In the "Order" category, a significant proportion of reports contained mixed data, and several reports were completely incorrect. In the "Suggestion" category, most reports were deemed correct but inadequate. No significant language advantage was observed, while performance varied across complexity levels. English reports showed significant differences in grading across complexity levels, while Chinese reports exhibited distinct performance across all categories.

Conclusion: ChatGPT 4.0 is currently well suited as an assistant to the chief examiner, particularly for handling simpler tasks and contributing to specific sections of check-up reports. It holds the potential to enhance medical efficiency, improve the quality of clinical check-up work, and deliver patient-centered services.
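
The two tests named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the abstract does not state which software or test options were used, so the function names, the normal approximation without continuity correction, and the example scores below are all assumptions made for illustration. Because scores on a 4-point scale produce many ties, both statistics use average ranks with the standard tie correction. The Kruskal-Wallis p-value uses the closed-form chi-square tail for two degrees of freedom, so the sketch is restricted to exactly three groups, matching the three complexity levels.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t**3 - t over every group of t tied values."""
    return sum(t**3 - t for t in Counter(values).values())


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test: returns (W, z, two-sided p).

    Uses the normal approximation with tie correction and no
    continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    if var_w == 0:  # every observation identical: no evidence of a shift
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return w, z, math.erfc(abs(z) / math.sqrt(2))


def kruskal_wallis_three_groups(a, b, c):
    """Kruskal-Wallis test for exactly three groups: returns (H, p)."""
    groups = [list(a), list(b), list(c)]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum**2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - tie_term(pooled) / (n**3 - n)
    if correction == 0:  # every observation identical
        return 0.0, 1.0
    h /= correction
    # Chi-square survival function with 2 degrees of freedom is exp(-H/2).
    return h, math.exp(-h / 2)


# Hypothetical 4-point scores, NOT data from the study.
english = [4, 3, 4, 2, 3, 4, 3]
chinese = [3, 3, 4, 2, 2, 4, 3]
low, medium, high = [4, 4, 3, 4], [3, 4, 3, 2], [2, 3, 2, 1]

print("language comparison:", rank_sum_test(english, chinese))
print("complexity comparison:", kruskal_wallis_three_groups(low, medium, high))
```

With more than three groups, or for exact small-sample p-values, an established statistics library would be the safer choice than this hand-rolled version.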

Keywords: Health care service, artificial intelligence - AI, check-up, ChatGPT 4.0, health report

Received: 02 Jul 2025; Accepted: 22 Sep 2025.

Copyright: © 2025 Chen, Liu, Huang, Huang, Zheng, Yang, Lin, Lin, Li, Xie and Yiteng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Aosi Xie, xieaosi@163.com
Huang Yiteng, g_ythuang@stu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.