BRIEF RESEARCH REPORT article
Front. Med.
Sec. Family Medicine and Primary Care
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1632303
This article is part of the Research Topic: The Applications of AI Techniques in Medical Data Processing
Performance of GPT-4 for planning acupuncture treatment: Comparison with human clinician performance
Provisionally accepted
- 1Kyung Hee University, Seoul, Republic of Korea
- 2Korea Institute of Oriental Medicine (KIOM), Daejeon, Republic of Korea
- 3Jaseng Spine and Joint Research Institute, Jaseng Medical Foundation, Seoul, Republic of Korea
Background: The medical knowledge of GPT-4 has been evaluated using real patients, providing diagnostic and treatment suggestions. However, few studies have directly compared the clinical suggestions of GPT-4 with those of groups of practitioners.

Methods: This study assessed the ability of GPT-4 to make medical decisions regarding acupuncture treatment by comparing its selection of acupoints with those made by human clinicians. Ten case reports published in Korean medical journals were selected and put in a standardized format. The standardized patient information was given to 80 Korean Medicine doctors and GPT-4 to diagnose and prescribe three to five acupoints per case. To evaluate the performance of GPT-4, the similarities in acupoint selection between the doctors and GPT-4 were quantified based on the percentage overlap and correlations of the selection probabilities of acupoints in each case.

Results: The average percentage overlap for acupoints among cases at the 10% cutoff was 51.3%, i.e., more than half of the GPT-4 acupoint suggestions overlapped the acupoints selected by the doctors. In half of the cases, significant correlations were observed in the acupoint selection probabilities, implying that GPT-4 acupoint suggestions are similar to those of doctors.

Conclusions: GPT-4 made reasonable acupoint suggestions, with notable overlap observed with the prescriptions of doctors. This shows its promise for supporting medical decisions, education, and personalized medicine for patients undergoing acupuncture treatment. Future studies and validation are necessary to ensure the reliability and efficacy of applying GPT-4 in real-world settings.
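The two similarity measures named in the Methods (percentage overlap at a cutoff, and correlation of selection probabilities) can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so the following are assumptions: the cutoff is read as the minimum fraction of doctors who must select an acupoint for it to count as a doctor-selected point; overlap is the share of GPT-4 suggestions that fall in that set; the correlation is computed as a Pearson coefficient over the union of acupoints; and GPT-4 selection probabilities are taken as given, however they were obtained. All function names and the toy prescriptions are hypothetical and are not data from the study.

```python
from collections import Counter
from math import sqrt


def selection_probabilities(prescriptions):
    """Fraction of prescribers who chose each acupoint (one list per prescriber)."""
    counts = Counter(point for rx in prescriptions for point in set(rx))
    n = len(prescriptions)
    return {point: c / n for point, c in counts.items()}


def percentage_overlap(gpt_points, doctor_probs, cutoff=0.10):
    """Percent of GPT-4 suggestions chosen by at least `cutoff` of doctors.

    Assumed definition; the study's exact formula is not stated in the abstract.
    """
    gpt_points = set(gpt_points)
    if not gpt_points:
        return 0.0
    shared = {p for p in gpt_points if doctor_probs.get(p, 0.0) >= cutoff}
    return 100.0 * len(shared) / len(gpt_points)


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def probability_correlation(doctor_probs, gpt_probs):
    """Correlate selection probabilities over every acupoint either side used."""
    points = sorted(set(doctor_probs) | set(gpt_probs))
    xs = [doctor_probs.get(p, 0.0) for p in points]
    ys = [gpt_probs.get(p, 0.0) for p in points]
    return pearson(xs, ys)


# Hypothetical toy data for a single case (not taken from the study).
toy_doctors = [
    ["LI4", "ST36", "LR3"],
    ["LI4", "ST36", "SP6"],
    ["LI4", "GB20", "LR3"],
    ["ST36", "SP6", "PC6"],
]
toy_gpt = ["LI4", "ST36", "GV20", "LR3"]

toy_doctor_probs = selection_probabilities(toy_doctors)
toy_overlap = percentage_overlap(toy_gpt, toy_doctor_probs, cutoff=0.10)
```

Under these assumptions, an average such as the reported 51.3% would be the mean of the per-case overlap values across the ten cases, and the per-case correlation would be tested for significance separately.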
Keywords: large language model, artificial intelligence, GPT-4, medical decision-making, acupoint selection
Received: 22 May 2025; Accepted: 01 Sep 2025.
Copyright: © 2025 Chae, Yoon, Kim, Ryu and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Younbyoung Chae, Kyung Hee University, Seoul, Republic of Korea
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.