
ORIGINAL RESEARCH article

Front. Med.

Sec. Healthcare Professions Education

This article is part of the Research Topic: Artificial Intelligence for Technology Enhanced Learning

Supporting Postgraduate Exam Preparation with Large Language Models: Implications for Traditional Chinese Medicine Education

Provisionally accepted
Baifeng Wang1, Meiwei Zhang2, Zhe Wang3, Keyu Yao4, Meng Hao4, Junhui Wang5, Suyuan Peng4*, Yan Zhu4*
  • 1Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
  • 2School of Medical Informatics, Changchun University of Chinese Medicine, Changchun, China
  • 3Institute of Medical Informatics, Statistics, and Epidemiology, Universität Leipzig, Leipzig, Germany
  • 4Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China
  • 5Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China

The final, formatted version of the article will be published soon.

Introduction: In China, the medical education system comprises multiple co-existing levels, with higher levels of education often leading to better job prospects. In career advancement—especially for entry into competitive urban hospitals—the postgraduate examination often plays a more decisive role than the licensing examination. The application of Large Language Models (LLMs) in Traditional Chinese Medicine (TCM) has expanded rapidly. TCM theories possess distinct scientific features, requiring LLMs to demonstrate advanced information processing and comprehension abilities in a Chinese-language context. While LLMs have performed strongly on licensing examinations in many countries, their performance on selective TCM examinations remains underexplored. This study aimed to evaluate and compare the performance of Ernie Bot, ChatGLM, SparkDesk, and GPT-4 on the 2023 Chinese Postgraduate Examination for TCM (CPE-TCM), and to explore their potential to support TCM education and academic development.

Methods: We assessed the performance of the four LLMs using the 2023 CPE-TCM as a test set. Exam scores were calculated to evaluate subject-specific performance. In addition, responses were qualitatively analyzed for logical reasoning and for the use of internal and external information.

Results: Ernie Bot and ChatGLM achieved accuracy rates of 50.30% and 46.67%, respectively, both above the passing score. Statistically significant differences in subject-specific performance were observed, with the highest scores in the medical humanistic spirit module. ChatGLM and GPT-4 provided logical explanations for all responses, while Ernie Bot and SparkDesk showed logical reasoning in 98.2% and 43.6% of responses, respectively. ChatGLM and GPT-4 incorporated internal information in all explanations, whereas SparkDesk rarely did. Over 60% of responses from Ernie Bot, ChatGLM, and GPT-4 included external information, the presence of which did not differ significantly between correct and incorrect answers. For SparkDesk, the presence of internal or external information was significantly associated with answer correctness (P < .001).

Discussion: Ernie Bot and ChatGLM surpassed the passing threshold for postgraduate selection, reflecting solid TCM expertise. The LLMs demonstrated strong capabilities in logical reasoning and in integrating background knowledge, highlighting their promising role in enhancing TCM education.

Keywords: Large Language Models (LLMs), Traditional Chinese Medicine, medical education, Ernie Bot, ChatGLM, SparkDesk, GPT-4

Received: 16 Jul 2025; Accepted: 02 Dec 2025.

Copyright: © 2025 Wang, Zhang, Wang, Yao, Hao, Wang, Peng and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Suyuan Peng
Yan Zhu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.