ORIGINAL RESEARCH article
Front. Pharmacol.
Sec. Ethnopharmacology
Volume 16 - 2025 | doi: 10.3389/fphar.2025.1649041
Assessing the Adherence of Large Language Models to Clinical Practice Guidelines in Chinese Medicine: A Content Analysis
Provisionally accepted
- 1Lanzhou University School of Public Health, Lanzhou, China
- 2Lanzhou University School of Basic Medical Sciences, Lanzhou, China
- 3Gansu University of Chinese Medicine, Lanzhou, China
- 4Southern Medical University, Guangzhou, China
- 5Universite de Geneve, Geneva, Switzerland
- 6Zhongnan Hospital of Wuhan University Center for Evidence-Based and Translational Medicine, Wuhan, China
- 7Beijing University of Chinese Medicine Dongfang Hospital, Beijing, China
- 8Beijing University of Chinese Medicine, Beijing, China
- 9China Academy of Chinese Medical Sciences, Beijing, China
- 10China Academy of Chinese Medical Sciences Guang'anmen Hospital, Beijing, China
- 11Lanzhou University Institute of Health Data Science, Lanzhou, China
- 12University of Madras School of Basic Medical Sciences, Chennai, India
ABSTRACT
Objective: Whether large language models (LLMs) can effectively facilitate Chinese medicine (CM) knowledge acquisition remains uncertain. This study aims to assess the adherence of LLMs to Clinical Practice Guidelines (CPGs) in CM.
Methods: This cross-sectional study randomly selected ten CPGs in CM and constructed 150 questions across three categories: medication based on differential diagnosis (MDD), specific prescription consultation (SPC), and CM theory analysis (CTA). Eight LLMs (GPT-4o, Claude-3.5 Sonnet, Moonshot-v1, ChatGLM-4, DeepSeek-v3, DeepSeek-r1, Claude-4 sonnet, and Claude-4 sonnet thinking) were evaluated using both English and Chinese queries. The main evaluation metrics were accuracy, readability, and use of safety disclaimers.
Results: Overall, DeepSeek-v3 and DeepSeek-r1 demonstrated superior performance in both English (median 5.00, interquartile range (IQR) 4.00-5.00 vs. median 5.00, IQR 3.70-5.00) and Chinese (both median 5.00, IQR 4.30-5.00), significantly outperforming all other models. All models achieved significantly higher accuracy in Chinese than in English responses (all p < 0.05). Accuracy varied significantly across question categories, with MDD and SPC questions proving more challenging than CTA questions. English responses had lower readability (mean Flesch Reading Ease score 32.7) than Chinese responses. Moonshot-v1 provided the highest rate of safety disclaimers (98.7% English, 100% Chinese).
Conclusion: LLMs showed varying degrees of potential for acquiring CM knowledge. The performance of DeepSeek-v3 and DeepSeek-r1 was satisfactory. Optimizing LLMs to become effective tools for disseminating CM information is an important direction for future development.
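For readers unfamiliar with the readability metric reported in the Results, the standard Flesch Reading Ease formula can be sketched as follows. This is only an illustration of the published formula, not the authors' pipeline: the function name and the example counts are hypothetical, and the sketch assumes that word, sentence, and syllable counts have already been obtained by some other means.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease score for English text.

    Higher scores indicate easier text; scores in the 30-50 range
    are conventionally described as difficult to read.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences      # words per sentence
    avg_syllables_per_word = syllables / words   # syllables per word
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


# Hypothetical counts, chosen only to demonstrate the arithmetic.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3))
```

Longer sentences and more syllables per word both lower the score, so a mean of 32.7 for the English responses places them in the range usually described as difficult.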
Keywords: Chinese medicine, Large Language Model, comparison, Clinical practice guideline, Knowledge acquisition
Received: 18 Jun 2025; Accepted: 17 Jul 2025.
Copyright: © 2025 Zhao, Lai, Pan, Huang, Xia, Bai, Liu, Liu, Jin, Shang, Liu, Shi, Liu, Chen, Estill and Ge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Long Ge, Lanzhou University School of Public Health, Lanzhou, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.