ORIGINAL RESEARCH article

Front. Pharmacol.

Sec. Ethnopharmacology

Volume 16 - 2025 | doi: 10.3389/fphar.2025.1649041

Assessing the Adherence of Large Language Models to Clinical Practice Guidelines in Chinese Medicine: A Content Analysis

Provisionally accepted
  • 1Lanzhou University School of Public Health, Lanzhou, China
  • 2Lanzhou University School of Basic Medical Sciences, Lanzhou, China
  • 3Gansu University of Chinese Medicine, Lanzhou, China
  • 4Southern Medical University, Guangzhou, China
  • 5Université de Genève, Geneva, Switzerland
  • 6Zhongnan Hospital of Wuhan University Center for Evidence-Based and Translational Medicine, Wuhan, China
  • 7Beijing University of Chinese Medicine Dongfang Hospital, Beijing, China
  • 8Beijing University of Chinese Medicine, Beijing, China
  • 9China Academy of Chinese Medical Sciences, Beijing, China
  • 10China Academy of Chinese Medical Sciences Guang'anmen Hospital, Beijing, China
  • 11Lanzhou University Institute of Health Data Science, Lanzhou, China
  • 12University of Madras School of Basic Medical Sciences, Chennai, India

The final, formatted version of the article will be published soon.

ABSTRACT

Objective: Whether large language models (LLMs) can effectively facilitate Chinese medicine (CM) knowledge acquisition remains uncertain. This study aims to assess the adherence of LLMs to Clinical Practice Guidelines (CPGs) in CM.

Methods: This cross-sectional study randomly selected ten CPGs in CM and constructed 150 questions across three categories: medication based on differential diagnosis (MDD), specific prescription consultation (SPC), and CM theory analysis (CTA). Eight LLMs (GPT-4o, Claude-3.5 Sonnet, Moonshot-v1, ChatGLM-4, DeepSeek-v3, DeepSeek-r1, Claude-4 Sonnet, and Claude-4 Sonnet Thinking) were evaluated using both English and Chinese queries. The main evaluation metrics were accuracy, readability, and use of safety disclaimers.

Results: Overall, DeepSeek-v3 and DeepSeek-r1 demonstrated superior performance in both English (median 5.00, interquartile range (IQR) 4.00-5.00 vs. median 5.00, IQR 3.70-5.00) and Chinese (both median 5.00, IQR 4.30-5.00), significantly outperforming all other models. All models achieved significantly higher accuracy in Chinese than in English responses (all p < 0.05). Accuracy also varied significantly across question categories, with MDD and SPC questions proving more challenging than CTA questions. English responses had lower readability (mean Flesch Reading Ease score 32.7) than Chinese responses. Moonshot-v1 provided the highest rate of safety disclaimers (98.7% in English, 100% in Chinese).

Conclusion: LLMs showed varying degrees of potential for acquiring CM knowledge. The performance of DeepSeek-v3 and DeepSeek-r1 was satisfactory. Optimizing LLMs to become effective tools for disseminating CM information is an important direction for future development.
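The English readability figure above is a Flesch Reading Ease score. For readers unfamiliar with the metric, the sketch below shows the standard Flesch formula; the syllable counter here is a naive vowel-group heuristic for illustration only, not the tool the authors used, so exact scores will differ from dedicated readability software.

```python
import re

def count_syllables(word: str) -> int:
    # Naive approximation: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Standard Flesch Reading Ease formula:
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Higher scores indicate easier text; a mean score of 32.7, as reported for the English responses, falls in the range conventionally labeled "difficult" (college-level reading).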

Keywords: Chinese medicine, Large Language Model, comparison, Clinical practice guideline, Knowledge acquisition

Received: 18 Jun 2025; Accepted: 17 Jul 2025.

Copyright: © 2025 Zhao, Lai, Pan, Huang, Xia, Bai, Liu, Liu, Jin, Shang, Liu, Shi, Liu, Chen, Estill and Ge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Long Ge, Lanzhou University School of Public Health, Lanzhou, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.