ORIGINAL RESEARCH article

Front. Med.

Sec. Precision Medicine

Improving TCM Question Answering through Tree-Organized Self-Reflective Retrieval with LLMs

  • 1. Breast Disease Specialist Hospital of Guangdong Provincial Hospital of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Hangzhou, China

  • 2. Chinese Medicine Artificial Intelligence Joint Engineering Center, Zhejiang Chinese Medical University, Hangzhou, China

  • 3. Zhejiang Chinese Medical University School of Basic Medical Sciences, Hangzhou, China

  • 4. Faculty of Chinese Medicine, Macau University of Science and Technology, Macao, China

The final, formatted version of the article will be published soon.

Abstract

Background: Large language models (LLMs) offer significant potential for intelligent question answering (Q&A) in healthcare, yet traditional knowledge representation methods fail to capture the complex, hierarchical nature of Traditional Chinese Medicine (TCM) knowledge systems. The lack of effective retrieval-augmented generation (RAG) frameworks tailored to TCM's unique epistemology limits such applications.
Objectives: This study evaluates the effectiveness of a novel Tree-Organized Self-Reflective Retrieval (TOSRR) framework in enhancing LLM performance on TCM Q&A tasks through innovative knowledge organization and dynamic self-correction mechanisms.
Methods: We developed a hierarchical knowledge representation system that structures TCM knowledge as subject-predicate-object-text (SPO-T) units within a tree-like architecture, enabling multi-dimensional relationships while preserving semantic context. An iterative self-reflection mechanism performs dynamic knowledge retrieval and validation across textbook chapters and disciplines. Performance was evaluated on randomly selected questions from the TCM Medical Licensing Examination (MLE) and the college Classics Course Exam (CCE), which represent standardized clinical knowledge and classical theory assessment, respectively.
Results: When integrated with GPT-4, the TOSRR framework achieved a 19.85% improvement in absolute accuracy on the TCM MLE benchmark and increased recall accuracy from 27% to 38% on the CCE dataset. Expert manual evaluation showed substantial gains in safety, consistency, explainability, compliance, and coherence, with an overall improvement of 18.64 points. Retrieval-Augmented Generation Assessment (RAGAs) metrics confirmed the framework's superior knowledge utilization, retrieval precision, and resistance to information noise compared with standard RAG approaches.
Conclusion: The TOSRR framework enhances LLM performance on TCM knowledge tasks through its hierarchical knowledge representation and self-reflective retrieval approach, and it has potential for application in teaching.
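
The abstract does not give implementation details, so the following is only a minimal sketch of how SPO-T units might be stored in a tree and searched with an iterative check on the gathered evidence. Every name here (`SPOT`, `Node`, `overlap`, `retrieve`, `min_score`, `max_rounds`) is a hypothetical choice made for illustration, the knowledge entries are toy data, and the token-overlap "reflection" step is a stand-in for the LLM-based validation the authors describe, not their method.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class SPOT:
    """One subject-predicate-object-text unit (field names are assumptions)."""
    subject: str
    predicate: str
    obj: str
    text: str  # source passage kept with the triple to preserve context


@dataclass
class Node:
    """A tree node, e.g. discipline -> chapter -> section (assumed layout)."""
    title: str
    units: list[SPOT] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)


def tokens(s: str) -> set[str]:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", s.lower()))


def overlap(query: str, unit: SPOT) -> int:
    """Toy relevance score: number of query tokens that appear in the unit."""
    unit_text = f"{unit.subject} {unit.predicate} {unit.obj} {unit.text}"
    return len(tokens(query) & tokens(unit_text))


def retrieve(root: Node, query: str, min_score: int = 2,
             max_rounds: int = 3) -> list[SPOT]:
    """Descend the tree one level per round until the evidence looks sufficient."""
    frontier = [root]
    evidence: list[SPOT] = []
    for _ in range(max_rounds):
        if not frontier:
            break
        for node in frontier:
            for unit in node.units:
                if overlap(query, unit) > 0 and unit not in evidence:
                    evidence.append(unit)
        # Placeholder reflection step: a real system would ask the LLM whether
        # the evidence answers the question; here a score threshold is used.
        if sum(overlap(query, u) for u in evidence) >= min_score:
            break
        frontier = [child for node in frontier for child in node.children]
    return evidence


# Toy knowledge tree (illustrative entries only, not from the study's data).
herbs = Node("Materia medica", units=[
    SPOT("Ren Shen", "tonifies", "qi",
         "Ren Shen (ginseng) strongly tonifies the original qi."),
])
formulas = Node("Formulas", units=[
    SPOT("Gui Zhi Tang", "harmonizes", "ying and wei",
         "Gui Zhi Tang harmonizes the nutritive and defensive levels."),
])
root = Node("TCM", children=[herbs, formulas])

hits = retrieve(root, "what tonifies qi")
print([h.subject for h in hits])
```

Keeping the source passage in each unit means that whatever is retrieved can be handed to the model together with its original wording, which is one plausible reading of "preserving semantic context" in the Methods.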

Keywords

artificial intelligence, knowledge graph, Large Language Model, medical dialogue system, Traditional Chinese Medicine

Received

24 November 2025

Accepted

18 February 2026

Copyright

© 2026 Liu, Chang, Li, Qu, Li, Cao and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lingyong Cao; Shuyuan Lin

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
