AUTHOR=Chen Rong , Zhang Siyun , Zheng Yiyi , Yu Qiuhua , Wang Chuhuai TITLE=Enhancing treatment decision-making for low back pain: a novel framework integrating large language models with retrieval-augmented generation technology JOURNAL=Frontiers in Medicine VOLUME=Volume 12 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1599241 DOI=10.3389/fmed.2025.1599241 ISSN=2296-858X ABSTRACT=IntroductionChronic low back pain (CLBP) is a global health problem that seriously affects the quality of life among patients. The etiology of CLBP is complex, with non-specific symptoms and considerable heterogeneity, which poses a great challenge for diagnosis. In addition, the uncertain treatment responses as well as the potential influence of psychological and social factors further increase the difficulty of personalized decision-making in clinical practice.MethodsThis study proposed an innovative support framework on clinical decision, which combined large language models (LLMs) with retrieval-augmented generation (RAG) technology. Moreover, the least-to-most (LtM) prompting technology was introduced, aiming to simulate the decision-making process of senior experts thereby improving personalized treatment for CLBP. Additionally, a special CLBP-related dataset was generated to verify effectiveness of the framework, which compared the proposed model CLBP-GPT with GPT-4.0, ERNIE Bot, and DeepSeek in terms of five key indicators: accuracy, relevance, clarity, benefit, and completeness.ResultsThe results showed that the CLBP-GPT model proposed in this study scored significantly better than other comparison models in all five evaluation dimensions. Specifically, the total score of CLBP-GPT was 4.40 (SD = 0.20), substantially higher than GPT-4.0 (4.03, SD = 0.48), ERNIE Bot (3.54, SD = 0.53), and DeepSeek (3.81, SD = 0.47). In terms of accuracy, the average score of CLBP-GPT was 4.38 (SD = 0.19), while the scores of other models were all below 4, indicating that CLBP-GPT could provide more accurate clinical decision-making recommendations. In addition, CLBP-GPT scored as high as 4.42 (SD = 0.19) in the completeness dimension, further demonstrating that the decision content output by the model was more comprehensive and covered more key information related to CLBP.DiscussionThis study not only provides new technical support for clinical decision-making in CLBP, but also introduces a powerful tool for doctors to formulate personalized and efficient treatment strategies. It is expected to improve the diagnosis and treatment of CLBP in the future.