AUTHOR=Xu Xiaowan, Liu Zhibo, Zhou Shihao, Ji Baoyan, Fan Deyan, Yang Zijuan, Chen Hongli, Yang Xiuli, Guan Mengru
TITLE=The clinical application potential assessment of the Deepseek-R1 large language model in lung cancer
JOURNAL=Frontiers in Oncology
VOLUME=15
YEAR=2025
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1601529
DOI=10.3389/fonc.2025.1601529
ISSN=2234-943X
ABSTRACT=
Background: This study evaluates the clinical potential of the large language model Deepseek-R1 in the diagnosis and treatment of lung cancer, with a specific focus on its ability to assist junior oncologists. The research systematically assesses the model’s performance in terms of diagnostic accuracy, consistency of treatment recommendations, and reliability in clinical decision-making.
Methods: A total of 320 patients newly diagnosed with lung cancer were included in this retrospective study. Twenty-six structured clinical questions were designed based on international diagnostic and treatment guidelines, covering three key domains: basic medical knowledge, complex clinical decision-making, and ethical judgment. All patient data were anonymized before being entered into the Deepseek-R1 model. The model’s responses, along with those generated by five junior oncologists with no more than three years of clinical experience, were independently assessed by senior oncologists with over ten years of experience. A double-blind evaluation protocol was implemented to reduce potential assessment bias. Inter-rater agreement was quantified using Cohen’s Kappa coefficient.
Results: In the categories of basic knowledge, advanced clinical decisions, and ethical questions, Deepseek-R1 achieved average accuracy rates of 92.3%, 87.5%, and 85.1%, respectively, significantly higher than the corresponding rates of 80.4%, 72.8%, and 70.2% for junior oncologists (P < 0.05). In a formal evaluation of 256 cases, Deepseek-R1’s overall diagnostic accuracy was 94.6%, compared with 78.9% for junior oncologists (P < 0.05). In a longitudinal assessment of 40 cases with disease progression, the model demonstrated high consistency in updating its recommendations. Logical errors were more frequent among junior oncologists, while ethical risks appeared more commonly in the model-generated responses (44% vs. 21.9%).
Conclusion: Deepseek-R1 significantly outperformed junior oncologists in diagnostic accuracy and treatment decision-making, particularly in complex and dynamic clinical situations. While limitations remain in its ethical reasoning, the model holds substantial potential for supporting junior physicians, contributing to multidisciplinary discussions, and optimizing treatment pathways.
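
Note: the Methods section reports inter-rater agreement via Cohen’s Kappa, defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e the agreement expected by chance from each rater’s label frequencies. The sketch below is illustrative only; it does not reproduce the study’s rating scheme or data, and the reviewer labels shown are hypothetical.

    # Minimal sketch of Cohen's kappa for two raters, assuming each response
    # is scored as 1 (acceptable) or 0 (not acceptable). Illustrative data only.
    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two equal-length label sequences."""
        assert len(rater_a) == len(rater_b) and len(rater_a) > 0
        n = len(rater_a)
        # Observed agreement: fraction of items both raters labelled identically.
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Chance agreement: product of the raters' marginal label frequencies.
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        p_e = sum((freq_a[label] / n) * (freq_b[label] / n)
                  for label in set(rater_a) | set(rater_b))
        return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0

    # Hypothetical example: two senior reviewers scoring 10 responses.
    reviewer_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
    reviewer_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # ~0.524

Values near 1 indicate strong agreement between the senior reviewers beyond what chance label frequencies would produce; the abstract does not report the specific kappa values obtained.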