ORIGINAL RESEARCH article
Front. Oncol.
Sec. Surgical Oncology
This article is part of the Research Topic: Artificial Intelligence in Clinical Oncology: Enhancements in Tumor Management.
A Comparative Study on DeepSeek and ChatGPT for Bone and Soft Tissue Tumor Clinical Practice
Provisionally accepted
1 Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, China
2 The First Affiliated Hospital of Hebei North University, Zhangjiakou, China
Abstract

Background: Artificial intelligence (AI) models are increasingly applied in clinical oncology, yet their comparative utility in specialized domains such as bone and soft tissue tumors remains understudied. This study evaluates the diagnostic accuracy and clinical reasoning capabilities of DeepSeek and ChatGPT.

Methods: A two-phase evaluation framework was implemented. First, 249 validated clinical questions (191 single-choice, 58 multiple-choice) spanning five domains (diagnosis, imaging, pathology, staging, treatment) were administered, with expert-derived answers serving as ground truth. Second, nine blinded clinicians scored model-generated analyses of a complex sarcoma case across seven clinical dimensions. Statistical analysis employed chi-square tests for accuracy comparisons, Cohen's kappa for inter-rater reliability, and independent t-tests for expert ratings (α = 0.05).

Results: DeepSeek outperformed ChatGPT in overall accuracy (74.7% vs 55.4%, p < 0.001), excelling in single-choice questions (86.9% vs 64.9%, p < 0.001) and two key domains: Pathology & Genetics (72.5% vs 40.0%, p = 0.006) and Treatment (71.3% vs 51.2%, p = 0.015). Experts rated DeepSeek higher in imaging interpretation (7.11 vs 6.00, p = 0.002) and overall case analysis (54.11 vs 51.56, p = 0.022). Cross-model analysis revealed that DeepSeek uniquely answered 60 questions correctly where ChatGPT erred, while both models shared 51 errors.

Conclusions: DeepSeek outperforms ChatGPT in diagnostic accuracy and specialized clinical reasoning for bone and soft tissue tumors, particularly in the pathology and treatment domains. The significant performance gap (p < 0.001) and 24.1% uniquely correct responses position DeepSeek as the more reliable diagnostic aid, though the 51 shared errors underscore the need for hybrid AI-clinician workflows.
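The headline comparison in the Results (74.7% vs 55.4% overall accuracy, p < 0.001) can be sketched as a Pearson chi-square test on a 2x2 contingency table. This is a minimal illustration, not the authors' analysis code: the correct/incorrect counts below are reconstructed from the reported percentages of 249 questions (an assumption), and no continuity correction is applied.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    Returns (chi-square statistic, two-sided p-value for df = 1).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    # For df = 1, the chi-square survival function reduces to erfc(sqrt(x/2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Assumed counts: 186/249 correct ≈ 74.7% (DeepSeek), 138/249 ≈ 55.4% (ChatGPT)
deepseek = (186, 63)   # (correct, incorrect)
chatgpt = (138, 111)
chi2, p = chi2_2x2([deepseek, chatgpt])
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

With these reconstructed counts the statistic lands around chi2 ≈ 20 and p well below 0.001, consistent with the significance level reported in the abstract.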
Keywords: bone and soft tissue tumors, clinical practice, artificial intelligence, diagnostic accuracy, large language models
Received: 07 Jun 2025; Accepted: 16 Dec 2025.
Copyright: © 2025 Yang, Li, Zhang, Li, Zhao, Jia and Yu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Shengji Yu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
