ORIGINAL RESEARCH article

Front. Med.

Sec. Precision Medicine

Volume 12 - 2025 | doi: 10.3389/fmed.2025.1618858

This article is part of the Research Topic: Bridging Surgical Oncology and Personalized Medicine: The Role of Artificial Intelligence and Machine Learning in Thoracic Surgery

Comparative Analysis of Accuracy and Completeness in Standardized Database Generation for Complex Multilingual Lung Cancer Pathological Reports: Large Language Model-Based Assisted Diagnosis System vs. DeepSeek, GPT-3.5, and Healthcare Professionals with Varied Professional Titles, with Task Load Variation Assessment Among Medical Staff

Provisionally accepted
Xinghua Cheng1,2*, Hao Hang1,3, Liankai Yang4, Zhongjie Wang1,2, Zhebing Lin1,2, Pengchong Li1,2, Jiayue Zhu5, Shuai Pu6, Rang Liu7
  • 1Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, Shanghai Municipality, China
  • 2Shanghai Jiao Tong University, Shanghai, Shanghai Municipality, China
  • 3Department of Clinical Medicine, Bengbu Medical College, Bengbu, Anhui Province, China
  • 4Cangzhou Central Hospital, Cangzhou, Hebei, China
  • 5Southeast University, Nanjing, Jiangsu Province, China
  • 6Liupanshui City People's Hospital, Guizhou, China
  • 7The Second People's Hospital of Hefei, Hefei, Anhui Province, China

The final, formatted version of the article will be published soon.

Background: This study evaluates how AI enhances EHR efficiency by comparing a lung cancer-specific LLM with general-purpose models (DeepSeek, GPT-3.5) and clinicians across expertise levels, assessing accuracy and completeness in complex lung cancer pathology documentation and task load changes pre-/post-AI implementation.

Methods: This study analyzed 300 lung cancer cases (Shanghai Chest Hospital) and 60 TCGA cases, split into training/validation/test sets. Ten clinicians (varying expertise) and three AI models (GPT-3.5, DeepSeek, lung cancer-specific LLM) generated pathology reports. Accuracy and completeness were evaluated against Leapfrog/Joint Commission/ACS standards (non-parametric tests); task load changes pre-/post-AI implementation were assessed via NASA-TLX (paired t-tests, P<0.05).

Results: This study analyzed 1,390 structured pathology databases: 1,300 from 100 Chinese cases (generated by 10 clinicians and three LLMs) and 90 from 30 TCGA English reports. The lung cancer-specific LLM outperformed nurses, residents, interns, and general AI models (DeepSeek, GPT-3.5) in lesion/lymph node analysis and pathology extraction for Chinese records (P<0.05), with total scores slightly below those of chief physicians. In English reports, it matched mainstream AI in lesion analysis (P>0.05) but excelled in lymph node/pathology metrics (P<0.05). Task load scores decreased by 38.3% post-implementation (413.90±78.09 vs. 255.30±65.50, t=26.481, P<0.001).

Conclusions: The fine-tuned lung cancer LLM outperformed non-chief physicians and general LLMs in accuracy and completeness and significantly reduced medical staff workload (P<0.001). It has potential for further optimization despite current limitations.
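
The database counts and the task load reduction reported in the abstract can be reproduced with simple arithmetic. The following is a minimal sketch, not the authors' analysis code; the variable names are illustrative, and it assumes the 90 English-language outputs come from the three LLMs processing each of the 30 TCGA reports, which the abstract implies but does not state outright.

```python
# Figures taken from the abstract; variable names are illustrative only.
n_chinese_cases = 100
n_clinicians = 10
n_llms = 3
n_tcga_reports = 30

# Each Chinese case is processed by every clinician and every LLM.
chinese_outputs = n_chinese_cases * (n_clinicians + n_llms)

# Assumption: each TCGA report is processed by the three LLMs only.
english_outputs = n_tcga_reports * n_llms

total_outputs = chinese_outputs + english_outputs

# Mean NASA-TLX task load scores before and after AI implementation.
pre_tlx = 413.90
post_tlx = 255.30
relative_drop_pct = (pre_tlx - post_tlx) / pre_tlx * 100

print(chinese_outputs, english_outputs, total_outputs)
print(round(relative_drop_pct, 1))
```

Under these assumptions the totals agree with the 1,300, 90, and 1,390 databases and the 38.3% decrease stated in the Results.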

Keywords: Electronic Health Records (EHRs), clinician burnout, large language models (LLMs), lung cancer, DeepSeek, GPT-3.5

Received: 27 Apr 2025; Accepted: 31 Jul 2025.

Copyright: © 2025 Cheng, Hang, Yang, Wang, Lin, Li, Zhu, Pu and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Xinghua Cheng, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, 200030, Shanghai Municipality, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.