ORIGINAL RESEARCH article

Front. Oncol.

Sec. Cancer Imaging and Image-directed Interventions

This article is part of the Research Topic: Advancing Gastrointestinal Disease Diagnosis with Interpretable AI and Edge Computing for Enhanced Patient Care.

Investigating Object Detection Errors in Endoscopic Imaging of Esophageal SCC and Dysplasia Through Precision-Recall Analysis

Provisionally accepted
  • 1Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi, Taiwan
  • 2Changhua Christian Hospital, Changhua City, Taiwan
  • 3National Chung Cheng University, Minxiong, Taiwan
  • 4Universitas Islam Indonesia, Yogyakarta, Indonesia
  • 5Hualien Tzu Chi Hospital Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan

The final, formatted version of the article will be published soon.

Esophageal squamous cell carcinoma (ESCC) is difficult to detect early on white-light imaging (WLI) endoscopy because lesions are subtle and artifacts (such as glare, bubbles, overlaid text, and instruments) mimic pathology. This study benchmarked five object detectors: two You Only Look Once models (YOLOv5, YOLOv8), Faster Region-based Convolutional Neural Network (Faster R-CNN), Single Shot MultiBox Detector (SSD), and Real-Time Detection Transformer (RT-DETR), on a WLI dataset using harmonized training (from scratch, 150 epochs, identical hyperparameters) and two label configurations: a 4-label set of major categories (SCC, Dysplasia, Bleeding, Inflammation) and an 11-label set that adds artifact classes. Evaluation used macro precision/recall/F1 at IoU 0.50 on a fixed 310-image test set. Incorporating artifact classes improved overall macro metrics, with YOLOv5 and YOLOv8 providing the strongest performance in the 11-label scenario; however, class-wise findings revealed persistent recall limitations for early disease. In the 11-label analysis, Dysplasia recall remained low (YOLOv5: 88/201, 43.8%; YOLOv8: 82/201, 40.8%), and SCC recall was only moderate (YOLOv5: 25/44, 56.8%; YOLOv8: 24/44, 54.5%). Confusion analyses showed that errors were dominated by non-detections ("background") rather than misclassification as benign or artifact labels, while approximately one in five lesion predictions was a spurious unmatched false positive, implicating both sensitivity and specificity constraints. These results indicate that labeling artifacts reduces non-lesion confusion but does not, by itself, recover subtle early lesions. Limitations include single-center, WLI-only data and training from scratch; future work should prioritize endoscopy-specific pretraining, explicit artifact suppression or joint segmentation, and external validation.
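The evaluation protocol (per-class matching at IoU 0.50, then macro-averaged precision/recall/F1) can be sketched as follows. This is a minimal illustration of the standard metric definitions, not the authors' actual evaluation code; the box format `(x1, y1, x2, y2)` and the helper names `iou` and `macro_prf1` are assumptions for this sketch.

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def macro_prf1(counts):
    """Macro precision/recall/F1 from per-class (tp, fp, fn) counts.

    counts: dict mapping class name -> (tp, fp, fn), where a prediction
    counts as a TP only if it matches a same-class ground-truth box
    with IoU >= 0.50; unmatched predictions are FPs, unmatched
    ground-truth boxes are FNs.
    """
    ps, rs, fs = [], [], []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if (tp + fp) else 0.0
        r = tp / (tp + fn) if (tp + fn) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    n = len(counts)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


# Example: the reported YOLOv5 recall figures follow directly from
# TP/ground-truth counts, e.g. SCC 25/44 and Dysplasia 88/201.
scc_recall = 25 / 44        # ~0.568
dysplasia_recall = 88 / 201 # ~0.438
```

Macro averaging weights every class equally, so rare but clinically critical classes such as Dysplasia pull the aggregate down even when frequent artifact classes score well, which is why the class-wise recall breakdown above is more informative than the overall macro figure.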

Keywords: esophageal cancer, hyperspectral imaging, esophageal squamous cell carcinoma, band selection, machine learning, deep learning

Received: 01 Oct 2025; Accepted: 12 Nov 2025.

Copyright: © 2025 Chang, Lee, Mukundan, Karmakar, Bauravindah, Chen, Huang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Chien-Wei Huang, forevershiningfy@yahoo.com.tw
Hsiang-Chen Wang, hcwang@ccu.edu.tw

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.