
ORIGINAL RESEARCH article

Front. Oncol.

Sec. Head and Neck Cancer

This article is part of the Research Topic: Based Models and Machine Learning on CT, MRI and PET-CT in Head and Neck Cancer Diagnosis, Staging and Outcome Prediction

Assessing the Robustness and Clinical Evaluation of a Deep‑Learning Segmentation Model for Head and Neck Cancer

Provisionally accepted
Daniel Schanne1, Léandre Cuenot2, Sarah Brüningk1, Mauricio Reyes1,2, Olgun Elicin1*
  • 1 Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
  • 2 Universität Bern, ARTORG Center for Biomedical Engineering Research, Bern, Switzerland

The final, formatted version of the article will be published soon.

Background and purpose: Deep learning (DL)-based autosegmentation has improved the delineation of organs at risk in radiotherapy for head and neck cancer (HNC). However, automated segmentation of the primary and nodal gross tumor volumes (GTVp, GTVn) remains challenging, and its robustness under real-world imaging conditions is insufficiently characterized. This study evaluates the robustness and clinical usability of a DL-based PET/CT segmentation model for HNC under clinically relevant perturbations.

Materials and methods: A 3D Dynamic U-Net was trained on the public HECKTOR 2022 dataset (474 training and 50 test cases). Synthetic perturbations (noise, blur, ghosting, bias field, spike noise, and motion) were applied to PET and CT images at varying severity levels, generating 36 variants per patient. Segmentation quality was measured using the Dice score, Hausdorff distance, and accuracy. Two clinicians assessed clinical usability for 50 baseline and 18 perturbed cases using a five-point Likert scale. Radiomic features were correlated with robustness metrics.

Results: Baseline Dice scores were 0.766 (GTVp) and 0.698 (GTVn). Performance dropped significantly under spike noise and bias-field artifacts, especially for GTVn. Clinical usability remained high for GTVp (77.8%) but declined to 27.9% for GTVn under severe perturbations. Lesion volume and surface complexity correlated positively with robustness degradation, whereas high PET contrast had a protective effect against certain perturbations.

Conclusion: DL-based PET/CT segmentation models for HNC show strong baseline performance and robustness for primary tumors. However, nodal tumor segmentation remains vulnerable to specific image artifacts. Enhancing robustness through targeted data augmentation and validation under variable imaging conditions is essential for clinical integration.
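The two overlap and boundary metrics named in the abstract, the Dice score and the Hausdorff distance, can be illustrated with a minimal, dependency-free Python sketch. It assumes binary masks represented as sets of (z, y, x) voxel indices and an optional voxel spacing in millimetres; the function names `dice_score` and `hausdorff_distance` are hypothetical and are not taken from the study's code. The Hausdorff distance here is the symmetric maximum computed over all foreground voxels, whereas many toolkits compute it over surface voxels or report the 95th percentile (HD95); the exact variant used in the study is not stated in this abstract.

```python
import math
from typing import Sequence, Set, Tuple

# A voxel is addressed by integer (z, y, x) indices; a mask is the set of foreground voxels.
Voxel = Tuple[int, int, int]


def dice_score(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|).

    Returning 1.0 when both masks are empty is a convention assumed here.
    """
    total = len(pred) + len(ref)
    if total == 0:
        return 1.0
    return 2.0 * len(pred & ref) / total


def _physical_distance(p: Voxel, q: Voxel, spacing: Sequence[float]) -> float:
    # Euclidean distance after scaling each index difference by the voxel spacing (e.g. mm).
    return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, spacing)))


def _directed_hausdorff(a: Set[Voxel], b: Set[Voxel], spacing: Sequence[float]) -> float:
    # Largest distance from any voxel in `a` to its nearest voxel in `b` (brute force).
    return max(min(_physical_distance(p, q, spacing) for q in b) for p in a)


def hausdorff_distance(
    pred: Set[Voxel], ref: Set[Voxel], spacing: Sequence[float] = (1.0, 1.0, 1.0)
) -> float:
    """Symmetric Hausdorff distance over all foreground voxels (not surfaces, not HD95)."""
    if not pred or not ref:
        return math.inf  # undefined when a mask is empty; inf is an assumed convention
    return max(
        _directed_hausdorff(pred, ref, spacing),
        _directed_hausdorff(ref, pred, spacing),
    )


# Usage: two 3x3x3 cubes shifted by one voxel along the first axis.
pred = {(z, y, x) for z in range(0, 3) for y in range(3) for x in range(3)}
ref = {(z, y, x) for z in range(1, 4) for y in range(3) for x in range(3)}
print(round(dice_score(pred, ref), 3))  # 0.667
print(hausdorff_distance(pred, ref))  # 1.0
print(hausdorff_distance(pred, ref, spacing=(2.0, 1.0, 1.0)))  # 2.0
```

For the two offset cubes, 18 of the 27 voxels overlap, so the Dice score is 2·18/(27+27) ≈ 0.667, and the Hausdorff distance equals one voxel spacing along the shifted axis. The brute-force nearest-neighbour search is quadratic in the number of voxels, so a distance-transform approach would be preferable for full-size PET/CT volumes.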

Keywords: Autosegmentation, deep learning, head and neck cancer, PET/CT, robustness

Received: 23 Oct 2025; Accepted: 28 Jan 2026.

Copyright: © 2026 Schanne, Cuenot, Brüningk, Reyes and Elicin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Olgun Elicin

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.