
ORIGINAL RESEARCH article

Front. Oncol.

Sec. Gastrointestinal Cancers: Gastric and Esophageal Cancers

A Multi-Task and Explainable Swin Transformer Framework for Cross-Scale Computational Pathology in Gastrointestinal Cancer

Provisionally accepted
Qing-Chun Feng, Ting Yang, Hai-Long Guo, Xiao-Yun Wang*
  • Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Sciences, Shanghai, China

The final, formatted version of the article will be published soon.

Background: Accurate identification and segmentation of tumor and microenvironment features in gastrointestinal cancer (GC) pathology images are crucial for diagnosis, yet challenging for traditional methods. This study aims to develop and validate a deep learning (DL) framework integrating multi-task learning and interpretability mechanisms for cross-scale automatic classification and segmentation of tumor and microenvironmental structures in gastrointestinal cancer histopathological patches, and to evaluate its robustness, output consistency, and decision transparency under controlled benchmark settings.
Methods: We constructed a multi-task learning (MTL) model integrating Swin Transformer, DeepLabV3+, and R2U-Net for joint classification and segmentation. The model was trained and validated on approximately 99,000 H&E-stained images from the GasHisSDB and GCHTID datasets. Preprocessing included color normalization and quality control. Performance was evaluated via five-fold cross-validation. Explainability was assessed using Grad-CAM, Score-CAM, and Layer-wise Relevance Propagation (LRP), with validation from pathology experts.
Results: The multi-task model achieved a classification F1-score of 0.938 ± 0.007 and a segmentation Dice coefficient of 0.839 ± 0.009 on the test set. Compared with ResNet-50, Swin-T achieved higher classification performance, with an improved F1-score (0.945 vs. 0.917) and AUC (0.965 vs. 0.907). For small-volume tissues, the LYM Dice reached 0.781. In cross-domain transfer from GCHTID to GasHisSDB, the model achieved an F1-score of 0.902. Under staining perturbations, the Dice decreased by only 2.4%, and the Grad-CAM correlation reached r = 0.86. The expert-model agreement rate (EMAR) was 0.864, with a Cohen's κ of 0.79.
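For readers unfamiliar with the two headline metrics above, a minimal illustrative sketch (not the authors' code) of how the classification F1-score and the segmentation Dice coefficient are computed from binary labels and masks:

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall over class-1 predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def dice_coefficient(mask_true, mask_pred):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary segmentation masks."""
    inter = sum(a and b for a, b in zip(mask_true, mask_pred))
    total = sum(mask_true) + sum(mask_pred)
    return 2 * inter / total if total else 1.0
```

For binary masks the Dice coefficient is algebraically identical to the F1-score; the two names reflect the classification versus segmentation contexts in which they are reported.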
Conclusion: The proposed cross-scale multi-task Transformer framework achieves high-precision recognition and multi-component segmentation of gastrointestinal cancer histopathological images, demonstrating stability and interpretability across scale variations, cross-dataset evaluations, and staining perturbation tests. Overall, this study emphasizes the establishment and validation of a methodological paradigm integrating multi-task joint learning, cross-scale generalization assessment, and interpretable evidence review. As the current validation was conducted under controlled benchmark conditions using strongly annotated patch-level data, the framework should be regarded as a clinically relevant, preclinical validation system. Its feasibility for routine clinical implementation requires further verification through large-scale whole-slide image (WSI) cohorts, prospective multicenter studies, and workflow integration assessments.
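The expert-model agreement reported above is summarized by Cohen's κ, which corrects raw agreement for chance. A small self-contained sketch of the statistic (the rating data used here are hypothetical, not the study's expert annotations):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters beyond chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the expected chance agreement from each rater's label frequencies.
    """
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0
```

By common convention, κ values in the 0.61–0.80 range (such as the 0.79 reported here) are read as substantial agreement.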

Keywords: Cross-Scale Recognition, deep learning, digital pathology, Explainable artificial intelligence, gastrointestinal cancer, Model Generalization, Multi-task learning, transformer network

Received: 19 Nov 2025; Accepted: 09 Feb 2026.

Copyright: © 2026 Feng, Yang, Guo and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Xiao-Yun Wang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.