
ORIGINAL RESEARCH article

Front. Oncol.

Sec. Breast Cancer

Ensemble Transformer-Based Multiple Instance Learning for Predicting Neoadjuvant Chemotherapy Response from Breast Cancer Biopsy Whole-Slide Images

Provisionally accepted
Zhenshui Wu1, Kaining Ye2, Jianming Weng2, Zhongping Zhang2, Liao Xuehong3*, Kaixin Du4
  • 1Fujian Medical University Affiliated First Quanzhou Hospital, Quanzhou, China
  • 2Zhangzhou Affiliated Hospital of Fujian Medical University, Zhangzhou, China
  • 3Sapporo Medical University Graduate School of Medicine, Sapporo, Japan
  • 4Fujian Medical University Xiamen Hong'ai Hospital, Xiamen, China

The final, formatted version of the article will be published soon.

Neoadjuvant chemotherapy (NAC) is a cornerstone of breast cancer management, and accurate prediction of its efficacy is essential for optimizing treatment strategies and improving patient outcomes. This study proposes an integrated Transformer-based Multiple Instance Learning (MIL) framework that uses pre-treatment biopsy whole-slide images (WSIs) to predict NAC response. A multi-institutional dataset of 128 patients was collected, with 86 cases used for training and 42 for internal validation; an additional 22 microscope images were used for external validation. The framework combines ResNet50 feature extraction, a multi-scale attention Transformer encoder, and a two-stage classification strategy to capture both local morphological and global contextual features. Class imbalance was mitigated with SMOTE and ADASYN oversampling, while a domain-adversarial neural network (DANN) and metric learning improved cross-modal robustness. The proposed model achieved a WSI-level accuracy of 79.3% in internal validation, with an AUC of 0.82 for identifying pathological complete response (pCR) and 0.77 for identifying non-response. External validation on lower-resolution microscope images yielded an AUC of 0.70 for pCR and 0.67 for non-response, outperforming conventional CNN architectures such as GoogLeNet, ResNet34, and SqueezeNet. Heatmap visualizations showed well-defined lesion boundaries and interpretable regions of interest, supporting the model's clinical transparency. Because it relies solely on hematoxylin and eosin (H&E)-stained WSIs, the framework offers a fully automated, interpretable, and resource-efficient approach suited to real-world deployment. The two-stage classification design provides fine-grained stratification among pCR, partial responders, and poor responders, which is critical for personalized therapy planning.
Future work will focus on expanding multi-center datasets and integrating advanced pathology foundation models to further enhance cross-domain generalization and clinical applicability.
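The abstract does not describe how patch features are aggregated into a slide-level prediction. The following is therefore only a minimal sketch of attention-based MIL pooling (as introduced by Ilse et al., 2018), shown as a simpler stand-in for the paper's multi-scale attention Transformer encoder. It assumes each WSI has already been tiled into patches and that each patch has a feature vector (from ResNet50 in the paper; toy 2-D vectors here). The projection `V` and scoring vector `w` are hypothetical learned parameters. The function returns a slide-level embedding and per-patch attention weights. Weights of this kind are one common source for patch-level heatmaps, although the abstract does not say how its heatmaps were computed.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(
    patches: Sequence[Vector],  # one feature vector per patch (the "bag")
    V: Sequence[Vector],        # hypothetical learned projection, hidden_dim x feat_dim
    w: Sequence[float],         # hypothetical learned scoring vector, length hidden_dim
) -> Tuple[Vector, List[float]]:
    """Attention MIL pooling: a_k = softmax_k(w . tanh(V h_k)), z = sum_k a_k h_k."""
    if not patches:
        raise ValueError("a bag must contain at least one patch")
    scores = []
    for h in patches:
        # Hidden representation tanh(V h) for this patch.
        hidden = [math.tanh(sum(v_ij * h_j for v_ij, h_j in zip(row, h))) for row in V]
        # Scalar attention score w . tanh(V h).
        scores.append(sum(w_i * u_i for w_i, u_i in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(patches[0])
    # Slide-level embedding: attention-weighted average of patch features.
    slide_embedding = [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]
    return slide_embedding, weights
```

For example, with patches `[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, `V` set to the 2x2 identity and `w = [2.0, -1.0]`, the first patch receives the largest weight and the second the smallest, and the weights sum to 1.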
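The abstract mentions a two-stage classification strategy that separates pCR, partial, and poor responders, but it does not say how the stages are chained. One plausible reading, assumed here and not confirmed by the source, is a cascade: stage 1 scores pCR versus non-pCR, and stage 2 scores partial versus poor response among non-pCR cases. The sketch below combines two such stage probabilities into three class probabilities and a label. The function name, result type, and 0.5 threshold are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TwoStageResult:
    label: str
    p_pcr: float
    p_partial: float
    p_poor: float


def two_stage_predict(
    p_pcr: float,                    # hypothetical stage-1 output: P(pCR)
    p_partial_given_non_pcr: float,  # hypothetical stage-2 output: P(partial | non-pCR)
    threshold: float = 0.5,          # illustrative decision threshold
) -> TwoStageResult:
    """Cascade two binary classifiers into a three-way response prediction."""
    if not (0.0 <= p_pcr <= 1.0 and 0.0 <= p_partial_given_non_pcr <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # Chain rule: P(partial) = P(non-pCR) * P(partial | non-pCR), likewise for poor.
    p_non = 1.0 - p_pcr
    p_partial = p_non * p_partial_given_non_pcr
    p_poor = p_non * (1.0 - p_partial_given_non_pcr)
    # Stage 1 decides pCR; only otherwise does stage 2 decide partial vs poor.
    if p_pcr >= threshold:
        label = "pCR"
    elif p_partial_given_non_pcr >= threshold:
        label = "partial"
    else:
        label = "poor"
    return TwoStageResult(label, p_pcr, p_partial, p_poor)
```

Under this design the three probabilities always sum to 1, and stage 2 only needs to be trained on non-pCR cases, which is one reason a cascade can help when the classes are imbalanced.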
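SMOTE is one of the two oversampling methods named in the abstract. It creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours: x_new = x_i + λ(x_nn - x_i), with λ drawn uniformly from [0, 1]. The pure-Python sketch below illustrates that rule on generic feature vectors. It assumes oversampling is applied to feature vectors, since the abstract does not say at which level it was applied. It does not cover ADASYN, which additionally generates more samples near minority points surrounded by majority-class neighbours. In practice one would more likely use the `SMOTE` and `ADASYN` classes from the imbalanced-learn package, and the source does not state which implementation was used.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours (Euclidean) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # random neighbour among its k nearest
        lam = rng.random()                # interpolation factor in [0, 1)
        x, y = minority[i], minority[j]
        synthetic.append([xi + lam * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic
```

Because every synthetic point lies on a segment between two minority samples, collinear inputs such as `[[0, 0], [1, 0], [2, 0], [3, 0]]` only ever produce points on the same line.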

Keywords: breast cancer, Multiple Instance Learning, Neoadjuvant chemotherapy, Pathomics, pCR, transformer

Received: 20 Oct 2025; Accepted: 02 Feb 2026.

Copyright: © 2026 Wu, Ye, Weng, Zhang, Xuehong and Du. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Liao Xuehong

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.