- 1 ENT Institute and Department of Otolaryngology, Eye & ENT Hospital of Fudan University, Shanghai, China
- 2 School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai, China
- 3 College of Biomedical Engineering, Fudan University, Shanghai, China
Introduction: The combination of CNNs and Transformers has recently attracted much attention for medical image segmentation due to its superior performance. However, segmentation performance is limited by the local receptive field and static weights of CNN convolution operations, as well as by insufficient information exchange between local regions in the Transformer.
Methods: To address these issues, a network integrating an attention mechanism and pyramid pooling is proposed in this paper. First, an efficient channel attention mechanism is embedded into the CNN to extract more comprehensive image features. Then, a CBAM-ASPP module is introduced into the bottleneck layer to obtain multi-scale context information. Finally, to address the limitations of traditional convolution, depthwise separable convolution is used to achieve a lightweight network.
Results: Experiments on the Synapse multi-organ segmentation dataset and the ACDC dataset showed that the proposed IAP-TransUNet achieved Dice similarity coefficients (DSCs) of 78.85% and 90.46%, respectively. Compared with the state-of-the-art method, on the Synapse multi-organ segmentation dataset the Hausdorff distance was reduced by 2.92. On the ACDC dataset, the segmentation accuracy of the left ventricle, myocardium, and right ventricle was improved by 0.14%, 1.89%, and 0.23%, respectively.
Discussion: The experimental results demonstrate that the proposed network improves segmentation effectiveness and performs strongly on both CT and MRI data, suggesting its potential for generalization across different medical imaging modalities.
1 Introduction
Medical image segmentation aims to locate and segment lesion areas or organ tissues in medical images to help doctors make accurate and efficient diagnoses (Kepp et al., 2025). It is a crucial step in medical image analysis. Compared with natural images, medical images have more blurred edges, lower contrast, and more complex shapes, which presents particular difficulties and challenges for segmentation tasks (Gülmez, 2025).
With the widespread application of deep learning in medical image processing, automatic feature learning algorithms based on deep learning have become an appropriate method for medical image segmentation, resulting in the rapid development of the field. For example, deep learning models such as convolutional neural networks (CNNs) are characterized by high speed, high accuracy, and automation (Sarvamangala and Kulkarni, 2022; Anderson et al., 2025; Yu et al., 2021). Their success is evident not only in image analysis tasks but also in broader applications such as predicting protein function and toxicity (Le, 2019; Zhao et al., 2022). Based on traditional CNNs and the encoding–decoding structure, the fully convolutional network (FCN) (Long et al., 2015) was proposed to capture position information by replacing the fully connected layers with convolution layers. In the FCN, transposed convolution is used for up-sampling to obtain segmented images with rich semantics. Subsequently, the U-Net network was proposed based on the FCN (Ronneberger et al., 2015), introducing a U-shaped network structure that was used for the first time for medical image segmentation. In the U-Net network, the encoding process extracts semantic information, the decoding process restores spatial dimensions, and skip connections are introduced to connect feature information at the same level. With subsequent improvements to U-Net, a series of variants emerged. Among them, the Res-UNet network (Jiang et al., 2024) simplifies training by introducing multiple residual blocks in the encoder and the decoder, effectively alleviating the problems of gradient vanishing and semantic loss. A dense connection structure is introduced in the DenseUNet network (Guan et al., 2019) to avoid overfitting, enhance the transmission and reuse of features, and significantly reduce the number of network parameters. To improve the ability to extract and fuse features, the U-Net++ network (Zhou et al., 2019) reduces semantic differences between encoder and decoder feature maps by redesigning the skip connections. The UNet 3+ network (Huang et al., 2020) fuses feature maps of different scales through full-scale skip connections and achieves full-scale deep supervision by calculating the loss between the fused feature maps and manually annotated data. It can therefore be seen that a U-shaped network combined with residual multi-scale feature fusion is beneficial for medical image segmentation (Pu et al., 2024).
With the application of CNNs, their inherent limitations have become apparent, including the limited receptive field of convolution operations, which can only perceive local feature information and cannot effectively capture global dependencies and interactions. With the success of the transformer (Vaswani et al., 2017) and its introduction into computer vision, the vision transformer (ViT) was proposed (Dosovitskiy et al., 2020). The model's outstanding performance on image classification tasks demonstrated the great potential of the transformer in the field of computer vision. Subsequently, the SETR model (Zheng et al., 2021), built on a transformer encoder, was proposed for semantic segmentation; it integrates semantic information into the transformer architecture but still exhibits certain limitations in extracting local features. Therefore, combining CNNs with transformers can fully utilize their respective advantages. A representative work, TransUNet (Chen et al., 2021), integrates CNNs and transformers, combining the strengths of both. To extract global contextual information, tokenized image patches from CNN feature maps are encoded into an input sequence. The encoded features then undergo up-sampling and are fused with high-resolution CNN feature maps. However, because convolution is used for both feature extraction and up-sampling, the receptive field in the encoder and decoder remains limited. Therefore, there is still room for improvement when applying TransUNet to medical images of different modalities.
In this study, an improved medical image segmentation network, IAP-TransUNet, is proposed, which integrates an attention mechanism and pyramid pooling.
The main contributions can be summarized as follows:
(1) To improve feature extraction while keeping additional model complexity low, an efficient channel attention (ECA) mechanism is embedded in the CNN.
(2) A CBAM-ASPP module is introduced into the bottleneck layer to obtain multi-scale context information before up-sampling, enabling more accurate predictions.
(3) Depthwise separable convolution, instead of conventional convolution, is used to achieve higher computing efficiency, resulting in a lightweight network.
2 Related work
2.1 Transformer
The transformer, built on the self-attention mechanism, was originally applied to machine translation tasks (Wolf et al., 2020). With subsequent improvements, it has also been applied to computer vision tasks (Han et al., 2022), achieving favorable results. The input is transformed into three distinct matrices, the query matrix Q, the key matrix K, and the value matrix V, and the self-attention mechanism is calculated as given in Equation 1:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (1)$$

where $QK^{T}$ is the attention score, $d_k$ is the dimension of the query matrix Q and the key matrix K, and $\sqrt{d_k}$ is the scale factor.
The multi-head self-attention (MSA) mechanism combines multiple self-attention heads; each head is calculated first, and the results are then concatenated to obtain the final output, as shown in Equations 2, 3:

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \quad (2)$$

$$\mathrm{MSA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \quad (3)$$

where $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ represent the linear transformation matrices of Q, K, and V for the i-th self-attention head, h is the number of heads, and $W^{O}$ represents the output weight matrix of the multi-head attention.
The multilayer perceptron (MLP), shown in Equation 4, consists of two fully connected layers with a nonlinear activation between them:

$$\mathrm{MLP}(X) = \sigma(XW_1 + b_1)\,W_2 + b_2 \quad (4)$$

where X is the input vector, σ is the activation function (GELU in ViT), and $W_1$, $b_1$ and $W_2$, $b_2$ represent the weight matrices and bias vectors of the two fully connected layers, respectively.
Position encoding is used to express the position information of elements in sequence data; with position encoding, the whole sequence can be processed in parallel, which significantly improves computing efficiency. The encoding method is given in Equations 5, 6:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right) \quad (5)$$

$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right) \quad (6)$$

where pos represents the position of an element in the sequence, d is the dimension of the position encoding, 2i indexes the even dimensions, and 2i + 1 indexes the odd dimensions.
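To make the mechanism above concrete, a minimal PyTorch sketch of multi-head self-attention (Equations 1–3) is given below. It is illustrative only, not the implementation used in TransUNet or IAP-TransUNet, and the dimensions are placeholder values.

```python
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention (Equations 1-3); illustrative sketch."""

    def __init__(self, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Linear projections producing Q, K, and V, plus the output matrix W_O.
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence length, embed_dim)
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                    # each: (b, heads, n, head_dim)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5   # QK^T / sqrt(d_k)
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)        # concatenate the heads
        return self.proj(out)                                    # multiply by W_O


# Usage example with dummy token embeddings.
tokens = torch.randn(2, 196, 512)
print(MultiHeadSelfAttention()(tokens).shape)                    # torch.Size([2, 196, 512])
```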
2.2 TransUNet
In TransUNet, the transformer layer is integrated into U-Net, enabling the network to combine the advantages of both the transformer and U-Net, resulting in remarkable effectiveness in medical image segmentation. It not only overcomes the limitations of the CNN in handling remote dependencies but also compensates for the lack of detailed positioning of the transformer. TransUNet mainly consists of three parts: a hybrid encoder, a cascaded up-sampler, and a segmentation head.
The hybrid encoder is composed of a CNN and a transformer, and its role is to map the pixel space of the original image into a multi-level feature space. First, the original image is input into the CNN to extract high-level features and retain some intermediate- and low-level features for fusion with the up-sampled features. Then, the global relationships among the image pixels are obtained by feeding the high-level features into the transformer.
Multiple cascaded up-sampling blocks form the cascaded up-sampler, which is used for decoding the high-level features. To prevent the loss of detailed information during image restoration and to ensure accuracy, the decoder up-samples the encoded high-level features and then concatenates them with the low-level features stored in the encoder to achieve accurate positioning.
The segmentation head is the part that produces the segmentation result and predicts the segmentation mask. A 3 × 3 convolution is used to obtain the segmented image; then, the cross-entropy loss and the Dice loss measure the pixel-wise classification error and the region overlap, respectively, and are weighted and averaged to obtain the final loss used to train the segmentation.
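As an illustration of this step (not the authors' code), a segmentation head can be sketched as a single 3 × 3 convolution mapping decoder features to per-class logits; the channel counts below are placeholders (e.g., nine classes could correspond to eight organs plus background).

```python
import torch
import torch.nn as nn

# Minimal segmentation head: a 3x3 convolution mapping decoder features to
# per-class logit maps (illustrative channel counts only).
segmentation_head = nn.Conv2d(in_channels=16, out_channels=9, kernel_size=3, padding=1)

decoder_features = torch.randn(1, 16, 256, 256)   # upsampled decoder output
logits = segmentation_head(decoder_features)      # (1, 9, 256, 256) per-class logits
print(logits.shape)
```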
Beyond TransUNet, several other hybrid architectures have advanced the field of medical image segmentation. For instance, Swin-UNet (Cao et al., 2022) leverages the hierarchical structure and shifted window mechanism of the Swin Transformer within a U-shaped encoder–decoder framework. To enhance performance in complex scenarios, BiSeg-SAM (Su et al., 2024) integrates powerful zero-shot segmentation capability into a domain-specific framework. Similarly, works such as the SDPT (Cao et al., 2024) introduce novel transformer-based architectures to better capture both local and global features, thereby advancing segmentation accuracy. These methods highlight a trend toward hierarchical features and advanced attention, motivating the improvements made in IAP-TransUNet.
3 Methods
3.1 Architecture of the IAP-TransUNet network
TransUNet is a hybrid model that combines a CNN and a transformer to better handle global and local information. To fuse the high-resolution feature maps in the encoder with the feature maps in the decoder and thus obtain sufficient information, feature maps are first extracted by a CNN and then tokenized and fed into a transformer. Finally, the encoded feature map is up-sampled and fused with the high-resolution feature maps of the encoder through skip connections. Based on TransUNet, this study proposes a new encoder–decoder model, IAP-TransUNet, by improving its network structure and integrating the ECA mechanism, the CBAM-ASPP module, and depthwise separable convolution, as shown in Figure 1. In the proposed IAP-TransUNet model, an input image of 256 × 256 × 3 is fed to the input layer of the network and, after a series of processing steps in the network layers, the final pixel-level prediction is obtained in the last layer. The transformer encoder consists of 12 layers with an embedding dimension of 512; within each layer, the multi-head self-attention mechanism employs 12 attention heads, and the MLP has a hidden dimension of 2,048.
Figure 1. Overall framework of IAP-TransUNet. (A) Schematic diagram of the transformer layer; (B) The architecture of the proposed IAP-TransUNet.
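For reference, the encoder hyperparameters stated above can be collected into a small configuration object. This sketch only restates the values from the text; the field names themselves are ours, not part of the released code.

```python
from dataclasses import dataclass


@dataclass
class IAPTransUNetConfig:
    """Transformer-encoder settings as stated in the text (field names are illustrative)."""
    img_size: int = 256      # input images of 256 x 256 x 3
    patch_size: int = 16     # patch size used for tokenization
    num_layers: int = 12     # transformer encoder layers
    embed_dim: int = 512     # embedding dimension
    num_heads: int = 12      # attention heads per layer
    mlp_dim: int = 2048      # hidden dimension of the MLP


print(IAPTransUNetConfig())
```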
3.2 Efficient channel attention (ECA) mechanism
Introducing attention mechanisms into neural networks helps the model focus on the key information in the input and devote computing power to the important regions, improving the efficiency and accuracy of the model. The squeeze-and-excitation network SENet (Hu et al., 2018) is a classic implementation of the channel attention mechanism; however, its fully connected layers reduce the channel dimension and model dependencies among all channels, which can limit efficiency. Convolution, by contrast, has a strong ability to capture cross-channel information. Wang et al. (2020) therefore proposed the efficient channel attention network ECANet, which replaces the fully connected layers of the original SE module with a one-dimensional convolution with a kernel size of k, avoiding channel dimension reduction and reducing the computation and complexity of the model while achieving higher accuracy. The network structure is shown in Figure 2.
First, the feature map χ from the previous layer is fed into the ECA module, χ ∈ R^{W×H×C}, where W, H, and C represent the width, height, and number of channels, respectively. Then, global average pooling (GAP) is applied to χ to obtain the vector g(χ) of size 1 × 1 × C, where GAP is computed as in Equation 7:

$$g(\chi) = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\chi_{ij} \quad (7)$$
Given the vector g(χ), local cross-channel interaction is performed according to Equation 8 to obtain the weight of each channel; in the ECA module, this interaction is implemented efficiently by a fast one-dimensional convolution with a kernel size of k, as shown in Equation 9:

$$\omega_i = \sigma\left(\sum_{j=1}^{k} w^{j} y_i^{j}\right), \quad y_i^{j} \in \Omega_i^{k} \quad (8)$$

$$\omega = \sigma\left(\mathrm{C1D}_k(y)\right) \quad (9)$$

where $\omega_i$ is the weight of the i-th channel, σ is the sigmoid activation function, $y_i$ represents the feature of the i-th channel, $y_i^{j}$ represents the feature of the j-th adjacent channel of the i-th channel, and $\Omega_i^{k}$ is the set of its k adjacent channels. C1D represents one-dimensional convolution, and the size of the convolution kernel k is adaptively determined by Equation 10:

$$k = \psi(C) = \left|\frac{\log_2(C)}{\gamma} + \frac{b}{\gamma}\right|_{odd} \quad (10)$$

where γ = 2, b = 1, and $|t|_{odd}$ represents the odd number closest to t. In Figure 2, the value of k is 5.
Finally, the channel weights obtained from Equation 9 are applied to the input feature map to produce the output feature map.
In IAP-TransUNet, ECA is integrated into the CNN part of the CNN–transformer hybrid encoder of the model. In addition, it is added after each convolution layer in the decoder path.
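A minimal PyTorch sketch of the ECA block described above (GAP followed by an adaptively sized one-dimensional convolution, Equations 7–10) is given below. It follows the published ECA-Net design and is illustrative rather than the authors' exact implementation.

```python
import math
import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient channel attention (Equations 7-10); illustrative sketch."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive kernel size: the odd number closest to log2(C)/gamma + b/gamma (Equation 10).
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)                                 # GAP, Equation 7
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)  # C1D_k
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        y = self.avg_pool(x).view(n, 1, c)                 # vector g(x): (N, 1, C)
        y = self.sigmoid(self.conv(y)).view(n, c, 1, 1)    # channel weights, Equation 9
        return x * y                                       # re-weight the input feature map


feature_map = torch.randn(2, 256, 32, 32)
print(ECA(256)(feature_map).shape)                         # torch.Size([2, 256, 32, 32]); k = 5 here
```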
3.3 CBAM-ASPP module
Based on the idea of spatial pyramid pooling (SPP) (He et al., 2015), the ASPP module (Chen et al., 2017) combines SPP with dilated convolution. It employs multiple dilated convolutions with different sampling rates, which capture local information of different scales so as to obtain feature maps with different receptive fields. The convolutional block attention module (CBAM) (Woo et al., 2018) combines a channel attention mechanism with a spatial attention mechanism and consists of the channel attention module (CAM) and the spatial attention module (SAM). It sequentially generates channel and spatial attention maps, and the final feature map is obtained by multiplying them with the input feature map for adaptive feature refinement. To further enhance the feature extraction capability of the ASPP module, Zhu et al. (2022) integrated the CBAM module into the ASPP module and proposed the CBAM-ASPP module, as shown in Figure 3. The two modules complement each other to enhance the extraction of contextual information at different scales.
First, in the ASPP module, 3 × 3 convolutions with four different dilation rates of 1, 2, 4, and 8 are used to extract features from the input data, and their outputs are then merged.
Next, in the CBAM module, the output of the ASPP module is sequentially passed through the channel attention module and the spatial attention module.
In the channel attention module, average pooling and max pooling are applied to obtain the average-pooled feature $F_{avg}$ and the max-pooled feature $F_{max}$, each of which is then fed into a shared MLP; the sigmoid function σ is applied to their sum to obtain the channel attention coefficient $M_C$, as shown in Equation 11:

$$M_C = \sigma\big(\mathrm{MLP}(F_{avg}) + \mathrm{MLP}(F_{max})\big) \quad (11)$$
In the spatial attention module, average pooling and max pooling are applied along the channel dimension to the output of the channel attention module to obtain the average-pooled and max-pooled features. The two features are concatenated along the channel dimension, and a convolution $f^{7\times 7}$ with a kernel size of 7 × 7 followed by the sigmoid function σ is applied to obtain the spatial attention coefficient $M_S$, as shown in Equation 12:

$$M_S = \sigma\big(f^{7\times 7}([F'_{avg};\, F'_{max}])\big) \quad (12)$$
In our network, the CBAM-ASPP module is introduced into the bottleneck layer of IAP-TransUNet, aiming to obtain multi-scale contextual information before up-sampling.
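To make the module concrete, the following sketch combines four dilated 3 × 3 branches with a CBAM block. It is an illustrative reading of the description above (the 1 × 1 fusion convolution and the channel counts are assumptions), not the authors' implementation.

```python
import torch
import torch.nn as nn


class CBAM(nn.Module):
    """Channel attention followed by spatial attention (Equations 11, 12); illustrative."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP for the channel attention module (Equation 11).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # 7x7 convolution for the spatial attention module (Equation 12).
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: sigmoid(f7x7([AvgPool(F); MaxPool(F)])) along the channel axis.
        pooled = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(pooled))


class CBAM_ASPP(nn.Module):
    """ASPP with dilation rates [1, 2, 4, 8] followed by CBAM; illustrative sketch."""

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False) for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1, bias=False)  # merge branch outputs
        self.cbam = CBAM(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        merged = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.cbam(self.fuse(merged))


bottleneck = torch.randn(1, 512, 16, 16)
print(CBAM_ASPP(512, 512)(bottleneck).shape)   # torch.Size([1, 512, 16, 16])
```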
3.4 Depthwise separable convolution
For standard convolution, both spatial features and channel features are learned during the convolution process, as shown in Figure 4. Depthwise separable convolution (DSC) (Chollet, 2017) decouples the spatial correlation and the channel correlation of the convolution layer into a depthwise convolution followed by a pointwise convolution, as shown in Figure 5. Compared to standard convolution, DSC handles the spatial correlation and the channel correlation separately, greatly reducing the number of parameters and the computation cost.
Figure 5. Depthwise separable convolution process. (A) Depthwise convolution; (B) Pointwise convolution.
Assuming that the input and output feature maps have the same spatial size and that the convolution kernel is $D_k \times D_k$, the computation cost of standard convolution is shown in Equation 13:

$$D_k \cdot D_k \cdot M \cdot N \cdot D_f \cdot D_f \quad (13)$$

where $D_k$ is the size of the convolution kernel, M is the number of channels of the input feature map, N is the number of channels of the output feature map, and $D_f$ is the width and height of the input and output feature maps.

The computation cost of the depthwise convolution is shown in Equation 14, and the pointwise convolution is computed according to Equation 15; the total computation cost of DSC is therefore given in Equation 16:

$$D_k \cdot D_k \cdot M \cdot D_f \cdot D_f \quad (14)$$

$$M \cdot N \cdot D_f \cdot D_f \quad (15)$$

$$D_k \cdot D_k \cdot M \cdot D_f \cdot D_f + M \cdot N \cdot D_f \cdot D_f \quad (16)$$

The ratio of the cost of DSC to that of standard convolution follows from Equations 13, 16, as shown in Equation 17:

$$\frac{D_k \cdot D_k \cdot M \cdot D_f \cdot D_f + M \cdot N \cdot D_f \cdot D_f}{D_k \cdot D_k \cdot M \cdot N \cdot D_f \cdot D_f} = \frac{1}{N} + \frac{1}{D_k^{2}} \quad (17)$$
From the equations above, DSC can significantly reduce the number of parameters and the computation cost. Since the proportion of computation is related to the kernel size Dk and the number of channels N, DSC achieves higher efficiency when Dk and N are larger.
In our network, the original convolution in the decoder of TransUNet is replaced with DSC to reduce the number of parameters and the computation cost.
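To make the savings in Equations 13–17 concrete, the sketch below builds a standard 3 × 3 convolution and its depthwise separable counterpart and compares their parameter counts; the channel counts are example values, not those of the actual decoder.

```python
import torch.nn as nn

M, N, k = 256, 256, 3   # input channels, output channels, kernel size (example values)

# Standard convolution: one k x k kernel per (input, output) channel pair.
standard = nn.Conv2d(M, N, kernel_size=k, padding=1, bias=False)

# Depthwise separable convolution: depthwise (spatial) then pointwise (channel) convolution.
depthwise_separable = nn.Sequential(
    nn.Conv2d(M, M, kernel_size=k, padding=1, groups=M, bias=False),  # depthwise
    nn.Conv2d(M, N, kernel_size=1, bias=False),                       # pointwise
)


def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


ratio = count_params(depthwise_separable) / count_params(standard)
print(count_params(standard), count_params(depthwise_separable), round(ratio, 3))
# 589824 67840 0.115 -> matches Equation 17: 1/N + 1/k^2 = 1/256 + 1/9 ≈ 0.115
```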
4 Experimental results
4.1 Datasets
Two datasets were used: the Synapse multi-organ segmentation dataset (Synapse) and the Automated Cardiac Diagnosis Challenge (ACDC) dataset.
Synapse has 3,779 axial enhanced CT images, covering eight abdominal organs, including the aorta, gallbladder, left kidney, right kidney, liver, pancreas, spleen, and stomach, from 30 patients. In the dataset, 18 cases (2,212 axial slices) are used as the training set, and the remaining 12 cases (1,567 axial slices) are used as the test set.
ACDC is a cardiac MRI dataset consisting of 100 patients. For each case, the corresponding labels include the left ventricle, myocardium, and right ventricle. The dataset is divided into 70 training cases (1,304 axial slices), 10 validation cases (182 axial slices), and 20 test cases.
4.2 Experimental setup
The experiments were conducted in an environment with Python 3.7 and PyTorch 1.8.1, using an NVIDIA RTX 3090 GPU with 24 GB of memory. To increase data diversity and prevent overfitting, a standard on-the-fly data augmentation strategy was employed during training: horizontal and vertical flips, each applied with a probability of 0.5, and random rotation within an angle range of [−10, 10] degrees. This augmentation set was chosen to enhance model robustness while avoiding unrealistic anatomical distortions (Namozov and Im Cho, 2018). The detailed parameter settings were as follows: an input image resolution of 256 × 256, a patch size of 16, a batch size of 24, and dilation rates of [1, 2, 4, 8] for the CBAM-ASPP module. In addition, the model was trained using the SGD optimizer (Zhang et al., 2018) with a momentum of 0.9, a learning rate of 0.01, and a weight decay of 0.0001.
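A minimal sketch of such an augmentation and optimizer setup is shown below, written with torchvision-style transforms; the exact composition and the placeholder model are assumptions for illustration only.

```python
import torch
from torchvision import transforms

# On-the-fly augmentation as described: horizontal/vertical flips with p = 0.5 and
# random rotation within [-10, 10] degrees. In segmentation, the same transform must
# be applied jointly to the image and its label mask (not shown here).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
])

# SGD optimizer with the stated momentum, learning rate, and weight decay.
model = torch.nn.Conv2d(3, 1, kernel_size=3)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
```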
4.3 Loss function
Although the Dice loss is widely used, it may suffer from vanishing gradients when the region of interest in a medical image is small. Therefore, this study adopts a combination of the cross-entropy loss (Ho and Wookey, 2019) and the Dice loss (Zhang et al., 2021).
The cross-entropy loss $L_{CE}$ evaluates the difference between the model output and the true label and measures the pixel-wise classification loss during segmentation. Its calculation is shown in Equation 18:

$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log(p_{ic}) \quad (18)$$

where N is the number of samples (pixels), M is the number of classes, $y_{ic}$ indicates whether the true class of sample i is c (1 if it is c, otherwise 0), and $p_{ic}$ is the predicted probability that sample i belongs to class c.
The similarity between the predicted segmented image and the ground-truth image is evaluated using the Dice loss $L_{Dice}$, whose value ranges from 0 to 1, as shown in Equation 19:

$$L_{Dice} = 1 - \frac{2|X \cap Y|}{|X| + |Y|} \quad (19)$$

where $|X \cap Y|$ represents the number of elements in the intersection of the ground-truth image X and the predicted image Y, while $|X|$ and $|Y|$ represent the number of elements in each image, respectively.
The total loss function of the IAP-TransUNet model is shown in Equation 20.
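For illustration, a minimal PyTorch version of the combined objective in Equations 18–20 might look as follows; the equal 0.5/0.5 weighting of the two terms is an assumption, since the exact weights of Equation 20 are not restated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CombinedLoss(nn.Module):
    """Weighted sum of cross-entropy and Dice loss (Equations 18-20); weights are assumed."""

    def __init__(self, ce_weight: float = 0.5, dice_weight: float = 0.5, smooth: float = 1e-5):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.ce_weight, self.dice_weight, self.smooth = ce_weight, dice_weight, smooth

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (B, C, H, W); target: (B, H, W) with integer class indices.
        ce_loss = self.ce(logits, target)
        probs = logits.softmax(dim=1)
        one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
        intersection = (probs * one_hot).sum(dim=(2, 3))
        dice = (2 * intersection + self.smooth) / (
            probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3)) + self.smooth
        )
        dice_loss = 1 - dice.mean()
        return self.ce_weight * ce_loss + self.dice_weight * dice_loss


loss_fn = CombinedLoss()
print(loss_fn(torch.randn(2, 9, 64, 64), torch.randint(0, 9, (2, 64, 64))))
```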
4.4 Evaluation indicator
This study used the Dice similarity coefficient (DSC) (Bertels et al., 2019) and the Hausdorff distance (HD) (Kreveld et al., 2022) to evaluate the performance of the model. The DSC imposes stronger constraints on the internal filling of segmented pixels, while the HD is more sensitive to segmentation boundaries. In addition, to evaluate the lightweight design of the model, the number of parameters, inference speed, and GFLOPS (Wang et al., 2022) were compared.
The DSC is used to evaluate the similarity between the compared targets, as shown in Equation 21; a larger value indicates greater similarity, meaning that the predicted result is closer to the ground truth:

$$DSC(A, B) = \frac{2|A \cap B|}{|A| + |B|} \quad (21)$$

where A is the ground-truth result and B is the predicted result.
The HD measures the distance between two point sets: it takes the maximum, over both sets, of the shortest distance from a point in one set to the other set, as shown in Equation 22:

$$HD(A, B) = \max\left\{\sup_{a \in A}\inf_{b \in B} d(a, b),\ \sup_{b \in B}\inf_{a \in A} d(a, b)\right\} \quad (22)$$

where d(a, b) is the Euclidean distance between two points a and b.
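A minimal sketch of how the two metrics can be computed for binary masks is given below; it is illustrative only (in practice, the HD is often computed on boundary points or reported as the 95th-percentile HD).

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks (Equation 21)."""
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum() + 1e-8)


def hausdorff_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two point sets (Equation 22)."""
    pred_pts = np.argwhere(pred)   # coordinates of foreground pixels
    gt_pts = np.argwhere(gt)
    return max(directed_hausdorff(pred_pts, gt_pts)[0],
               directed_hausdorff(gt_pts, pred_pts)[0])


pred = np.zeros((64, 64), dtype=bool); pred[10:30, 10:30] = True
gt = np.zeros((64, 64), dtype=bool); gt[12:32, 12:32] = True
print(dice_coefficient(pred, gt), hausdorff_distance(pred, gt))
```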
4.5 Result analysis
Experiments were conducted based on the Synapse and ACDC datasets and compared to mainstream segmentation frameworks, including V-Net (Milletari et al., 2016), DARR (Fu et al., 2020), U-Net (Ronneberger et al., 2015), Att UNet (Oktay et al., 2018), ViT (Dosovitskiy et al., 2020), and TransUNet (Chen et al., 2021). Among them, R50 indicates that the encoder of the network is composed of ResNet50 (He et al., 2016).
4.5.1 Training loss
In the experiments, the model was trained for 150 epochs. In each epoch, the loss between the model predictions and the ground truth was calculated to guide the next training iteration in the correct direction. As the number of iterations increased, the loss of the model gradually decreased and tended to stabilize, as shown in Figure 6.
Figure 6. Model training loss curves. (A) Loss curve on the Synapse dataset; (B) loss curve on the ACDC dataset.
4.5.2 Synapse multi-organ segmentation dataset
The segmentation results based on the Synapse dataset are shown in Table 1, indicating that the proposed IAP-TransUNet achieved the best segmentation performance, with the DSC reaching 78.85% and the HD reduced to 28.77. Compared to TransUNet, IAP-TransUNet increased the DSC by 1.37% and reduced the Hausdorff distance by 2.92.
The segmentation results on the Synapse dataset are shown in Figure 7, where Figures 7D, E correspond to Att-UNet and U-Net, respectively. Both suffered from under- or over-segmentation of organs. From the second row, it can be seen that the spleen was under-segmented by Att-UNet and over-segmented by U-Net. TransUNet is a simple combination of a transformer and a CNN, and its result is shown in Figure 7C. It only considers the fusion of local and global contextual information but fails to adequately capture detailed local features; therefore, there are issues such as missing organs or excessive labels. In the first and second rows, TransUNet mislabeled the pancreas, while the stomach was over-labeled. The result of the proposed IAP-TransUNet model, shown in Figure 7B, was closer to the ground truth. The experimental results showed that IAP-TransUNet paid more attention to the dependency of local contextual information, achieving better edge prediction. By integrating the ECA mechanism with the CNN and connecting it to the transformer, local information with remote dependencies and global contextual information were fused together, thereby improving the feature extraction capability of the encoder. Through the multi-scale context information obtained from the CBAM-ASPP module and the efficient computation of DSC, followed by up-sampling restoration and the skip connections of the U-shaped structure, a more accurate segmentation result was achieved.
Figure 7. Segmentation results of the different networks on the Synapse multi-organ segmentation dataset. (a) Ground Truth, (b) IAP-TransUNet, (c) TransUNet, (d) Att-UNet, (e) U-Net.
4.5.3 Automated cardiac diagnosis challenge dataset
The segmentation results based on the ACDC dataset are shown in Table 2. The results reflected that IAP-TransUNet showed good segmentation performance, with a Dice coefficient of 90.46%. Compared to the baseline model TransUNet, IAP-TransUNet improved the segmentation accuracy of the left ventricle, myocardium, and right ventricle by 0.14%, 1.89%, and 0.23%, respectively, outperforming mainstream models. The experimental results showed that the proposed IAP-TransUNet has good generalization ability and robustness.
The segmentation results based on the ACDC dataset are shown in Figure 8, where Figure 8D displays the segmentation result of TransUNet. The first row shows excessive segmentation of the left ventricle, while the second and third rows reflect insufficient segmentation of the right ventricle. Figure 8C shows the results of the proposed IAP-TransUNet model, which performed slightly better than TransUNet in left ventricular segmentation and significantly better than TransUNet in right ventricular segmentation. The experimental results showed that the proposed IAP-TransUNet model achieves better segmentation performance compared to the baseline model TransUNet, further verifying the effectiveness of the improved model.
Figure 8. Segmentation results of the different networks on the ACDC dataset. (a) Original Image, (b) Ground Truth, (c) IAP-TransUNet, (d) TransUNet.
4.5.4 Lightweight
The comparison of the number of parameters and the efficiency of the different networks is shown in Table 3. According to the results, replacing the traditional convolution in the TransUNet decoder with DSC reduced the number of parameters to approximately half of that of TransUNet and reduced the inference time and GFLOPS to approximately one-third, even though the integration of the ECA mechanism and the CBAM-ASPP module adds some parameters and inference time. The experimental results showed that DSC can reduce the number of parameters and improve computing efficiency to some extent, thereby making the proposed IAP-TransUNet a lightweight network.
A direct analysis of the trade-off between accuracy and model size revealed the core advantage of our architecture. As shown in Tables 1, 3, the standard U-Net was the most lightweight model (31.13 M parameters) but provided lower accuracy (76.85% DSC). The baseline TransUNet was the heaviest (105.32 M) with an accuracy of 77.48% DSC. Our IAP-TransUNet (59.25 M) positioned itself optimally on this spectrum. It decisively outperformed U-Net in accuracy for a moderate increase in size while simultaneously achieving higher accuracy than the much larger TransUNet. This demonstrates a significantly improved performance-to-cost ratio, establishing our model as a more efficient and effective architecture.
4.6 Ablation experiments
To investigate the effectiveness of various components of IAP-TransUNet, ablation experiments were conducted based on the Synapse dataset. The experiments examined the effectiveness of the structural design, the influence of the attention mechanism, the influence of the CBAM-ASPP module expansion rate, and the influence of the convolution group number.
(1) The effectiveness of the structural design. To verify the impact of the network architecture on segmentation performance, six combinations of the ECA mechanism, the CBAM-ASPP module, and DSC were integrated into the baseline model TransUNet, as shown in Table 4. The segmentation results of the different combined models are shown in Figure 9. The ECA mechanism, the CBAM-ASPP module, and DSC each improved the segmentation performance of the baseline model, with the best results achieved when all three were integrated, thereby verifying the effectiveness of the structural design.
(2) Impact of the attention mechanism. To compare the impact of the ECA mechanism with other attention mechanisms, experiments were conducted by integrating the SE attention mechanism, CBAM attention mechanism, and ECA mechanism into the same network layer of TransUNet. The results are shown in Table 5. The evaluation metric DSC of the models with the SE attention mechanism, CBAM attention mechanism, and ECA mechanism increased by 0.37%, 0.53%, and 0.66%, respectively. The results indicate that the ECA mechanism can adaptively adjust the weights of channels by learning the correlations between channels, which can achieve higher segmentation accuracy.
(3) Impact of the dilation rate of the CBAM-ASPP module. To investigate the influence of different dilation rates of the CBAM-ASPP module on segmentation performance, experiments were conducted with two sets of dilation rates, [1, 2, 4, 8] and [3, 6, 9, 12], and the results are shown in Table 6. The segmentation performance with dilation rates [1, 2, 4, 8] was better than that with [3, 6, 9, 12]; therefore, [1, 2, 4, 8] was selected as the dilation rates for the CBAM-ASPP module.
(4) Impact of the convolution group number. To investigate the impact of convolutions with different group numbers on segmentation performance, experiments were conducted using convolutions with group numbers of 1, 48, and 768, and the results are shown in Table 7. Among them, a group number of 1 corresponds to ordinary convolution, and a group number of 768 corresponds to depthwise convolution. The experimental results showed that the convolution with a group number of 768 achieved more accurate segmentation results, so depthwise convolution was chosen.
5 Conclusion
This study proposes a new medical image segmentation model, IAP-TransUNet, which integrates an attention mechanism with pyramid pooling to address the limitations of convolutions in medical image segmentation networks, such as small receptive fields, insufficient information exchange between local regions, and excessive computational complexity. The innovation of this model lies in embedding an ECA mechanism in the CNN to extract more comprehensive features, introducing the CBAM-ASPP module into the bottleneck layer to obtain multi-scale contextual information, and using DSC instead of traditional convolution to achieve a lightweight network for medical image segmentation. The experiments were based on the Synapse and ACDC datasets, and the results show that the proposed IAP-TransUNet achieves excellent segmentation performance and can be applied to medical images of CT, MRI, and other modalities. In terms of computational efficiency, the proposed IAP-TransUNet has fewer parameters, faster inference speed, and lower GFLOPS compared to the baseline model TransUNet.
Despite its promising results, this study has several limitations. First, while IAP-TransUNet is more lightweight than the original TransUNet, its parameter count is still higher than that of simpler architectures, such as the standard U-Net, which could be a concern for deployment on resource-limited clinical devices. Second, the segmentation of extremely small or ambiguously bordered lesions remains a challenge.
Future research will proceed in several directions. First, we will investigate advanced model compression techniques, such as network pruning and quantization, to further reduce the model's footprint without significantly compromising accuracy. Second, we plan to explore 3D extensions of IAP-TransUNet to better leverage spatial information and improve segmentation performance.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author(s).
Ethics statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements because only publicly available datasets were used. Written informed consent from the participants or the patients' next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements because only publicly available datasets were used.
Author contributions
YS: Writing – original draft. FL: Writing – original draft. SZ: Writing – review & editing. HY: Writing – review & editing. XC: Writing – review & editing. QL: Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This research was supported by the Chinese Natural Science Foundation (No. 82301276), the Shanghai Sailing Program (Nos. 23YF140480 and 24YF2745500), and the Shanghai Science and Technology Committee Foundation (No. 24SF1904700).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Gen AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Anderson, D., Ramachandran, P., Trapp, J., and Fielding, A. (2025). A review of image processing and analysis of computed tomography images using deep learning methods. Phys. Eng. Sci. Med. 48, 123–145. doi: 10.1007/s13246-025-01635-w
Bertels, J., Eelbode, T., Berman, M., Vandermeulen, D., Maes, F., Bisschops, R., et al. (2019). “Optimizing the dice score and jaccard index for medical image segmentation: theory and practice,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part II 22 (Berlin: Springer International Publishing), 92–100.
Cao, H., Knoll, A., Chen, G., Zhao, H., Jiang, D., Zhang, X., et al. (2024). Sdpt: Semantic-aware dimension-pooling transformer for image segmentation. IEEE Transac. Intell. Transp. Syst. 25, 15934–15946. doi: 10.1109/TITS.2024.3417813
Cao, H., Wang, Y., Wang, M., Chen, J., Jiang, D., Zhang, X., et al. (2022). “Swin-UNet: UNet-like pure transformer for medical image segmentation,” in European Conference on Computer Vision (Cham: Springer Nature Switzerland).
Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., et al. (2021). Transunet: Transformers make strong encoders for medical image segmentation. arXiv [preprint]. arXiv:2102.04306. Available online at: https://arxiv.org/abs/2102.04306.
Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848. doi: 10.1109/TPAMI.2017.2699184
Chollet, F. (2017). “Xception: deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern recognition (Boston, MA: IEEE), 1251–1258.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv [preprint]. arXiv:2010.11929. Available online at: https://arxiv.org/abs/2010.11929
Fu, S., Lu, Y., Wang, Y., Zhou, Y., Shen, W., Yuille, A., et al. (2020). “Domain adaptive relational reasoning for 3d multi-organ segmentation,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23 (Berlin: Springer International Publishing), 656–666.
Guan, S., Khan, A. A., Sikdar, S., and Chitnis, P. V. (2019). Fully dense UNet for 2-D sparse photoacoustic tomography artifact removal. IEEE J. Biomed. Health Inform. 24, 568–576. doi: 10.1109/JBHI.2019.2912935
Gülmez, B. (2025). Deep learning based colorectal cancer detection in medical images: a comprehensive analysis of datasets, methods, and future directions. Clin. Imag. 125:110542. doi: 10.1016/j.clinimag.2025.110542
Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., et al. (2022). A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 45, 87–110. doi: 10.1109/TPAMI.2022.3152247
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916. doi: 10.1109/TPAMI.2015.2389824
He, K., Zhang, X., Ren, S., and Sun, J. (2016). “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Boston, MA: IEEE), 770–778.
Ho, Y., and Wookey, S. (2019). The real-world-weight cross-entropy loss function: modeling the costs of mislabeling. IEEE Access 8, 4806–4813. doi: 10.1109/ACCESS.2019.2962617
Hu, J., Shen, L., and Sun, G. (2018). “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Boston, MA: IEEE), 7132–7141.
Huang, H., Lin, L., Tong, R., Chen, Y. W., Wu, J., Hu, H., et al. (2020). “Unet 3+: a full-scale connected unet for medical image segmentation,” in ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (Boston, MA: IEEE), 1055–1059.
Jiang, C., Yang, B., Halin, A. A., Abdullah, L. N., Manshor, N., and Perumal, T. (2024). Res-UNet ensemble learning for semantic segmentation of mineral optical microscopy images. Minerals 14:1281. doi: 10.3390/min14121281
Kepp, T., Uzunova, H., Ehrhardt, J., and Handels, H. (2025). AI-based applications in medical image computing. Bundesgesundheitsblatt-Gesundheitsforschung-Gesundheitsschutz 68, 862–871. doi: 10.1007/s00103-025-04093-7
Kreveld, M. v., Miltzow, T., Ophelders, T., Sonke, W., and Vermeulen, J. L. (2022). Between shapes, using the Hausdorff distance. Comput. Geometry 100:101817. doi: 10.1016/j.comgeo.2021.101817
Le, N. Q. K. (2019). Fertility-GRU: identifying fertility-related proteins by incorporating deep-gated recurrent units and original position-specific scoring matrix profiles. J. Proteome Res. 18, 3503–3511. doi: 10.1021/acs.jproteome.9b00411
Long, J., Shelhamer, E., and Darrell, T. (2015). "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Boston, MA: IEEE), 3431–3440.
Milletari, F., Navab, N., and Ahmadi, S. A. (2016). “V-net: fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV) (Boston, MA: IEEE), 565–571.
Namozov, A., and Im Cho, Y. (2018). “An improvement for medical image analysis using data enhancement techniques in deep learning,” in 2018 International Conference on Information and Communication Technology Robotics (ICT-ROBOT) (Boston, MA: IEEE), 1–3.
Oktay, O., Schlemper, J., Folgoc, L. L., et al. (2018). Attention u-net: Learning where to look for the pancreas. arXiv [preprint] arXiv: 1804.03999. Available online at: https://arxiv.org/abs/1804.03999
Pu, Q. M., Xi, Z. X., Yin, S., Zhao, Z., and Zhao, L. N. (2024). Advantages of transformer and its application for medical image segmentation: a survey. Biomed. Eng. Online 23:14. doi: 10.1186/s12938-024-01212-4
Ronneberger, O., Fischer, P., and Brox, T. (2015). “U-net: Convolutional networks for biomedical image segmentation,” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18 (Berlin: Springer International Publishing), 234–241.
Sarvamangala, D. R., and Kulkarni, R. V. (2022). Convolutional neural networks in medical image understanding: a survey. Evol. Intell. 15, 1–22. doi: 10.1007/s12065-020-00540-3
Su, E., Cao, H., and Knoll, A. (2024). “BiSeg-SAM: weakly-supervised post-processing framework for boosting binary segmentation in segment anything models,” in 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (Boston, MA: IEEE). doi: 10.1109/BIBM62325.2024.10822087
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008. doi: 10.5555/3295222.3295349
Wang, C. H., Huang, K. Y., Yao, Y., Chen, J. C., Shuai, H. H., and Cheng, W. H. (2022). Lightweight deep learning: an overview. IEEE Consum. Electron. Magazine. 11, 51–64. doi: 10.1109/MCE.2022.3181759
Wang, Q., Wu, B., Zhu, P., Hu, Q., Li, P., and Zuo, W. (2020). “ECA-Net: efficient channel attention for deep convolutional neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Boston, MA: IEEE), 11534–11542.
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., et al. (2020). “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations [Stroudsburg, PA: Association for Computational Linguistics (ACL)], 38–45.
Woo, S., Park, J., Lee, J., and Kweon, I. S. (2018). “Cbam: convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV) (Berlin: Springer International Publishing), 3–19.
Yu, H., Yang, L. T., Zhang, Q., Armstrong, D., and Deen, M. J. (2021). Convolutional neural networks for medical image analysis: state-of-the-art, comparisons, improvement and perspectives. Neurocomputing 444, 92–110. doi: 10.1016/j.neucom.2020.04.157
Zhang, C., Liao, Q., Rakhlin, A., et al. (2018). Theory of deep learning IIb: optimization properties of SGD. arXiv [preprint]. arXiv:1801.02254. Available online at: https://arxiv.org/abs/1801.02254
Zhang, Y., Liu, S., Li, C., and Wang, J. (2021). Rethinking the dice loss for deep learning lesion segmentation in medical images. J. Shanghai Jiaotong Univ. Sci. 26, 93–102. doi: 10.1007/s12204-021-2264-x
Zhao, Z., Gui, J., Yao, A., Le, N. Q. K., and Chua, M. C. H. (2022). Improved prediction model of protein and peptide toxicity by integrating channel attention into a convolutional neural network and gated recurrent units. ACS Omega 7, 40569–40577. doi: 10.1021/acsomega.2c05881
Zheng, S., Lu, J., Fu, Y., Feng, J., Zhang, L., Zhao, H., et al. (2021). “Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (Boston, MA: IEEE). 6881–6890.
Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N., and Liang, J. (2019). Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imag. 39, 1856–1867. doi: 10.1109/TMI.2019.2959609
Keywords: transformer, attention mechanism, pyramid pooling, medical image segmentation, lightweight network
Citation: Shi Y, Li F, Zhao S, Yu H, Chen X and Liu Q (2025) IAP-TransUNet: integration of the attention mechanism and pyramid pooling for medical image segmentation. Front. Neurorobot. 19:1706626. doi: 10.3389/fnbot.2025.1706626
Received: 16 September 2025; Accepted: 31 October 2025;
Published: 01 December 2025.
Edited by:
Hu Cao, Technical University of Munich, Germany
Reviewed by:
Nguyen Quoc Khanh Le, Taipei Medical University, Taiwan
Yinlong Liu, University of Macau, China
Copyright © 2025 Shi, Li, Zhao, Yu, Chen and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xinrong Chen, chenxinrong@fudan.edu.cn; Quan Liu, liuqent@163.com
†These authors have contributed equally to this work