ORIGINAL RESEARCH article
Front. Comput. Sci.
Sec. Networks and Communications
Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1677905
This article is part of the Research Topic: Resource Coordination and Joint Optimization in Cloud-Edge-End Systems.
Enhancing Medical Image Segmentation via Complementary CNN-Transformer Fusion and Boundary Perception
Provisionally accepted
1 Hunan Engineering University, Xiangtan, China
2 The Third Xiangya Hospital of Central South University, Changsha, China
The emergence of Vision Transformers (ViTs) has demonstrated competitive potential compared to convolutional neural networks (CNNs) in large-scale natural image recognition. However, their direct application to medical image segmentation faces two fundamental limitations: (1) the lack of inductive bias for capturing local anatomical structures, and (2) insufficient adaptability to the heterogeneous characteristics of medical imaging modalities, including CT, endoscopic, and dermoscopic images. Additionally, the effective integration of multi-scale features from pre-trained CNNs and ViTs remains an understudied challenge in this domain. To address these issues, we propose a Pyramid Feature Fusion Network (PFF-Net) that systematically combines hierarchical representations from pre-trained CNN and Transformer backbones. Our dual-branch architecture exploits their complementary strengths: the region-aware perception branch establishes global-to-local contextual understanding through pyramid feature fusion, while the boundary-aware perception branch enhances structural precision via a novel hybrid edge detection mechanism. Specifically, the boundary branch employs orthogonally oriented Sobel operators to extract primitive edge cues, which are then refined with low-level fusion features to generate semantically consistent boundaries. These boundary predictions are iteratively fed back to reinforce feature representations in the region branch, creating a mutually enhancing loop between anatomical region segmentation and structural boundary delineation. Quantitative evaluations demonstrate superior performance across three clinically critical scenarios. The proposed method achieves a 91.87% Dice score on polyp segmentation, surpassing TransUNet (86.96%) by 5.6% with a 47.5% reduction in HD95 distance (11.68 vs. 22.25). For spleen CT analysis, it obtains 95.33% Dice, outperforming ESFPNet-S (94.92%) by 4.3% while reducing HD95 by 52.1% (3.35 vs. 6.99). In skin lesion segmentation, our model reaches 90.29% Dice, a 7.3% improvement over ESFPNet-S (89.64%). Notably, the framework exhibits strong generalization on small-scale datasets despite the significant domain gap between medical and natural images. These results validate the pyramid fusion strategy's effectiveness in bridging cross-domain disparities through dual-branch mutual enhancement.
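To make the boundary-perception idea concrete, the following is a minimal PyTorch sketch of a Sobel-based boundary branch as described in the abstract: fixed horizontal and vertical Sobel kernels extract primitive edge cues from a region feature map, which are refined together with low-level fusion features into a boundary prediction. This is not the authors' released implementation; the module name, channel sizes, and refinement layers are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelBoundaryBranch(nn.Module):
    """Hypothetical boundary branch: Sobel edge cues + low-level feature refinement."""

    def __init__(self, feat_channels: int, low_channels: int):
        super().__init__()
        # Orthogonally oriented Sobel kernels, stored as non-trainable buffers.
        sobel_x = torch.tensor([[-1., 0., 1.],
                                [-2., 0., 2.],
                                [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        kernel = torch.stack([sobel_x, sobel_y]).unsqueeze(1)  # (2, 1, 3, 3)
        self.register_buffer("sobel_kernel", kernel)
        # Collapse the region features to one channel before edge extraction.
        self.squeeze = nn.Conv2d(feat_channels, 1, kernel_size=1)
        # Refine raw edge responses jointly with low-level fusion features.
        self.refine = nn.Sequential(
            nn.Conv2d(2 + low_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),  # single-channel boundary logits
        )

    def forward(self, region_feat: torch.Tensor, low_feat: torch.Tensor) -> torch.Tensor:
        x = self.squeeze(region_feat)                      # (B, 1, H, W)
        edges = F.conv2d(x, self.sobel_kernel, padding=1)  # (B, 2, H, W): gx, gy
        if low_feat.shape[-2:] != edges.shape[-2:]:
            low_feat = F.interpolate(low_feat, size=edges.shape[-2:],
                                     mode="bilinear", align_corners=False)
        boundary_logits = self.refine(torch.cat([edges, low_feat], dim=1))
        # The resulting boundary map could be fed back to reinforce the region branch.
        return boundary_logits

if __name__ == "__main__":
    branch = SobelBoundaryBranch(feat_channels=256, low_channels=64)
    region_feat = torch.randn(1, 256, 88, 88)
    low_feat = torch.randn(1, 64, 176, 176)
    print(branch(region_feat, low_feat).shape)  # torch.Size([1, 1, 88, 88])
```

In this sketch, the boundary logits would be supervised with an edge-aware loss and reused (e.g., as an attention map) to sharpen the region branch's features, mirroring the mutually enhancing loop the paper describes.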
Keywords: boundary perception, CNNs, medical image segmentation, transformers, pretrained backbone
Received: 01 Aug 2025; Accepted: 25 Aug 2025.
Copyright: © 2025 Liu, Tian, Huang and Shen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Xiaowei Liu, Hunan Engineering University, Xiangtan, China
Wei Shen, The Third Xiangya Hospital of Central South University, Changsha, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.