AUTHOR=Huang Lianchao, Peng Feng, Huang Binghao, Cao Yinghong TITLE=HiImp-SMI: an implicit transformer framework with high-frequency adapter for medical image segmentation JOURNAL=Frontiers in Physics VOLUME=13 YEAR=2025 URL=https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2025.1614983 DOI=10.3389/fphy.2025.1614983 ISSN=2296-424X ABSTRACT=Accurate and generalizable segmentation of medical images remains a challenging task due to boundary ambiguity and variations across domains. In this paper, an implicit transformer framework with a high-frequency adapter for medical image segmentation (HiImp-SMI) is proposed. A new dual-branch architecture is designed to simultaneously process spatial and frequency information, enhancing both boundary refinement and domain adaptability. Specifically, a Channel Attention Block selectively amplifies high-frequency boundary cues, improving contour delineation. A Multi-Branch Cross-Attention Block facilitates efficient hierarchical feature fusion, addressing challenges in multi-scale representation. Additionally, a ViT-Conv Fusion Block adaptively integrates global contextual awareness from Transformer features with local structural details, thereby significantly boosting cross-domain generalization. The entire network is trained in a supervised end-to-end manner, with frequency-adaptive modules integrated into the encoding stages of the Transformer backbone. Experimental evaluations show that HiImp-SMI consistently outperforms mainstream models on the Kvasir-Sessile and BCV datasets, including state-of-the-art implicit methods. For example, on the Kvasir-Sessile dataset, HiImp-SMI achieves a Dice score of 92.39%, outperforming I-MedSAM by 1%. On BCV, it demonstrates robust multi-class segmentation with consistent superiority across organs.
These quantitative results demonstrate the framework’s effectiveness in refining boundary precision, optimizing multi-scale feature representation, and improving cross-dataset generalization. This improvement is largely attributed to the dual-branch design and the integration of frequency-aware attention mechanisms, which enable the model to capture both anatomical details and domain-robust features. The proposed framework may serve as a flexible baseline for future work involving implicit modeling and multi-modal representation learning in medical image analysis.
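The abstract describes a Channel Attention Block that selectively amplifies high-frequency boundary cues. As the paper's implementation details are not given here, the following is only a minimal illustrative sketch of the general idea: isolate each channel's high-frequency content with an FFT high-pass mask, pool that energy into a channel descriptor, and use sigmoid gating to re-weight channels. All function names, shapes, and the cutoff parameter are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def high_freq_channel_attention(x, cutoff=0.25):
    """Illustrative frequency-aware channel attention gate (assumption,
    not the paper's implementation).

    x: feature map of shape (C, H, W). A radial high-pass FFT mask
    isolates high-frequency (boundary-like) content per channel; its
    energy is pooled into a channel descriptor and squashed into
    gating weights that re-scale the original features.
    """
    C, H, W = x.shape
    # 2-D FFT per channel, shifted so low frequencies sit at the center
    F = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    # Radial high-pass mask: keep frequencies beyond `cutoff` of Nyquist
    yy, xx = np.mgrid[:H, :W]
    r = np.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    mask = (r > cutoff * min(H, W) / 2).astype(float)
    # Mean high-frequency magnitude per channel -> descriptor of shape (C,)
    hf_energy = np.abs(F * mask).mean(axis=(-2, -1))
    # Normalize energies, then squash to (0, 1) gating weights (sigmoid)
    desc = hf_energy / (hf_energy.max() + 1e-8)
    gate = 1.0 / (1.0 + np.exp(-(desc - desc.mean())))
    # Re-weight channels: boundary-rich channels are amplified
    return x * gate[:, None, None]
```

Under this sketch, a channel dominated by sharp transitions (e.g. a checkerboard pattern) receives a larger gate than a flat, structure-free channel, which is the qualitative behavior the abstract attributes to the block.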