
ORIGINAL RESEARCH article

Front. Oncol.

Sec. Cancer Imaging and Image-directed Interventions

Volume 15 - 2025 | doi: 10.3389/fonc.2025.1622426

This article is part of the Research Topic: Radiomics and AI-Driven Deep Learning for Cancer Diagnosis and Treatment.

SMF-Net: Semantic-Guided Multimodal Fusion Network for Precise Pancreatic Tumor Segmentation in Medical CT Images

Provisionally accepted
Wenyi Zhou1, Ziyang Shi1, Bin Xie1, Fang Li2, Jiehao Yin1, Yongzhong Zhang1*, Linan Hu2*, Lin Li1*, Yongming Yan1, Xiajun Wei2, Zhen Hu2, Zhengmao Luo2, Wanxiang Peng2, Xiaochun Xie2, Xiaoli Long2
  • 1Central South University Forestry and Technology, Changsha, China
  • 2Zhuzhou Hospital Affiliated to Central South University Xiangya's School of Medicine, Zhuzhou, China

The final, formatted version of the article will be published soon.

Background: Accurate, automated segmentation of pancreatic tumors from CT images via deep learning is essential for the clinical diagnosis of pancreatic cancer. However, two key challenges persist: (a) complex phenotypic variations in pancreatic morphology cause segmentation models to focus predominantly on healthy tissue rather than tumors, compromising tumor feature extraction and segmentation accuracy; and (b) existing methods often struggle to retain fine-grained local features, degrading pancreas-tumor segmentation performance.

Methods: To overcome these limitations, we propose SMF-Net (Semantic-Guided Multimodal Fusion Network), a novel multimodal medical image segmentation framework built on a CNN-Transformer hybrid encoder. The framework incorporates AMBERT, a progressive feature extraction module, and a Multimodal Token Transformer (MTT) that fuses visual and semantic features for enhanced tumor localization. A Multimodal Enhanced Attention Module (MEAM) further improves the retention of local discriminative features. To address multimodal data scarcity, we adopt a semi-supervised learning paradigm based on a Dual-Adversarial-Student Network (DAS-Net). In collaboration with Zhuzhou Central Hospital, we also constructed the Multimodal Pancreatic Tumor Dataset (MPTD).

Results: On the MPTD, our model achieved Dice scores of 79.25% and 64.21% for pancreas and tumor segmentation, respectively, improvements of 2.24% and 4.18% over the baseline model. It also outperformed existing state-of-the-art methods in average Dice score on the QaTa-COVID-19 and MosMedData lung infection segmentation datasets, demonstrating strong generalization ability.

Conclusion: SMF-Net delivers accurate segmentation of pancreatic, tumor, and pulmonary regions, highlighting its strong potential for real-world clinical applications.
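The abstract gives no implementation details, but two of the ideas it names lend themselves to a brief illustration: fusing visual CT tokens with semantic text tokens via cross-attention (in the spirit of the MTT described above) and the Dice score used to report segmentation accuracy. The PyTorch sketch below is purely illustrative and is not the authors' SMF-Net code; the names TokenFusion and dice_score, and all dimensions, are assumptions.

# Minimal sketch (not the published SMF-Net implementation) of
# (1) cross-attention fusion of visual and text tokens, one plausible
# reading of an MTT-style module, and (2) the Dice metric reported above.
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    """Cross-attention fusion: visual tokens attend to text tokens."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_v, dim) CT patch tokens; text: (B, N_t, dim) semantic tokens
        fused, _ = self.attn(query=visual, key=text, value=text)
        return self.norm(visual + fused)  # residual keeps local visual detail

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice = 2|P ∩ T| / (|P| + |T|) on binary masks, averaged over the batch."""
    pred, target = pred.float().flatten(1), target.float().flatten(1)
    inter = (pred * target).sum(dim=1)
    return ((2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)).mean()

if __name__ == "__main__":
    fuse = TokenFusion()
    v = torch.randn(2, 196, 256)   # e.g. 14 x 14 CT patch tokens
    t = torch.randn(2, 16, 256)    # e.g. 16 projected text tokens
    print(fuse(v, t).shape)        # torch.Size([2, 196, 256])
    p = (torch.rand(2, 1, 64, 64) > 0.5).long()
    print(dice_score(p, p))        # tensor(1.), identical masks overlap perfectly

The residual connection in TokenFusion reflects a common design choice for such fusion blocks: the fused output augments, rather than replaces, the visual tokens, so fine-grained image features are preserved.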

Keywords: medical image segmentation, multimodal feature fusion, semi-supervised learning, convolution-transformer-based network, pancreatic tumor detection

Received: 03 May 2025; Accepted: 30 Jun 2025.

Copyright: © 2025 Zhou, Shi, Xie, Li, Yin, Zhang, Hu, Li, Yan, Wei, Hu, Luo, Peng, Xie and Long. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Yongzhong Zhang, Central South University Forestry and Technology, Changsha, China
Linan Hu, Zhuzhou Hospital Affiliated to Central South University Xiangya's School of Medicine, Zhuzhou, China
Lin Li, Central South University Forestry and Technology, Changsha, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.