- 1Department of Interventional Radiology, Suzhou Xiangcheng People’s Hospital, Suzhou, China
- 2Department of Gastroenterology, the Southeast University Affiliated Nantong First People's Hospital, Nantong, China
- 3Department of Gastroenterology, the First People’s Hospital of Nantong, Nantong, China
- 4School of Medicine, Nantong University, Nantong, Jiangsu, China
- 5Affiliated Nantong Hospital 3 of Nantong University, Nantong, China
- 6Department of Radiology, Suzhou Xiangcheng People's Hospital, Suzhou, China
Introduction: Liver cancer is among the deadliest malignancies worldwide, and both its incidence and mortality continue to rise. Precise tumor segmentation often remains difficult due to heterogeneous enhancement patterns, infiltrative margins, and underlying parenchymal disease that frequently obscures lesion boundaries. While deep learning has advanced the field, existing heavy 3D architectures (e.g., nnU-Net) often require substantial computational resources, which limits their clinical deployment. Standard architectures also still struggle to reconcile fine-grained tissue cues with whole-organ context.
Methods: This study introduces the Liver Cancer Mamba Network (LCMambaNet), an efficient 2D segmentation framework built on selective state-space models. A tailored scan-patch mechanism extracts salient texture- and density-based features, sharpening the discrimination between normal parenchyma and malignant regions. The Liver Cancer Attention Module (LCAM) further decouples the confounding relationships between parenchymal descriptors and tumor characteristics. The selective state-space backbone captures long-range dependencies and continuous feature dynamics. We evaluated the model on both the LiTS (CT) and CirrMRI600+ (MRI) datasets.
Results: The proposed approach surpasses current state-of-the-art methods, achieving Dice scores of 92.94 ± 3.12% and 92.08 ± 2.85% on the LiTS and CirrMRI600+ datasets, respectively. Notably, stratified analysis shows superior performance on small lesions (< 2 cm), with statistical significance (p < 0.01) against strong baseline models. Comprehensive ablation studies verify the contribution of each component.
Discussion: The results demonstrate that LCMambaNet offers an efficient, clinically viable solution for 2D liver tumor segmentation. Its design addresses the key limitations of existing models, balancing computational efficiency with high segmentation accuracy. The strong performance on small lesions also highlights its potential to support early diagnosis and precise treatment planning, advancing the clinical utility of AI-based segmentation tools.
1 Introduction
Hepatocellular carcinoma (HCC) is among the most lethal cancers worldwide, with incidence and mortality continuing to rise Jiang et al. (1) Polat et al. (2) Siegel et al. (3). As the fourth leading cause of cancer-related death, it remains difficult to detect and treat early Jesi and Daniel (4). HCC typically arises on a background of chronic liver disease—viral hepatitis, alcoholic liver disease, or non-alcoholic fatty liver disease—with cirrhosis as the strongest risk factor Emam et al. (5) Li et al. (6). Even with modern imaging, early detection is impeded by heterogeneous appearance, infiltrative growth, and the complex milieu of chronically diseased parenchyma Tejaswi and Rachapudi (7).
Current state-of-the-art (SOTA) medical segmentation is dominated by 3D volumetric models. The self-configuring nnU-Net and Transformer-based architectures like UNETR and Swin-UNETR have set high benchmarks. However, these 3D models incur high memory costs and latency, posing challenges for real-time clinical workflows Xing et al. (8). Conversely, 2D approaches are efficient but traditionally lack global context. Despite notable gains in liver segmentation Bilic et al. (9), liver cancer segmentation remains difficult due to: (1) phase-dependent variability in HCC enhancement Archana and Anand (10), (2) benign lesions that mimic malignancy Wu et al. (11), (3) intratumoral heterogeneity with necrosis Gul et al. (12), and (4) architectural distortion from cirrhosis Zhang et al. (13) Vijayaprabakaran et al. (14).
Transformers have advanced sequence modeling by using self-attention to capture long-range dependencies without recurrence. However, the quadratic scaling of self-attention with sequence length limits their efficiency on long inputs Li et al. (15). To mitigate this, recent work integrates State Space Models (SSMs) Zhou et al. (16) Wang et al. (17) Liu et al. (18) Xing et al. (19) Liang et al. (20) into Transformer-like designs, yielding Mamba-style architectures that replace self-attention with linear recurrent layers derived from state-space formulations Wang et al. (21) Liu et al. (22) Liu et al. (23).
Furthermore, recent advances have explored incorporating domain-specific constraints and discrete representation learning to improve segmentation robustness. Approaches utilizing anatomical priors Zheng et al. (24) guide the network using shape constraints, while Vector Quantized Variational Autoencoders (VQ-VAE) employing codebook-based learning Jia et al. (25) have shown promise in modeling discrete feature distributions to handle tissue heterogeneity. While effective, these methods often add complexity to the inference pipeline. In contrast, our approach seeks to balance representation power with inference efficiency.
This work proposes the 2D Liver Cancer Mamba Network (LCMambaNet), a Mamba-based framework tailored for slice-wise liver cancer segmentation, as illustrated in Figure 1. We explicitly adopt a 2D strategy to maximize computational efficiency while leveraging Mamba’s ability to model long-range dependencies across the entire slice plane. The model learns factorized local–global representations, mines correlations between healthy parenchyma and tumor regions, and delivers an effective automated solution.
Figure 1. Overview of the proposed LCMambaNet architecture. The network processes 2D slices to ensure low-latency inference, utilizing a hierarchical Mamba encoder (Left) and the LCAM module (Right) to capture global context usually lost in 2D methods.
In summary, the contributions are:
1. Introduction of LCMambaNet, which combines a tailored scanning strategy, custom kernel operators, and selective state-space blocks to achieve accurate and efficient liver cancer segmentation.
2. Design of a liver-specific feature extractor to harvest critical tissue attributes from texture and density cues.
3. Development of a specialized SSM block that captures long-range dependencies and harmonizes local detail with whole-organ context.
4. Comprehensive experiments demonstrating state-of-the-art performance on public liver cancer datasets, including lesion-size stratification and statistical significance testing.
2 Method
2.1 Overview of the architecture
The proposed LCMambaNet architecture addresses the unique challenges of liver cancer segmentation through a carefully designed hierarchical framework that leverages selective state space models. Figure 1 illustrates the overall architecture, which comprises three key components:
1. A Mamba-based encoder that extracts multi-scale features.
2. Enhanced State Space Model (SSM) blocks for feature refinement.
3. A specialized Liver Cancer Attention Module (LCAM).
The encoder processes an input liver image slice I ∈ R^(H×W×C_in) and extracts hierarchical features {F1, F2, F3, F4}. This 2D formulation significantly reduces the parameter count compared to 3D counterparts like V-Net or Swin-UNETR.
To enhance model flexibility while maintaining computational efficiency, two variants are considered:
● LCMamba-T: Utilizes Mamba-Tiny backbone with no additional SSM blocks (N = 0).
● LCMamba-S: Employs Mamba-Small backbone with one additional SSM block (N = 1).
The computational complexity of our model scales linearly with image size, as shown in Equation 1:
O((1 + N) · H · W · C)
where H, W represent image dimensions, C denotes the maximum channel size, and N indicates the number of additional SSM blocks.
2.2 Encoder
The Mamba-based encoder forms the backbone of our architecture. The encoder processes input through a series of hierarchical stages:
1. Initial Embedding: A stem module transforms the input image into initial feature maps via a convolutional layer, as defined in Equation 2:
F0 = Conv(I)
2. Multi-Stage Processing: The embedded features progressively pass through four stages, each containing SSM blocks, as described in Equation 3:
Fi = Stage_i(F_{i−1}), i = 1, …, 4
where each stage downsamples the spatial resolution while increasing the channel dimension, as shown in Equation 4:
H_i = H_{i−1} / 2,  W_i = W_{i−1} / 2,  C_i = 2 · C_{i−1}
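The stage-wise bookkeeping above (halve the spatial resolution, double the channel width) can be sketched as follows; the starting width c0 and the number of stages are illustrative placeholders, not the paper's exact configuration:

```python
def stage_shapes(h, w, c0=8, n_stages=4):
    """Track (height, width, channels) through a hierarchical encoder where
    each stage halves the spatial resolution and doubles the channel width."""
    shapes = []
    h_i, w_i, c_i = h, w, c0
    for _ in range(n_stages):
        shapes.append((h_i, w_i, c_i))
        h_i, w_i, c_i = h_i // 2, w_i // 2, c_i * 2
    return shapes
```

For a 256 × 256 input embedded by the stem, this pyramid is what lets later stages trade spatial resolution for semantic channel capacity.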
2.3 State space model block
The core innovation in our architecture lies in the application of selective state space models for medical image processing. The SSM block implements a 2D-Selective-Scan module (SS2D) that efficiently models bidirectional dependencies across the image Zhu et al. (26). For an input feature map X ∈ R^(H×W×C), the SS2D operates along four scanning directions: →, ←, ↓, ↑. For each direction, the selective scan process is formulated as (Equation 5):
h_t = Ā · h_{t−1} + B̄ · x_t,  y_t = C · h_t
where h_t represents the hidden state at position t and x_t is the input at position t. The learnable parameters are derived through discretization, as shown in Equation 6:
Ā = exp(Δ · A),  B̄ = (Δ · A)^{−1} (exp(Δ · A) − I) · Δ · B
Here, Δ represents the discretization step size. The parameter A is structured to be selective, as defined in Equation 7:
A = diag(λ1, λ2, …, λN)
with the step size Δ predicted from the input x_t, which makes the recurrence input-dependent.
The outputs from the four scanning paths are combined to form a comprehensive representation, computed via Equation 8:
y = y→ + y← + y↓ + y↑
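The four-direction scan-and-merge idea can be sketched in NumPy as follows. This is a minimal illustration with a scalar hidden state and fixed toy values for Ā, B̄, and C; the actual module learns input-dependent, per-channel parameters:

```python
import numpy as np

def selective_scan_1d(x, a_bar, b_bar, c):
    """Discretized SSM recurrence h_t = a_bar*h_{t-1} + b_bar*x_t, y_t = c*h_t,
    applied along one scan direction (scalar state per position for brevity)."""
    h = 0.0
    ys = []
    for x_t in x:
        h = a_bar * h + b_bar * x_t
        ys.append(c * h)
    return np.array(ys)

def ss2d(feature_map, a_bar=0.9, b_bar=0.1, c=1.0):
    """Illustrative 2D-Selective-Scan: scan each row forward/backward and each
    column downward/upward, then sum the four directional outputs."""
    H, W = feature_map.shape
    out = np.zeros((H, W))
    for i in range(H):  # horizontal scans (→ and ←)
        out[i] += selective_scan_1d(feature_map[i], a_bar, b_bar, c)
        out[i] += selective_scan_1d(feature_map[i][::-1], a_bar, b_bar, c)[::-1]
    for j in range(W):  # vertical scans (↓ and ↑)
        out[:, j] += selective_scan_1d(feature_map[:, j], a_bar, b_bar, c)
        out[:, j] += selective_scan_1d(feature_map[:, j][::-1], a_bar, b_bar, c)[::-1]
    return out
```

Each pixel's output aggregates context from its entire row and column in both directions, which is how the 2D scan recovers a global receptive field at linear cost in the number of pixels.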
2.4 Liver cancer attention module
To directly tackle liver cancer delineation, we present LCAM to exploit multi-scale features to sharpen segmentation boundaries. Given an initial segmentation prior S and an encoder feature map F, the module performs refinement. LCAM first derives an attention map M with M = σ(Conv(S)), where σ denotes the sigmoid function. The attention then modulates encoder features via F′ = M ⊙ F, where ⊙ is elementwise multiplication. The overall operation is Ŝ = Conv(F′) + Up(S), combining refined features with the upsampled prior.
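The gate-and-add structure of such a refinement step can be sketched generically as below. The convolutional and upsampling operators of LCAM are not reproduced; only the pattern of a sigmoid gate derived from the coarse prior, elementwise modulation of encoder features, and residual combination is shown:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lcam_refine(prior, feature):
    """Generic attention refinement: an attention map derived from the coarse
    segmentation prior gates the encoder features; the gated features are then
    combined with the prior (assumed already upsampled to the same size)."""
    attn = sigmoid(prior)      # attention map from the coarse prediction
    refined = attn * feature   # modulate encoder features elementwise
    return refined + prior     # residual combination with the prior
```

Because the gate is driven by the coarse prediction, regions the prior is uncertain about (values near zero) pass roughly half of the feature signal through, letting the decoder revise boundaries there.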
2.5 Loss function and optimization
The loss function is carefully designed to address the inherent class imbalance. We employ a weighted combination of Binary Cross-Entropy (BCE) and Dice losses, as formulated in Equation 9:
L_seg = λ1 · L_BCE + λ2 · L_Dice
To specifically enhance performance at tumor boundaries, we introduce a boundary-aware term, shown in Equation 10:
L_boundary = (1/|B|) Σ_{p∈B} |ŷ_p − y_p|
where B denotes the set of boundary pixels and ŷ_p, y_p are the predicted probability and ground-truth label at pixel p.
The final loss function is the weighted sum of these components, as given in Equation 11:
L = L_seg + λ_b · L_boundary
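A minimal NumPy sketch of the BCE + Dice combination follows; the weights w_bce and w_dice are illustrative placeholders, and the boundary term is omitted for brevity:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy over flattened probability maps."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P∩G| / (|P|+|G|); robust to class imbalance."""
    inter = np.sum(pred * target)
    return float(1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def combined_loss(pred, target, w_bce=0.5, w_dice=0.5):
    """Weighted BCE + Dice combination (weights are illustrative)."""
    return w_bce * bce_loss(pred, target) + w_dice * dice_loss(pred, target)
```

The Dice term supplies a region-overlap gradient that keeps small tumors from being ignored, while the BCE term stabilizes per-pixel optimization.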
3 Experiments and results
3.1 Datasets
1) LiTS Dataset: The Liver Tumor Segmentation (LiTS) dataset comprises 201 abdominal CT scans. To ensure reproducibility and evaluation-protocol transparency, we utilized the official training set (130 scans) and performed a fixed internal split (seed = 42): 100 scans for training, 15 for validation, and 15 for testing. We report metrics on this held-out test set.
2) CirrMRI600+ Dataset: This dataset includes 628 high-resolution abdominal MRI volumes from 339 patients. We followed the dataset’s predefined partitioning scheme.
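A seeded shuffle reproduces a fixed split of this kind. The helper below is a hypothetical sketch (the scan-ID argument and split sizes follow the LiTS protocol described above; seed 42 is the value stated there):

```python
import random

def fixed_split(scan_ids, n_train=100, n_val=15, n_test=15, seed=42):
    """Deterministic train/val/test partition of scan IDs via a seeded shuffle.
    Sorting first makes the result independent of the input ordering."""
    assert len(scan_ids) == n_train + n_val + n_test
    ids = sorted(scan_ids)
    random.Random(seed).shuffle(ids)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```

Fixing the seed and publishing the partition sizes lets other groups reproduce the exact held-out test cases.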
3.2 Implementation and reproducibility
All experiments were implemented in PyTorch 1.13.0 with CUDA 11.7 and cuDNN 8.5 on a single NVIDIA RTX A10 GPU (24GB). To ensure reproducibility, random seeds were fixed to 3407. Encoders were initialized with pre-trained Mamba weights.
During training, images were resized to 256×256. We applied random rotation (± 15°), horizontal flips, and vertical flips. We used the Adam optimizer (initial LR 1 × 10−4) with ReduceLROnPlateau. Batch size was 16.
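The plateau schedule used above behaves as in this minimal re-implementation (for illustration only; PyTorch's `ReduceLROnPlateau` exposes the same `factor`/`patience` knobs along with additional options):

```python
class PlateauSchedule:
    """Minimal plateau schedule: if the validation loss fails to improve for
    more than `patience` consecutive epochs, multiply the LR by `factor`."""
    def __init__(self, lr=1e-4, factor=0.1, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:          # improvement: reset the counter
            self.best, self.bad_epochs = val_loss, 0
        else:                             # stagnation: count and maybe decay
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

Tying the decay to the validation metric rather than a fixed epoch count avoids decaying too early on the heterogeneous lesion distributions of LiTS and CirrMRI600+.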
3.3 Evaluation metrics
We report Dice coefficient (Dice), mean Intersection over Union (mIoU), recall, precision, F2 score, and 95% Hausdorff distance (HD95). Following statistical rigor guidelines, all results are reported as Mean ± Standard Deviation (SD). Furthermore, we report the 95% Confidence Intervals (CI) for the primary Dice metric to quantify estimation uncertainty. To control the family-wise error rate (FWER) during multiple hypothesis testing across different models, we applied the Holm-Bonferroni correction (p < 0.05 considered significant).
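For reference, the Dice coefficient and the Holm-Bonferroni step-down procedure can be computed as follows (a minimal sketch; the HD95 and confidence-interval computations are not reproduced here):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|P∩G| / (|P|+|G|) for binary masks."""
    inter = np.logical_and(pred, target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: sort p-values ascending; the k-th smallest
    (0-based rank) is significant iff all smaller ones were significant and
    p <= alpha / (m - k). Returns a flag per original hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # step-down: once one fails, all larger p-values fail
    return significant
```

The step-down rule is uniformly more powerful than plain Bonferroni while still controlling the family-wise error rate across the model comparisons.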
3.4 Benchmarking
1) Results on CirrMRI600+ Dataset: Extensive experiments show that LCMambaNet achieves superior performance, as detailed in Table 1. Statistical significance was assessed using a paired Wilcoxon signed-rank test. LCMamba-S shows significant improvement over TransResUNet (p < 0.01).
Qualitative analysis through visual comparison, shown in Figure 2, further validates effectiveness on the LiTS dataset. Despite being a 2D method, LCMambaNet approximates the boundary delineation quality of 3D baselines while operating at significantly lower latency.
2) Results on LiTS Dataset: Table 2 presents a comprehensive quantitative analysis on the fixed internal split. Results are reported as Mean ± SD across the test cases. LCMamba-T attained the highest Dice coefficient of 92.94 ± 3.12%.
3.5 Tumor size stratification analysis
To further address the clinical challenge of detecting small lesions, we performed a stratified analysis based on tumor diameter: Small (< 2 cm), Medium (2 − 5 cm), and Large (> 5 cm). As shown in Table 3, LCMamba-S demonstrates exceptional robustness in the “Small” category, outperforming the baseline TransResUNet by 2.4% in Dice, validating the effectiveness of the LCAM module in capturing fine-grained details.
3.6 Computational efficiency analysis
We compared the computational efficiency of our proposed LCMamba variants against state-of-the-art methods in Table 4. Benchmarks were conducted on a single NVIDIA RTX A10 GPU (24GB) using FP32 precision with an input resolution of 256 × 256 and a batch size of 1. “Per-Volume” inference time is estimated based on an average volume depth of 150 slices. LCMamba-S achieves a competitive inference speed of 24 ms/slice, significantly faster than Transformer-based counterparts, making it suitable for clinical deployment.
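The per-volume figure follows from simple arithmetic over the assumed 150-slice depth:

```python
def per_volume_latency(ms_per_slice, n_slices=150):
    """Estimate volume-level inference time (seconds) from per-slice latency
    (milliseconds), assuming sequential slice-wise processing."""
    return ms_per_slice * n_slices / 1000.0
```

At 24 ms/slice, a 150-slice volume is processed in roughly 3.6 s, which is the basis of the per-volume comparison in Table 4.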
3.7 Ablation study
Table 5 presents a comprehensive ablation study verifying the contribution of individual modules. Figure 3 provides qualitative comparisons for these configurations.
Furthermore, we conducted an ablation study on augmentation strategies (Table 6) and a robustness analysis across different dataset splits (Table 7) to ensure the reliability of our proposed method.
4 Conclusion
This paper introduces LCMambaNet, a novel 2D architectural framework that strategically leverages selective state space models to address the computational bottlenecks of high-resolution medical image segmentation. While 3D state-of-the-art models like nnU-Net provide global volumetric context, they inherently demand high computational resources and memory, restricting their deployment in real-time or resource-limited clinical environments. Our approach bridges this critical gap, offering competitive accuracy (Dice 92.94%) comparable to 3D baselines while maintaining the high efficiency characteristic of 2D networks. By incorporating the Mamba backbone, LCMambaNet effectively models long-range dependencies within slices, mitigating the limited receptive field issues typical of standard CNNs.
However, a primary limitation of our current slice-wise approach is the lack of inter-slice consistency, as the model does not explicitly learn the Z-axis spatial continuity found in volumetric data. This may result in minor inconsistencies in boundary predictions across sequential slices. Future work will focus on extending the Mamba block to a pseudo-3D or 2.5D framework to capture inter-slice correlations without incurring the full computational cost of 3D convolutions, further enhancing segmentation robustness for clinical applications.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.
Author contributions
PS: Supervision, Writing – review & editing. JY: Writing – review & editing, Conceptualization, Supervision, Writing – original draft, Project administration, Validation, Methodology. QG: Methodology, Data curation, Formal analysis, Writing – original draft. LZ: Conceptualization, Data curation, Methodology, Writing – original draft, Formal analysis. YS: Data curation, Writing – original draft, Conceptualization, Validation. QW: Visualization, Funding acquisition, Validation, Writing – review & editing. LG: Writing – review & editing, Validation, Supervision, Investigation, Methodology, Conceptualization. JZ: Writing – review & editing, Supervision, Investigation, Methodology, Conceptualization.
Funding
The author(s) declared that financial support was received for this work and/or its publication. Scientific Research Project Funded by Nantong Municipal Health Commission (Grant No. MS2024065). Scientific Research Project Funded by Nantong Municipal Health Commission (Grant No. MS2025050). Nantong Municipal Science and Technology Bureau Social Livelihood Science and Technology Funding Project (Grant No. MSZ21066).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Jiang H, Qu J, Wang L, Gao P, Zheng B, Zhang H, et al. Hepatobiliary phase manifestations of breast cancer liver metastasis: differentiating molecular types through gd-eob-dtpa-enhanced MRI. BMC Med Imaging. (2025) 25:104. doi: 10.1186/S12880-025-01648-7
2. Polat K, Sahan S, Kodaz H, and Günes S. Breast cancer and liver disorders classification using artificial immune recognition system (AIRS) with performance evaluation by fuzzy resource allocation mechanism. Expert Syst Appl. (2007) 32:172–83. doi: 10.1016/J.ESWA.2005.11.024
3. Siegel RL, Kratzer TB, Giaquinto AN, Sung H, and Jemal A. Cancer statistics 2025. CA: A Cancer J Clin. (2025) 75:10–45. doi: 10.3322/caac.21871
4. Jesi PM and Daniel VAA. Differential CNN and KELM integration for accurate liver cancer detection. Biomed Signal Process Control. (2024) 95:106419. doi: 10.1016/J.BSPC.2024.106419
5. Emam MM, Mostafa RR, and Houssein EH. Computer-aided diagnosis system for predicting liver cancer disease using modified genghis khan shark optimizer algorithm. Expert Syst Appl. (2025) 285:128017. doi: 10.1016/J.ESWA.2025.128017
6. Li Y, Zheng X, Li J, Dai Q, Wang C, and Chen M. LKAN: llm-based knowledge-aware attention network for clinical staging of liver cancer. IEEE J Biomed Health Inf. (2025) 29:3007–20. doi: 10.1109/JBHI.2024.3478809
7. Tejaswi VSD and Rachapudi V. Computer-aided diagnosis of liver cancer with improved segnet and deep stacking ensemble model. Comput Biol Chem. (2024) 113:108243. doi: 10.1016/J.COMPBIOLCHEM.2024.108243
8. Xing Z, Ye T, Yang Y, Liu G, and Zhu L. SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation. Cham: Springer (2024).
9. Bilic P, Christ P, Li HB, Vorontsov E, Ben-Cohen A, Kaissis G, et al. The liver tumor segmentation benchmark (LiTS). Med Image Anal. (2023) 84:102680. doi: 10.1016/j.media.2022.102680
10. Archana R and Anand L. Residual u-net with self-attention based deep convolutional adaptive capsule network for liver cancer segmentation and classification. Biomed Signal Process Control. (2025) 105:107665. doi: 10.1016/J.BSPC.2025.107665
11. Wu K, Chen X, and Ding M. Deep learning based classification of focal liver lesions with contrast-enhanced ultrasound. Optik - Int J Light Electron Optics. (2014) 125:4057–63. doi: 10.1016/j.ijleo.2014.01.114
12. Gul S, Khan MS, Bibi A, Khandakar A, Ayari MA, and Chowdhury ME. Deep learning techniques for liver and liver tumor segmentation: A review. Comput Biol Med. (2022) 147:105620. doi: 10.1016/j.compbiomed.2022.105620
13. Zhang H, Guo L, Li J, Wang J, Ying S, and Shi J. Multi-view disentanglement-based bidirectional generalized distillation for diagnosis of liver cancers with ultrasound images. Inf Process Manage. (2024) 61(6). doi: 10.1016/j.ipm.2024.103855
14. Vijayaprabakaran K, Ramalingam P, Ramalingam R, Ilavendhan A, and Vedhapriyavadhana R. Cunet-clstm: A novel fusion of cunet and CLSTM for superior liver cancer detection in CT scans. IEEE Access. (2025) 13:66373–92. doi: 10.1109/ACCESS.2025.3559592
15. Li D, Su M, and Liu Y. Mspdd-net: Mamba semantic perception dual decoding network for retinal image vessel segmentation. Comput Biol Med. (2025) 193:110370. doi: 10.1016/J.COMPBIOMED.2025.110370
16. Zhou T, Chai W, Chang D, Chen K, Zhang Z, and Lu H. Mambayolact: you only look at mamba prediction head for head-neck lymph nodes. Artif Intell Rev. (2025) 58:180. doi: 10.1007/S10462-025-11177-Y
17. Wang Z, Li L, Zeng C, Dong S, and Sun J. Slb-mamba: A vision mamba for closed and open-set student learning behavior detection. Appl Soft Comput. (2025) 180:113369. doi: 10.1016/J.ASOC.2025.113369
18. Liu J, Shang Y, Yang M, Shao Z, Ding H, and Liu T. Cfgmamba: Cross frame group mamba for video-based depression recognition. Biomed Signal Process Control. (2025) 110:108113. doi: 10.1016/J.BSPC.2025.108113
19. Xing Z, Ye T, Yang Y, Cai D, Gai B, Wu XJ, et al. SegMamba-V2: Long-Range Sequential Modeling Mamba for General 3-D Medical Image Segmentation. IEEE Trans Med Imaging. (2025) 45(9). doi: 10.1109/TMI.2025.3589797
20. Liang P, Shi L, Pu B, Wu R, Chen J, Zhou L, et al. MambaSAM: A Visual Mamba-Adapted SAM Framework for Medical Image Segmentation. IEEE J Biomed Health Inform. (2025) 29:5824–35. doi: 10.1109/jbhi.2025.3544548
21. Wang Y, Guo T, Yuan W, Shu S, Meng C, and Bai X. Mamba-based deformable medical image registration with an annotated brain MR-CT dataset. Comput Med Imaging Graph. (2025) 123:102566. doi: 10.1016/J.COMPMEDIMAG.2025.102566
22. Liu J, Yang H, Zhou H, Yu L, Liang Y, Yu Y, et al. Swin-UMamba†: Adapting Mamba-Based Vision Foundation Models for Medical Image Segmentation. IEEE Trans Med Imaging. (2024) 44:3898–908. doi: 10.1109/tmi.2024.3508698
23. Liu J, Yang H, Zhou HY, Xi Y, Yu L, Li C, et al. Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining. Vol. 15009. Cham: Springer (2024). p. 615–25. doi: 10.48550/arxiv.2402.03302
24. Zheng Z, Yan H, Setzer FC, Shi KJ, and Li J. Anatomically Constrained Deep Learning for Automating Dental CBCT Segmentation and Lesion Detection. IEEE Trans Med Imaging. (2020) PP(99):1–12. doi: 10.1109/TASE.2020.3025871
25. Jia G, He P, Dai T, Goh D, Wang J, and Sun M. Spatial immune scoring system predicts hepatocellular carcinoma recurrence. Nature. (2025) 640. doi: 10.1038/s41586-025-08668-x
Keywords: 2D efficient networks, attention mechanism, liver cancer segmentation, medical imaging, state space models
Citation: Sun P, Yu J, Gu Q, Zhang L, Sun Y, Wang Q, Gu L and Zhu J (2026) Clinically oriented automatic 2D liver tumor segmentation: LCMambaNet with a state-space model and liver cancer–specific attention. Front. Oncol. 16:1676424. doi: 10.3389/fonc.2026.1676424
Received: 30 July 2025; Revised: 04 January 2026; Accepted: 06 January 2026;
Published: 03 February 2026.
Edited by:
Kang Wang, Shanghai General Hospital, China
Reviewed by:
Hossam El-Din Moustafa, Mansoura University, Egypt
Abu Salam, Universitas Dian Nuswantoro, Indonesia
Copyright © 2026 Sun, Yu, Gu, Zhang, Sun, Wang, Gu and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jianchun Zhu, zhujianchunxy@163.com; Liugen Gu, guliugen@sina.com
†These authors have contributed equally to this work and share first authorship
Pengcheng Sun1†