ORIGINAL RESEARCH article

Front. Oncol., 27 January 2026

Sec. Cancer Imaging and Image-directed Interventions

Volume 15 - 2025 | https://doi.org/10.3389/fonc.2025.1725514

This article is part of the Research Topic: Next-Generation Preclinical Imaging and Analytical Technologies for Personalized Oncology.

Wavelet-enhanced boundary adaptation network for liver hemangioma segmentation in non-contrast CT

  • 1School of Economics and Management, Chongqing Jiaotong University, Chongqing, Nan’An, China
  • 2School of Mathematics and Statistics, Chongqing Jiaotong University, Chongqing, Nan’An, China
  • 3Department of PET/MR, Shanghai Universal Medical Imaging Diagnostic Center, Shanghai, Xu’Hui, China
  • 4Radiology Department, Fengdu General Hospital, Chongqing, Feng’Du, China

Liver hemangioma segmentation in non-contrast CT images faces significant challenges due to the absence of contrast-enhanced features. This paper introduces WLAU-Net, a novel architecture integrating three key innovations for contrast-agent-free segmentation. First, our transfer learning framework pre-trains the encoder on venous phase CT images to capture discriminative tumor features, then transfers and freezes these learned weights when processing non-contrast phase data, effectively mitigating domain shift. Second, we implement a wavelet transformation module using sym4 wavelet decomposition to split images into four frequency subbands (LL, LH, HL, HH). By selectively amplifying horizontal (HL) and vertical (LH) edge coefficients during reconstruction, we enhance tumor boundary delineation while preserving anatomical context. Third, a local attention mechanism with Gaussian-based adaptive weighting dynamically prioritizes low-intensity tumor regions over high-intensity areas, sharpening focus on subtle boundaries. Experimental results demonstrate WLAU-Net’s superiority with a 65.37% Dice score and 96.23% ACC, outperforming state-of-the-art methods including CS-UNet (64.50% Dice, 93.85% ACC) and Swin-UNet (62.34% Dice, 91.15% ACC). Ablation studies reveal critical contributions from each component: enabling all modules (transfer learning, Gaussian attention, and wavelet enhancement) achieves optimal performance, while removing the wavelet module reduces Dice by 1.16 percentage points (to 64.21%) and disabling both Gaussian and wavelet modules decreases ACC by 3.0 percentage points (to 93.24%). Compared to contrast-enhanced methods (92.1% ACC), our approach maintains competitive diagnostic accuracy (96.23% ACC) while eliminating allergic risks, offering a clinically viable alternative for contrast-agent-sensitive patients.

1 Introduction

Liver hemangioma segmentation in computed tomography (CT) plays a crucial role in diagnosis, surgical planning, and treatment monitoring. While contrast-enhanced CT (CECT) provides clear tumor boundaries through the injection of contrast agents, these agents pose significant risks for patients with allergies or renal insufficiency (1). Non-contrast CT, despite its safety, presents substantial segmentation challenges due to low soft-tissue contrast and ambiguous tumor boundaries (2).

In the field of medical image segmentation, the U-Net model, with its unique symmetric encoder-decoder architecture and skip connections to enhance detail preservation (3), has become the preferred choice for many researchers. Building upon this framework, significant advancements have been achieved in various medical tumor segmentation tasks, including brain tumor segmentation in magnetic resonance (MR) imaging, liver hemangioma segmentation using CT scans, and pancreatic tumor segmentation (4, 5).

While convolutional neural networks (CNNs) excel in feature representation, their limited ability to capture tumor edge features prompted the integration of attention mechanisms into U-Net, leading to the development of Attention UNet (6). Despite U-Net’s strong segmentation capabilities, the spatial limitations of convolutional operations restrict their global feature modeling. Recognizing this, researchers have increasingly turned to Transformers, which rely entirely on attention mechanisms and inherently excel at capturing global context (7). However, as Transformers focus more on global context modeling, hybrid approaches combining CNNs with Transformer encoders show greater potential. TransUNet, introduced in 2021, became one of the first models to apply Transformer technology to medical image analysis (8). This method leverages the U-Net encoder’s strength in capturing high-resolution spatial details while harnessing the Transformer’s ability to model global context, inspiring extensive follow-up research (9). Nevertheless, when applying TransUNet to non-contrast images, the lack of contrast enhancement results in indistinct tumor features and blurred edges, leading to discontinuous segmentation. Additionally, global attention mechanisms still struggle to precisely localize tumor regions (10, 11).

To address these challenges, we propose the Wavelet and Local Attention UNet (WLAU-Net), a framework extending TransUNet for segmenting tumors in non-contrast CT images. First, the original non-contrast and venous phase data are input into a Wavelet-based Edge Enhancement Module (WEEM). This module decomposes the image into multi-frequency sub-bands (LL, LH, HL, HH), amplifies high-frequency components (HL and LH) via a frequency band residual amplification strategy, and reconstructs the image through inverse wavelet transform (12–14). By evaluating different wavelet functions and amplification coefficients, the optimal configuration is selected to enhance tumor edge features (15). Next, venous phase data are input with frozen CNN encoder parameters, while non-contrast data are used for training. We then introduce a Gaussian-based Position-sensitive Attention (GPSA) module, which dynamically weights CNN features using Gaussian functions (16). The GPSA-generated dynamically weighted feature maps are tokenized into patches and processed by a Transformer encoder. This enables seamless fusion of global self-attention features with high-resolution CNN features via skip connections for precise localization. Finally, the Transformer decoder reformulates pixel-wise segmentation into a mask classification task, treating predicted candidate regions as learnable queries (17). These queries interact with local multi-scale CNN features through a collaborative cross-attention mechanism, progressively refining the final results (overall framework in Figure 1).

Figure 1
Flowchart illustrating a machine learning model for enhanced liver image analysis. It processes venous and non-contrast phase images, sharing weights, then extracting hidden features using a GPSA module and transformer layers. The model includes down-sampling, convolution, and up-sampling, ultimately delivering a segmentation output. Key processes are color-coded, with purple indicating hidden features and various diagrammatic symbols such as arrows for data flow and circles for operations like matrix multiplication.

Figure 1. The overview of WLAU-Net framework.

In summary, the key contributions of this work are:

1. WLAU-Net, a novel hybrid architecture based on the TransUNet framework. Under a supervised cross-domain protocol, the model integrates wavelet transforms and Gaussian-based dynamic weighting to achieve precise segmentation of non-contrast liver hemangiomas in CT.

2. GPSA, a Gaussian-based dynamic weighting module addressing feature discrepancies between non-contrast and venous phase data in transfer learning. It effectively mitigates challenges in focusing on tumor regions in non-contrast images.

3. WEEM, an image reconstruction module enhancing tumor boundary features while preserving anatomical integrity. By decomposing images into frequency sub-bands and amplifying specific components via inverse wavelet transform, it significantly improves tumor edge recognition.

4. The proposed model enables accurate tumor segmentation in CT images without contrast agents, reducing allergy risks associated with their use.

The remainder of this paper is organized as follows: Section 2 describes the dataset, preprocessing methods, and reviews the TransUNet model while detailing the principles of WEEM and GPSA. Section 3 presents experimental setups, evaluates the proposed method using Dice Similarity Coefficient (Dice) and Hausdorff Distance (HD), and analyzes segmentation accuracy across tumors of varying sizes. Section 4 concludes the paper and outlines future research directions.

2 Materials and methods

2.1 Data preparation

In the segmentation of liver hemangiomas, the imaging differences between venous phase and non-contrast phase scans are crucial for accurate detection and analysis (18). Given the characteristic variations in hemangioma presentation across different imaging phases, precise differentiation and segmentation of these vascular lesions hold significant importance for subsequent diagnosis and treatment. Our research therefore focuses on feature transfer and automated segmentation tasks between venous phase and non-contrast phase imaging to enhance clinical diagnostic accuracy and efficiency. This study utilizes the private FDLT liver hemangioma dataset (approved by the Institutional Review Board, IRB No. 2024SC1015-1), comprising non-contrast and venous phase (contrast-enhanced) CT scans from 654 patients. All tumor regions were independently annotated by nine radiologists using ITK-SNAP software (version 4.2.0).

The abdominal CT scans were acquired using a SOMATOM Definition AS+ CT scanner (SIEMENS, Germany) at FengDu General Hospital. The imaging parameters were configured as follows: tube voltage = 120 kV; tube current = 52 mA; exposure time = 500 ms; slice thickness = 1.5 mm; reconstruction matrix size = 512 × 512; pixel spacing = 0.75 × 0.75 mm². The scan protocol included a venous phase enhancement (Series Description: Venous Phase 1.5 I30f) with a spiral pitch factor of 0.6 and a total collimation width of 38.4 mm.

Raw CT images were reconstructed using a convolution kernel of I30f, with a field of view (FOV) diameter of 384 mm. Spatial consistency was ensured through alignment with the patient coordinate system (Image Orientation: [1, 0, 0, 0, 1, 0]). To minimize radiation exposure, the volumetric CT dose index (CTDIvol) was optimized to 2.88 mGy, achieving a dose saving of 56.53%.

All CT images were normalized to Hounsfield Units (HU) and clipped to the range [-150, 250] HU to focus on liver tissue. Intensity values were then scaled to [0, 1]. Images were resampled to a uniform voxel spacing of 0.75 × 0.75 × 1.5 mm³ using linear interpolation. During training, data augmentation included random rotation (± 15°), scaling (0.9–1.1), and horizontal flipping (p=0.5).
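As a concrete illustration, the HU-windowing and rescaling step can be sketched as follows (a minimal NumPy sketch; the function name is ours, not taken from the released code):

```python
import numpy as np

def preprocess_ct(volume_hu):
    """Clip a CT volume (in Hounsfield Units) to the liver window and scale to [0, 1].

    Mirrors the preprocessing described in the text: clip to [-150, 250] HU,
    then min-max scale the clipped range to [0, 1].
    """
    clipped = np.clip(volume_hu, -150.0, 250.0)          # liver window [-150, 250] HU
    scaled = (clipped - (-150.0)) / (250.0 - (-150.0))   # scale to [0, 1]
    return scaled
```

Resampling to the 0.75 × 0.75 × 1.5 mm³ grid and the random augmentations would follow this step in the training pipeline.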

Patients were included if they had at least one radiologically confirmed liver hemangioma visible in both non-contrast and venous phase scans. Exclusion criteria included: (1) previous hepatic surgery or intervention, (2) severe motion artifacts in CT images, and (3) concurrent malignant liver tumors.

The patient cohort consisted of 342 males (52.3%) and 312 females (47.7%), with an overall average age of 52.3 ± 12.7 years (range: 18–86 years). A total of 812 hemangiomas were manually annotated across all patients. The tumor size distribution, measured by the maximum diameter on the venous phase scans, exhibited considerable variation: diameters ranged from 3.2 mm to 87.5 mm, with a mean diameter of 24.6 ± 18.3 mm. Based on the maximum diameter, tumors were categorized into three groups: Tiny (< 10 mm, n=198, 24.4%), Small ([10, 20) mm, n=245, 30.2%), and Big (≥ 20 mm, n=369, 45.4%). Anatomically, the tumors were distributed across the liver as follows: left lobe (38.2%), right lobe (56.7%), and caudate lobe (5.1%).

The dataset was randomly split at the patient level into training (524 patients, 80%), validation (65 patients, 10%), and test sets (65 patients, 10%). This ensured no data leakage between sets.

All scans were performed with the patient in a head-first supine position (HFS), and reconstructed images were archived in compliance with anonymization protocols, retaining only non-identifiable demographic information. Due to ethical restrictions and patient data privacy concerns, the original datasets used in this study are not publicly available. However, in the interest of research transparency and reproducibility, the complete source code and models have been made publicly accessible at: https://github.com/Jinchengwu318/WLAU-net.

2.2 Related works

2.2.1 Tumor feature migration and frequency-domain analysis

The domain shift between contrast-enhanced and non-contrast CT imaging poses significant challenges for cross-phase lesion analysis. Due to the absence of contrast agents, non-contrast images exhibit blurred tumor boundaries and attenuated texture features, leading to performance degradation when models trained on venous phase data are directly applied to non-contrast images (19). To address this, feature migration methods aim to transfer discriminative features between imaging phases while preserving critical tumor characteristics (20). Early approaches like CycleGAN attempted cross-phase synthesis but often failed to preserve anatomical structures and subtle lesion details (21, 22). More recently, frequency-domain adaptation has emerged as a promising direction. LUCIDA introduced Fourier-domain processing to align low-dose and full-dose CT data, demonstrating the potential of frequency manipulation for domain adaptation (23). However, Fourier transforms (FFT) provide global frequency representations that may not adequately capture multi-scale local structures in medical images (24). In contrast, discrete wavelet transform (DWT) offers localized time-frequency decomposition, simultaneously capturing global anatomical contours and local details such as tumor edge microtextures (15, 25, 26). By adjusting wavelet coefficients in specific sub-bands (LL, LH, HL, HH), DWT enables targeted enhancement of relevant features while maintaining robustness (27, 28). Building upon these insights, we incorporate DWT-based edge enhancement to address the domain gap in liver hemangioma segmentation.

2.2.2 Transformer-based medical image segmentation

Transformers, originally developed for natural language processing (7), have revolutionized computer vision through their powerful self-attention mechanisms. In medical image analysis, Vision Transformer (ViT) demonstrated that global self-attention applied directly to image patches could achieve state-of-the-art performance (29). For segmentation tasks, TransUNet pioneered the integration of Transformers with UNet architectures, establishing a hybrid framework that combines CNN’s local feature extraction with Transformer’s global context modeling (8). Subsequent developments, including Swin-UNet (30), further advanced transformer-based segmentation through hierarchical feature processing. While these methods leverage global attention mechanisms, they typically lack domain-specific adaptations for cross-phase medical imaging. Our work extends this line of research by incorporating Gaussian-based positional attention guided by anatomical priors, specifically designed to address the challenges of non-contrast liver hemangioma segmentation.

2.3 Methods

Given a source-domain dataset FDLT (Xs, Vs), where Xs and Vs denote the non-contrast phase images and their corresponding annotated venous phase images, respectively, each image x ∈ Xs is represented as an element of ℝ^(H×W×C), where H × W is the spatial resolution and C is the number of channels. Our objective is to accurately segment the non-contrast phase data Xs in the target domain (FDLT dataset) without contrast agent interference, while preserving the original spatial resolution H × W. A straightforward approach involves training a CNN to encode images into feature representations and subsequently decode them back to full resolution (3).

Unlike conventional CNN-based methods, our framework enhances the TransUNet architecture by integrating three key components: (1) wavelet transform for tumor boundary enhancement, (2) cross-phase transfer learning to bridge domain discrepancies, and (3) Gaussian-based reweighting of tumor feature map attention. The detailed implementations of these components are systematically elaborated in Sections 2.3.1, 2.3.2, and 2.3.3, respectively.

Figure 1 illustrates the overall architecture of the proposed WLAU-Net. The model takes paired non-contrast phase (Xs) and venous phase (Vs) CT images of size 512 × 512 as input. Both phases are first processed independently by the WEEM to accentuate the boundary features of liver hemangiomas. The enhanced images are then fed into a 4-layer CNN encoder for feature extraction, each layer employing a 3 × 3 kernel with a stride of 2. The non-contrast branch produces a feature map Iij, which is directed to the GPSA. The GPSA learns a spatial attention weight matrix αij that guides the model’s focus toward regions likely to contain liver hemangiomas. Simultaneously, the venous phase branch, after passing through the same 4-layer CNN encoder, yields a 32 × 32 feature map. This feature map undergoes a linear projection into a structure compatible with the Transformer encoder. The projected features are then modulated by the attention weights αij before entering the Transformer. The Transformer encoder comprises n = 12 layers, each using multi-head self-attention with 8 heads. The Transformer output, denoted z, is reshaped back into a 32 × 32 feature map, which is progressively upsampled to the original resolution through a decoder pathway. Each upsampling stage uses bilinear interpolation with a 2 × 2 kernel, doubling the spatial dimensions per stage. The upsampled features are concatenated with the corresponding feature maps from the non-contrast CNN encoder via skip connections, and each combined feature set is processed by a convolution block (kernel size 3 × 3, stride 1) activated by ReLU. This decode-and-merge process is repeated four times. Finally, a segmentation head processes the refined high-resolution features to produce the final pixel-wise segmentation mask for liver hemangiomas in the non-contrast CT image.
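The encoder’s downsampling arithmetic can be checked with the standard convolution output-size formula. Assuming a padding of 1 (not stated in the text), four stride-2, 3 × 3 layers map a 512 × 512 input to the 32 × 32 feature map fed to the Transformer:

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    # standard formula: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 512
for _ in range(4):      # four CNN layers, each 3x3 with stride 2
    size = conv_out(size)
print(size)             # 32: matches the 32 x 32 map described in the text
```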

2.3.1 Wavelet-based edge enhancement module

As shown in Figure 2, we introduce a sym4 wavelet decomposition module that processes both non-contrast phase (Xs) and venous phase (Vs) CT images from the FDLT dataset through a 2D sym4 wavelet transform, generating four distinct subbands (31). By strategically amplifying the amplitude coefficients in the HL and LH subbands prior to reconstruction, we enhance tumor boundaries without distorting anatomical structures.

Figure 2
CT scan images are compared using a Discrete Wavelet Transform (DWT) and Inverse Discrete Wavelet Transform (IDWT). The left shows raw data transformed by DWT into a 3D graph with colored spikes. The right displays enhanced data, processed by IDWT, visualized similarly. Blue arrows illustrate the transformation process.

Figure 2. Performing wavelet transform, the amplitude of the HL/LH bands is doubled to enhance the tumor boundary features.

The image reconstructed based on sym4 wavelet is defined as in Equation 1:

$$\left\{\begin{aligned}
&\{W_{LL}(x_s),\ W_{LH}(x_s),\ W_{HL}(x_s),\ W_{HH}(x_s)\} = \mathrm{DWT}(x_s)\\
&\{W_{LL}^{*},\ W_{LH}^{*},\ W_{HL}^{*},\ W_{HH}^{*}\} = U_{dwt}(W_{LL}, W_{LH}, W_{HL}, W_{HH} \mid \lambda)\\
&x_s^{*} = \mathrm{IDWT}(W_{LL}^{*},\ W_{LH}^{*},\ W_{HL}^{*},\ W_{HH}^{*})
\end{aligned}\right. \tag{1}$$

where xs denotes an image sample from the source domain Xs. The signal is decomposed via the Discrete Wavelet Transform (DWT) into four frequency subbands: WLL (low-frequency approximation), WLH (horizontal details), WHL (vertical details), and WHH (diagonal details). These subbands are optimized via the update function Udwt(·|λ), where λ is a hyperparameter balancing feature enhancement and noise suppression, and Udwt denotes a domain-adaptive parameterized operator. The refined subbands WLL*, WLH*, WHL*, WHH* are reconstructed into the enhanced signal xs* through the Inverse Discrete Wavelet Transform (IDWT); xs* preserves structural coherence while amplifying boundary features, ensuring compatibility with downstream tasks such as segmentation or classification.
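To make Equation 1 concrete, the sketch below implements a one-level 2D DWT, amplifies the LH/HL detail bands, and reconstructs via the inverse transform. For brevity it uses the Haar wavelet rather than sym4; in practice the paper’s sym4 decomposition is available through standard wavelet libraries (e.g. PyWavelets’ `pywt.dwt2`/`pywt.idwt2`):

```python
import numpy as np

def dwt2_haar(img):
    # one-level 2D Haar DWT: split an even-sized image into LL, LH, HL, HH subbands
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    LL = (a + b + c + d) / 2
    LH = (a - b + c - d) / 2   # horizontal detail
    HL = (a + b - c - d) / 2   # vertical detail
    HH = (a - b - c + d) / 2   # diagonal detail
    return LL, LH, HL, HH

def idwt2_haar(LL, LH, HL, HH):
    # exact inverse of dwt2_haar
    a = (LL + LH + HL + HH) / 2
    b = (LL - LH + HL - HH) / 2
    c = (LL + LH - HL - HH) / 2
    d = (LL - LH - HL + HH) / 2
    out = np.empty((LL.shape[0] * 2, LL.shape[1] * 2))
    out[0::2, 0::2] = a; out[0::2, 1::2] = b
    out[1::2, 0::2] = c; out[1::2, 1::2] = d
    return out

def enhance_edges(img, lam=2.0):
    # Eq. (1): amplify LH/HL coefficients by lambda, leave LL/HH unchanged
    LL, LH, HL, HH = dwt2_haar(img)
    return idwt2_haar(LL, lam * LH, lam * HL, HH)
```

With lam = 1.0 the reconstruction is exact; the paper’s optimum, lam = 2.0, doubles the edge subbands as in Figure 2.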

2.3.2 Cross-phase transfer learning to bridge domain discrepancies

To effectively bridge the domain gap between contrast-enhanced and non-contrast CT images, we implement a structured “pre-train, freeze, and transfer” strategy. This approach begins by training the model encoder on venous phase data, where liver hemangiomas exhibit pronounced contrast enhancement and well-defined boundaries, thereby providing a rich and discriminative feature space for learning fundamental tumor characteristics. The choice of the portal venous phase as the source domain is motivated by its status as the clinical gold standard for liver lesion characterization, ensuring that the model acquires robust prior knowledge of hemangioma morphology and texture.

The encoder is optimized on the venous phase data by minimizing the Dice loss function LDice (Equations 2, 3):

$$L_{Dice} = 1 - \frac{2\,|Y_s \cap \hat{Y}|}{|Y_s| + |\hat{Y}|} \tag{2}$$
$$\theta^{*} = \arg\min_{\theta}\left(1 - \frac{2\sum_{v} Y_s(v)\,\hat{Y}(x^{*};\theta)_v}{\sum_{v} Y_s(v) + \sum_{v} \hat{Y}(x^{*};\theta)_v}\right) \tag{3}$$

where Ys denotes the ground truth mask for the venous phase, Y^ is the predicted segmentation, and θ* represents the optimized encoder parameters. These parameters are subsequently frozen (θ=θ*) and directly transferred to initialize the model for the non-contrast phase segmentation task.
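A minimal soft-Dice implementation corresponding to Equation 2 (the `eps` term is our addition for numerical stability, not stated in the text):

```python
import numpy as np

def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss, Eq. (2): 1 - 2|Y ∩ Y_hat| / (|Y| + |Y_hat|)."""
    inter = np.sum(y_true * y_pred)                        # soft intersection
    return 1.0 - 2.0 * inter / (np.sum(y_true) + np.sum(y_pred) + eps)
```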

The decision to freeze the encoder parameters, rather than employing full fine-tuning, serves as a strategic regularization mechanism to prevent overfitting to the limited and low-contrast non-contrast data. By preserving the encoder’s learned representations, we ensure that the model retains the robust, domain-invariant features acquired from the source domain, thereby stabilizing the learning process in the target domain. This approach effectively decouples feature learning from domain adaptation: the frozen encoder provides a consistent, high-level representation of hepatic anatomy and tumor morphology, while the subsequent learnable modules—specifically the Gaussian-based Position-Sensitive Attention (GPSA) and the transformer decoder—are tasked with adapting to the nuances of non-contrast imaging. These modules learn to attend to and reconstruct the subtler tumor boundaries and textures within the stable feature space provided by the encoder, thereby directly mitigating the domain shift caused by the absence of contrast agent. This parameter-sharing strategy not only enhances computational efficiency by eliminating the need to retrain the encoder from scratch but also ensures consistent feature representation across imaging domains, ultimately improving the model’s ability to segment hemangiomas in non-contrast CT scans (32, 33).
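The pre-train/freeze/transfer split can be sketched framework-agnostically; in PyTorch the same effect is obtained by setting `requires_grad = False` on the encoder parameters. All names below are illustrative, not from the released code:

```python
def freeze_encoder(model_params, encoder_prefix="encoder."):
    """Partition parameters into frozen (encoder) and trainable (everything else)
    before target-domain training, mirroring the freeze-and-transfer protocol."""
    frozen, trainable = {}, {}
    for name, value in model_params.items():
        (frozen if name.startswith(encoder_prefix) else trainable)[name] = value
    return frozen, trainable

# toy parameter dictionary standing in for a venous-phase pre-trained model
params = {"encoder.conv1": 0.1, "encoder.conv2": 0.2, "decoder.up1": 0.3}
frozen, trainable = freeze_encoder(params)
```

Only the `trainable` set (here, the decoder plus the GPSA and Transformer modules in the full model) would be passed to the optimizer for the non-contrast phase.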

2.3.3 Gaussian-based position-sensitive attention with ground truth guidance

The input image x* is processed by the CNN to extract hierarchical features, producing feature map Iij. Leveraging the available tumor annotation masks Ys, a Gaussian weighting mechanism is applied to guide the model’s focus toward the tumor region with anatomical precision (34).

The ground truth-guided Gaussian weighting is defined as (Equations 4–6):

$$\mu_{Y_s} = \frac{1}{|Y_s|}\sum_{(i,j)\in Y_s} I_{ij} \tag{4}$$
$$\sigma_{Y_s} = \sqrt{\frac{1}{|Y_s|}\sum_{(i,j)\in Y_s}\left(I_{ij} - \mu_{Y_s}\right)^2 + \epsilon} \tag{5}$$
$$G_{ij} = \exp\left(-\frac{(I_{ij} - \mu_{Y_s})^2}{2\sigma_{Y_s}^{2} + \epsilon}\right), \quad \epsilon = 10^{-5} \tag{6}$$

where Ys represents the ground truth tumor mask, |Ys| denotes the number of pixels within the tumor region, μYs is the mean intensity of tumor pixels, and σYs is the standard deviation characterizing the intensity distribution within the annotated tumor area.

For enhanced spatial sensitivity, a multi-scale Gaussian formulation is employed (Equations 7, 8):

$$G_{ij}^{(k)} = \exp\left(-\frac{(I_{ij} - \mu_{Y_s})^2}{2(k\cdot\sigma_{Y_s})^{2} + \epsilon}\right), \quad k \in \{0.5,\ 1.0,\ 2.0\} \tag{7}$$
$$G_{ij} = \frac{1}{3}\sum_{k} G_{ij}^{(k)} \tag{8}$$
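Equations 4–8 can be sketched directly in NumPy (a minimal illustration; `feat` stands in for the CNN feature map Iij and `mask` for the ground truth Ys):

```python
import numpy as np

def gaussian_weights(feat, mask, scales=(0.5, 1.0, 2.0), eps=1e-5):
    """Multi-scale Gaussian weighting of Eqs. (4)-(8).

    feat: 2D feature/intensity map; mask: boolean ground-truth tumor mask.
    """
    vals = feat[mask]
    mu = vals.mean()                                   # Eq. (4): mean over tumor pixels
    sigma = np.sqrt(((vals - mu) ** 2).mean() + eps)   # Eq. (5): std with eps
    g = np.zeros_like(feat, dtype=float)
    for k in scales:                                   # Eq. (7): per-scale Gaussian
        g += np.exp(-(feat - mu) ** 2 / (2 * (k * sigma) ** 2 + eps))
    return g / len(scales)                             # Eq. (8): average over scales
```

Pixels whose intensity matches the tumor statistics receive weights near 1, while dissimilar background pixels are suppressed toward 0.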

The Gaussian weights are integrated into the transformer architecture through a bias matrix mechanism (Equations 9–12):

$$g = \mathrm{Flatten}(G) \in \mathbb{R}^{N\times 1} \tag{9}$$
$$B_{gaussian} = g \cdot g^{T} \in \mathbb{R}^{N\times N} \tag{10}$$
$$A = \frac{QK^{T}}{\sqrt{d_k}} + \lambda B_{gaussian} \tag{11}$$
$$\mathrm{Output} = \mathrm{softmax}(A)\,V \tag{12}$$

where Q, K, V are query, key, and value matrices derived from the feature map, dk is the key dimension, and λ is a learnable parameter initialized at 0.5.
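Equations 9–12 amount to standard scaled dot-product attention with an additive rank-1 bias; a self-contained sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))    # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_biased_attention(Q, K, V, G, lam=0.5):
    """Scaled dot-product attention with the Gaussian bias of Eqs. (9)-(12).

    Q, K, V: N x d matrices; G: Gaussian weight map (any shape with N elements);
    lam: the learnable scale, initialized at 0.5 as in the text.
    """
    g = G.reshape(-1, 1)                     # Eq. (9): flatten to N x 1
    B = g @ g.T                              # Eq. (10): rank-1 bias, N x N
    d_k = K.shape[-1]
    A = (Q @ K.T) / np.sqrt(d_k) + lam * B   # Eq. (11): biased attention logits
    return softmax(A) @ V                    # Eq. (12): attention output
```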

To ensure the attention mechanism aligns with clinical annotations, a dedicated attention supervision loss is introduced (Equations 13, 14):

$$L_{attention} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{BCE}\left(\mathrm{softmax}(A_i),\ Y_{s,i}\right) \tag{13}$$
$$L_{total} = L_{Dice} + \alpha L_{attention} \tag{14}$$

where BCE denotes binary cross-entropy loss, and α controls the relative weight of attention supervision (empirically set to 0.3).
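The combined objective of Equations 13–14 can be sketched as follows (a toy illustration; `attn_rows` stands in for the softmaxed attention rows and `masks` for the per-row ground truth):

```python
import numpy as np

def bce(p, y, eps=1e-7):
    # binary cross-entropy between predicted attention p and target mask y
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

def total_loss(dice_loss_value, attn_rows, masks, alpha=0.3):
    """Eqs. (13)-(14): Dice term plus alpha-weighted attention supervision."""
    l_att = np.mean([bce(a, y) for a, y in zip(attn_rows, masks)])
    return dice_loss_value + alpha * l_att
```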

During inference, when ground truth annotations are unavailable, the model utilizes population-level statistics learned during training (Equations 15, 16):

$$\mu_{learned} = \frac{1}{|D_{train}|}\sum_{n=1}^{|D_{train}|} \mu_{Y_s}^{(n)} \tag{15}$$
$$\sigma_{learned} = \frac{1}{|D_{train}|}\sum_{n=1}^{|D_{train}|} \sigma_{Y_s}^{(n)} \tag{16}$$

This approach ensures that the Gaussian attention mechanism benefits from precise anatomical guidance during training while maintaining practical applicability during clinical deployment.

This ground truth-guided Gaussian attention mechanism provides strong anatomical priors that significantly enhance the model’s ability to focus on liver hemangiomas, particularly in non-contrast CT where lesion conspicuity is reduced (7).

2.3.4 Hyperparameter tuning

To ensure optimal model performance, we conducted systematic hyperparameter tuning for WLAU-Net. The optimization process involved both manual exploration based on empirical evidence and automated search techniques, with performance evaluated on the validation set (10% of patients). The key hyperparameters and their configurations are summarized below.

2.3.4.1 Training hyperparameters

We utilized the AdamW optimizer with an initial learning rate of 3 × 10−4, which was decayed using a cosine annealing schedule over 200 epochs. The learning rate search space spanned from 10−5 to 10−3. The batch size was set to 8, selected after testing values of 4, 8, and 16, with 8 providing the best trade-off between memory usage and gradient stability. Weight decay was applied with a coefficient of 10−2 to mitigate overfitting. For the transformer component, we used 12 layers with 8 attention heads, a configuration that consistently outperformed alternatives in preliminary experiments.
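The cosine-annealing schedule described above follows the standard closed form; a sketch with the paper’s values (base learning rate 3 × 10⁻⁴ over 200 epochs; the `min_lr` floor of 0 is our assumption):

```python
import math

def cosine_lr(epoch, total_epochs=200, base_lr=3e-4, min_lr=0.0):
    """Cosine annealing: decays from base_lr at epoch 0 to min_lr at total_epochs."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` paired with the AdamW optimizer.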

2.3.4.2 Wavelet enhancement parameters

For the Wavelet-based Edge Enhancement Module (WEEM), we selected the Symlet-4 (sym4) wavelet based on its superior performance in boundary preservation, as evidenced by comparative analysis across multiple wavelet families (see Table 1). The enhancement coefficients for the horizontal λ1 and vertical λ2 detail subbands were optimized through grid search within the range [1.3, 2.2] with a step size of 0.1. The optimal values were determined to be λ1 = 2.0 and λ2 = 2.0, which provided the best trade-off between edge enhancement and noise amplification, as validated by the highest Dice score on the validation set. The low-frequency LL and diagonal HH subbands were left unmodified to preserve anatomical integrity and suppress high-frequency noise, respectively.
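The λ1/λ2 grid search over [1.3, 2.2] with step 0.1 can be sketched as below; `validate` is a placeholder for a callback that trains/evaluates on the validation set and returns a Dice score:

```python
import numpy as np

# grid over [1.3, 2.2] with step 0.1, as described in the text (10 values per axis)
lam_grid = np.round(np.arange(1.3, 2.2 + 1e-9, 0.1), 1)

def grid_search(validate):
    """Return the (lam1, lam2) pair maximizing the validation-Dice callback."""
    return max(((l1, l2) for l1 in lam_grid for l2 in lam_grid),
               key=lambda pair: validate(*pair))
```

With the paper’s validation results, this search selects λ1 = λ2 = 2.0.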

Table 1

Table 1. The best performance of different wavelet functions in liver hemangioma segmentation on the FDLT dataset.

2.3.4.3 GPSA module parameters

In the Gaussian-based Position-Sensitive Attention (GPSA) module, the scaling factor λ for the Gaussian bias matrix was initialized at 0.5 and made learnable during training. The attention supervision weight α in the total loss function was set to 0.3 after experimenting with values from 0.1 to 1.0. Multi-scale Gaussian kernels with standard deviation factors k ∈ {0.5, 1.0, 2.0} were empirically chosen to capture both local and global contextual information.

2.3.4.4 Optimization process

The hyperparameter tuning process followed a staged approach: (1) initial coarse search for learning rate and batch size, (2) systematic evaluation of wavelet functions and enhancement coefficients, and (3) fine-tuning of attention-related parameters. The final configuration was determined based on the model’s performance on the validation set, measured primarily by the Dice Similarity Coefficient. This comprehensive tuning strategy ensured that WLAU-Net achieved robust and reproducible segmentation performance across varying tumor sizes and imaging characteristics.

3 Results

Table 2 shows the results of liver hemangioma segmentation experiments on the FDLT dataset, demonstrating that WLAU-Net leads on several key metrics. Our method outperforms all comparison models with a Dice coefficient of 65.37%, an IoU of 58.53%, and an accuracy (ACC) of 96.23%. The Dice score improves by 0.87 percentage points over the next best model, CS-UNet (64.50%), while the IoU and ACC increase by 5.72 and 2.38 percentage points over Swin-UNet (52.81%) and CS-UNet (93.85%), respectively. Although the Hausdorff Distance (HD) of 23.55 mm is slightly higher than that of CS-UNet (22.37 mm), it still outperforms mainstream architectures such as UNet (30.83 mm) and TransUNet (27.62 mm). These results indicate that our method effectively balances global anatomical integrity and local detail accuracy in liver hemangioma segmentation.

Table 2

Table 2. Experimental results of the FDLT dataset. The Dice coefficient (Dice), Intersection over Union (IoU), Accuracy (Acc), and Hausdorff Distance (HD) for liver hemangioma segmentation across compared methods are presented.

Table 3 shows the impact of different module combinations in WLAU-Net on liver hemangioma segmentation performance. With the full configuration (T-T-T), the model achieves the best performance across all metrics (Dice of 65.37, HD of 23.55, IoU of 58.53, and ACC of 96.23). When the wavelet transform, Gaussian module, or transfer learning module is removed, performance decreases to varying extents. For example, with the T-T-F configuration, the Dice drops to 64.21 and the HD increases to 24.31; with the T-F-T configuration, the HD rises to 25.83 and the ACC drops to 94.88; with the F-T-T configuration, the Dice is 62.47 and the ACC drops to 93.57. In more extreme combinations, such as T-F-F, F-T-F, F-F-T, and F-F-F, all metrics show significant degradation. Among these, the HD is highest at 28.93 in the F-F-T configuration, while the IoU is lowest at 50.68 and the ACC lowest at 90.27 in the F-F-F configuration. These results validate the effectiveness of each module in improving the overall performance of the model.

Table 3

Table 3. Ablation study on module components for liver hemangioma segmentation (FDLT dataset). Performance is compared across different combinations of transfer learning, GPSA, and WEEM modules. Evaluation metrics include Dice, HD, IoU, and ACC.

Figure 3 shows the attention visualization brought by the GPSA module in the ablation experiment. (a) is the attention distribution of the model without the Gaussian module, and (b) is the attention distribution after adding the GPSA module.

Figure 3
Figure 3 consists of four panels displaying CT scan data. Panel 1 shows the raw CT image. Panel 2 highlights a specific region of interest. Panels 3 and 4 are overlaid with color heatmaps: (a) indicates data intensity, while (b) represents the ground truth of tumor location annotated by physicians, where the color gradient from red to blue visually encodes the model’s attention level to different regions, with red indicating higher attention. A color scale on the right defines the intensity range from blue to red.

Figure 3. Attention visualization brought by the GPSA module: (a) shows the effect of disabling the Gaussian module, and (b) shows the effect of enabling the Gaussian module.

Figure 4 shows the segmentation results of three cases, comparing our method with the baseline method TransUNet and other methods. Our method performs excellently in cases 1 and 2, segmenting medium and large liver hemangiomas. However, in case 3, all methods show partial segmentation errors for multiple small liver hemangiomas.

Table 4 shows the significant performance differences among the methods in detecting liver hemangiomas of different sizes. For tiny (<10 mm) hemangiomas, CS-UNet (54.57%) and our model (52.15%) perform best, significantly outperforming the other methods (36). For small ([10, 20) mm) hemangiomas, our method (61.45%) achieves the best performance, followed by CS-UNet (61.30%) and TransUNet (60.37%) (8, 36). For big (≥20 mm) hemangiomas, our method (69.22%) again stands out, with Swin-UNet (67.21%) and CS-UNet (67.08%) following closely (30, 36). Overall, our method achieves the best detection accuracy across tumor sizes, particularly for small and big hemangiomas, demonstrating strong precision and robustness; it is therefore well suited to clinical detection of liver hemangiomas and can provide more accurate diagnostic results.


Table 4. Tumor segmentation performance (average Dice) under different tumor sizes reported in the FDLT dataset.

Table 1 shows that different wavelet functions perform differently in liver hemangioma segmentation. Among the Daubechies wavelets (db4, db6, db8, etc.), db4 performs best with a Dice of 58.76. Among the Symlets, sym4 performs best with a Dice of 65.37. Among the Biorthogonal wavelets (bior2.2, bior3.3, etc.), bior3.3 stands out with a Dice of 61.80, and among the Coiflets, coif5 performs best with a Dice of 63.75. Overall, the Symlets and Biorthogonal families achieve higher segmentation accuracy.

Figure 4 shows the Dice scores produced by different inverse-transform subband coefficients for each wavelet function; the red-bordered bar marks the best combination of wavelet function and subband coefficients.

Figure 4 is a three-dimensional bar chart of Dice (%) over wavelet types and HL:LH coefficient ratios, with a color gradient encoding the Dice value; the highlighted bar marks the best configuration, sym4 with λ1 = 1.7 and λ2 = 2.1 (Dice of 65.37%).

Figure 4. Dice values generated by different wavelet functions and different subband coefficients.
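The reconstruction-time boundary enhancement can be sketched in a few lines. As a self-contained stand-in for sym4 (which requires a wavelet library such as PyWavelets), the sketch below uses the Haar wavelet; only the λ1 = 1.7 and λ2 = 2.1 defaults come from Figure 4, and everything else is an illustrative simplification.

```python
import math

S = 1 / math.sqrt(2)  # orthonormal Haar scaling factor

def dwt_rows(mat):
    """One-level Haar analysis along each row: (low-pass, high-pass) halves."""
    lo = [[(r[2*i] + r[2*i+1]) * S for i in range(len(r) // 2)] for r in mat]
    hi = [[(r[2*i] - r[2*i+1]) * S for i in range(len(r) // 2)] for r in mat]
    return lo, hi

def idwt_rows(lo, hi):
    """Exact inverse of dwt_rows."""
    return [[v for l, h in zip(lr, hr) for v in ((l + h) * S, (l - h) * S)]
            for lr, hr in zip(lo, hi)]

def transpose(m):
    return [list(c) for c in zip(*m)]

def dwt2(img):
    """Split an even-sized 2-D image into LL, LH, HL, HH subbands
    (rows first, then columns; subband naming conventions vary)."""
    lo, hi = dwt_rows(img)
    LL, LH = (transpose(b) for b in dwt_rows(transpose(lo)))
    HL, HH = (transpose(b) for b in dwt_rows(transpose(hi)))
    return LL, LH, HL, HH

def idwt2(LL, LH, HL, HH):
    lo = transpose(idwt_rows(transpose(LL), transpose(LH)))
    hi = transpose(idwt_rows(transpose(HL), transpose(HH)))
    return idwt_rows(lo, hi)

def edge_enhance(img, lam1=1.7, lam2=2.1):
    """Scale the HL and LH detail subbands by lam1 and lam2 before the
    inverse transform; LL and HH pass through unchanged, so anatomical
    context and the overall intensity sum are preserved."""
    LL, LH, HL, HH = dwt2(img)
    HL = [[lam1 * v for v in row] for row in HL]
    LH = [[lam2 * v for v in row] for row in LH]
    return idwt2(LL, LH, HL, HH)
```

With PyWavelets installed, the same amplification would instead be applied to the detail subbands returned by `pywt.dwt2(img, 'sym4')` before calling `pywt.idwt2`.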

Table 5 provides a detailed architectural comparison of the evaluated segmentation methods for liver hemangioma segmentation. As illustrated, WLAU-Net is the most comprehensive framework, uniquely integrating all of the key architectural components compared: CNNs, Transformer-based self-attention, attention mechanisms, cross-phase transfer learning, and wavelet-based frequency-domain processing. In contrast, existing methods incorporate only a subset of these design elements. For instance, UNet and DeepLabV3+ rely solely on CNN-based architectures, while Swin-UNet employs a pure Transformer design. Attention-UNet and CS-UNet augment CNN frameworks with attention mechanisms, and TransUNet combines CNN and Transformer components. Only WLAU-Net incorporates the specialized modules (cross-phase transfer learning, WEEM, and GPSA) designed specifically to address the challenges of liver hemangioma segmentation in non-contrast CT: domain shift between imaging phases and the enhancement of subtle tumor boundaries. This holistic architectural design enables WLAU-Net to achieve superior performance, as demonstrated by its leading segmentation metrics in Table 2.


Table 5. Architectural comparison of liver hemangioma segmentation methods.

4 Discussion

The WLAU-Net framework proposed in this study demonstrates significant advancements in non-contrast CT liver hemangioma segmentation. By innovatively integrating wavelet transform and attention mechanisms, it successfully addresses the inherent challenges of insufficient contrast in non-enhanced imaging. Experimental results reveal that the model outperforms existing mainstream methods with a Dice score of 65.37% and accuracy of 96.23%, validating the synergistic effectiveness of frequency-adaptive enhancement and dynamic attention mechanisms. Notably, the wavelet edge enhancement module significantly improves tumor boundary recognition through selective amplification of high-frequency components, while the Gaussian-weighted local attention mechanism effectively suppresses interference signals, enabling precise capture of subtle tumor features. Compared with conventional contrast-enhanced approaches, this solution achieves superior accuracy while completely avoiding contrast agent risks, offering a safer clinical alternative.
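The idea behind the Gaussian-weighted prioritization of low-intensity regions can be illustrated with a minimal sketch: a weight map that peaks near a low target intensity and decays for brighter pixels. The target mean and width below (`mu`, `sigma`) are hypothetical illustration values, not parameters from the paper.

```python
import math

def gaussian_intensity_weights(img, mu=0.25, sigma=0.15):
    """Illustrative Gaussian weighting over normalized intensities in [0, 1]:
    pixels near the low target intensity mu receive weight close to 1,
    while high-intensity pixels are strongly down-weighted.
    mu and sigma are assumed values for illustration only."""
    return [[math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) for v in row]
            for row in img]

# A dim, tumor-like pixel (0.25) is weighted far above a bright one (0.90).
weights = gaussian_intensity_weights([[0.25, 0.90]])
```

In a full attention module such weights would modulate the attention logits, steering the model toward the subtle low-intensity tumor regions described above.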

Segmentation Error Analysis and Implications for Real-time Application. To assess the model’s clinical applicability, we conducted a detailed analysis of segmentation errors, particularly focusing on patterns that may impact real-time deployment. As illustrated in Figure 5 (Case 3), the model exhibits a tendency to under-segment clustered small hemangiomas. This error pattern likely stems from the limited receptive field of local attention mechanisms when confronted with multiple small lesions in close proximity, where the boundaries between adjacent tumors become ambiguous in non-contrast imaging. Furthermore, boundary uncertainty persists in low-contrast tumors where intensity gradients are subtle, occasionally leading to either over- or under-estimation of lesion extent. These error patterns have direct implications for real-time clinical application: first, they highlight the need for robust post-processing algorithms to refine boundary delineation and handle lesion multiplicity; second, they underscore the trade-off between processing speed and segmentation accuracy. Real-time intraoperative applications may require further optimization to meet stricter latency constraints. Future iterations could explore adaptive computation pathways that allocate more resources to challenging cases (e.g., clustered small tumors) while streamlining processing for straightforward ones, thereby balancing accuracy with speed.

Figure 5 is a comparison grid of liver tumor segmentation results on CT scans, with columns for GT (ground truth), U-Net, DeepLabV3+, Attention-UNet, TransUNet, Swin-UNet, CS-UNet, and our model, and rows for the three test cases.

Figure 5. Visualization of segmentation results of different methods in the three test cases.

Despite these advances, several limitations remain. First, the single-center data source may constrain model generalizability, necessitating future multi-center validation. Second, computational efficiency needs optimization, particularly the increased inference time from combining the wavelet transform with TransUNet. Additionally, the optimal frequency combinations may require anatomical site-specific adjustment, warranting further investigation. To translate WLAU-Net from a promising prototype into a clinically robust tool, our future work will prioritize two directions. First, to address the single-center limitation and ensure generalizability, we plan to conduct large-scale, multi-center external validation. Collaborating with diverse medical institutions will allow us to test the model on data from varied CT scanners, imaging protocols, and patient demographics, which is essential for establishing its reliability in real-world clinical settings. Second, to tackle the computational overhead introduced by the wavelet and Transformer modules, we will focus on model lightweighting and deployment optimization. Techniques such as neural architecture search, knowledge distillation, and efficient hybrid attention blocks will be explored to reduce parameter count and inference latency without compromising accuracy. This optimization is crucial for practical integration into hospital picture archiving and communication systems (PACS) and for potential edge-device deployment. By advancing in these directions, WLAU-Net can evolve into a practical, efficient, and widely applicable solution for contrast-free liver hemangioma assessment, directly addressing the needs of contrast-agent-sensitive patients. Overall, this research opens new pathways for contrast-free medical image analysis, with technical approaches extendable to other segmentation tasks such as brain tumor delineation.

5 Conclusions

This study proposes WLAU-Net, a deep learning framework specifically designed for non-contrast CT liver hemangioma segmentation. By organically combining three key technologies (wavelet transform, transfer learning, and attention mechanisms), it effectively resolves the core challenges of feature scarcity and boundary ambiguity in non-enhanced medical image analysis. Experimental evidence confirms that the model not only enhances segmentation accuracy but, more importantly, eliminates contrast-induced allergic risks, providing a safe diagnostic solution for vulnerable patient populations.

The theoretical contribution of this work lies in establishing a frequency-adaptive analysis methodology for medical images, while its practical significance is embodied in the development of an intelligent diagnostic tool with clear clinical relevance. To bridge the gap between technical achievement and widespread clinical adoption, immediate future efforts will be dedicated to rigorous multi-center validation to confirm the model's robustness across diverse populations and hardware, and to systematic model lightweighting to enhance its efficiency for real-time clinical use. Furthermore, we will explore the extensibility of the core methodology of wavelet-based enhancement and guided attention to other challenging contrast-free segmentation tasks, such as the delineation of brain tumors or renal lesions. These steps are crucial for advancing the field of contrast-agent-free medical imaging and for ultimately providing safer, more accessible diagnostic options for all patients.

Data availability statement

Access to the original dataset is restricted to protect patient privacy, in compliance with the ethical approval (Fengdu General Hospital IRB No. 2024SC1015-1). De-identified data are available from the corresponding authors upon reasonable request and with permission from the institutional ethics committee. Requests to access the datasets should be directed to XS, kanyisheng@163.com.

Author contributions

BZ: Methodology, Writing – original draft, Writing – review & editing. LZ: Writing – review & editing. LP: Methodology, Writing – review & editing. WC: Writing – review & editing. XF: Writing – review & editing. XS: Writing – review & editing. XG: Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Acknowledgments

We would like to express our gratitude to Professor Zhang, Professor Cao, and Dr. Sun for their guidance in the preparation of this paper.

Conflict of interest

The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Heller N, Sathianathen N, Kalapara A, Walczak E, Moore K, Kaluzniak H, et al. The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations, and surgical outcomes. arXiv preprint arXiv:1904.00445 (2019).

2. Zhang Q, Xu Y, Zhang J, and Tao D. Leci: Learnable evolutionary category intermediates for unsupervised domain adaptive segmentation. Artif Intell Sci Eng. (2025) 1:37–51.

3. Ronneberger O, Fischer P, and Brox T. "U-net: Convolutional networks for biomedical image segmentation". In: International Conference on Medical Image Computing and Computer-Assisted Intervention. (Switzerland: Springer) (2015). p. 234–41.

4. Li W, Wang Z, Hu S, Chen C, and Liu M. Functional and Structural Brain Network Construction, Representation and Application. (Switzerland: Frontiers Media SA) (2023).

5. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. (2017) 42:60–88.

6. Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018).

7. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. (2017) 30.

8. Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, et al. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021).

9. Li W, Zhang L, Qiao L, and Shen D. Toward a better estimation of functional brain network for mild cognitive impairment identification: a transfer learning view. IEEE J Biomed Health Inf. (2019) 24:1160–8.

10. Wang X, Girshick R, Gupta A, and He K. "Non-local neural networks". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (Los Alamitos, CA, USA: IEEE) (2018). p. 7794–803.

11. Ruan Y, Ma H, Ma D, Li W, and Wang X. Low-light image enhancement using dual cross attention. Eng Appl Artif Intell. (2025) 159:111501.

12. Li W, Qiao L, Zhang L, Wang Z, and Shen D. Functional brain network estimation with time series self-scrubbing. IEEE J Biomed Health Inf. (2019) 23:2494–504.

13. Li W-K, Chen Y-C, Xu X-W, Wang X, and Gao X. Human-guided functional connectivity network estimation for chronic tinnitus identification: a modularity view. IEEE J Biomed Health Inf. (2022) 26:4849–58.

14. Li W, Tang Y, Peng L, Wang Z, Hu S, and Gao X. The reconfiguration pattern of individual brain metabolic connectome for Parkinson's disease identification. MedComm. (2023) 4:e305.

15. Brown IJ. A wavelet tour of signal processing: the sparse way. Investigacion Operacional. (2009) 30:85–7.

16. Liu R, Wu Z, Yu S, and Lin S. The emergence of objectness: Learning zero-shot segmentation from videos. Adv Neural Inf Process Syst. (2021) 34:13137–52.

17. Cheng B, Misra I, Schwing AG, Kirillov A, and Girdhar R. Masked-attention mask transformer for universal image segmentation. Proc IEEE/CVF Conf Comput Vision Pattern Recognition. (2022), 1290–9.

18. Li W, Wei H, Wu Y, Yang J, Ruan Y, Li Y, et al. Tide: Test-time few-shot object detection. IEEE Trans Systems Man Cybernetics: Syst. (2024) 54:6500–9.

19. Yan K, Wang X, Lu L, and Summers RM. DeepLesion: Automated deep mining, categorization and detection of significant radiology image findings using large-scale clinical lesion annotations. arXiv preprint arXiv:1710.01766 (2017).

20. Li W, Xu X, Wang Z, Peng L, Wang P, and Gao X. Multiple connection pattern combination from single-mode data for mild cognitive impairment identification. Front Cell Dev Biol. (2021) 9:782727.

21. Zhu J-Y, Park T, Isola P, and Efros AA. "Unpaired image-to-image translation using cycle-consistent adversarial networks". In: Proceedings of the IEEE International Conference on Computer Vision. (Los Alamitos, CA, USA: IEEE) (2017). p. 2223–32.

22. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. Commun ACM. (2020) 63:139–44.

23. Chen Y, Meng X, Wang Y, Zeng S, Liu X, and Xie Z. "Lucida: Low-dose universal-tissue CT image domain adaptation for medical segmentation". In: International Conference on Medical Image Computing and Computer-Assisted Intervention. (Switzerland: Springer) (2024). p. 393–402.

24. Hatamizadeh A, Tang Y, Nath V, Yang D, Myronenko A, Landman B, et al. UNETR: Transformers for 3D medical image segmentation. Proc IEEE/CVF Winter Conf Appl Comput Vision. (2022), 574–84.

25. Shi J, Li Z, Ying S, Wang C, Liu Q, Zhang Q, et al. MR image super-resolution via wide residual networks with fixed skip connection. IEEE J Biomed Health Inf. (2018) 23:1129–40.

26. Zhang Y, Liu H, and Hu Q. "TransFuse: Fusing Transformers and CNNs for medical image segmentation". In: International Conference on Medical Image Computing and Computer-Assisted Intervention. (Switzerland: Springer) (2021). p. 14–24.

27. Ding Y, Zhang T, Cao W, Zhang L, and Xu X. A multi-frequency approach of the altered functional connectome for autism spectrum disorder identification. Cereb Cortex. (2024) 34:bhae341.

28. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, et al. Swin Transformer: Hierarchical vision transformer using shifted windows. Proc IEEE/CVF Int Conf Comput Vision. (2021), 10012–22.

29. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).

30. Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, et al. Swin-Unet: Unet-like pure Transformer for medical image segmentation. Eur Conf Comput Vision. (2022), 205–18.

31. Daubechies I. Ten Lectures on Wavelets. (Philadelphia, PA, USA: Society for Industrial and Applied Mathematics) (1992).

32. Raghu M, Zhang C, Kleinberg J, and Bengio S. Transfusion: Understanding transfer learning for medical imaging. Adv Neural Inf Process Syst. (2019) 32.

33. Tajbakhsh N, Jeyaseelan L, Li Q, Chiang JN, Wu Z, and Ding X. Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Med Image Anal. (2020) 63:101693.

34. Roy AG, Navab N, and Wachinger C. "Concurrent spatial and channel 'squeeze & excitation' in fully convolutional networks". In: International Conference on Medical Image Computing and Computer-Assisted Intervention. (Switzerland: Springer) (2018). p. 421–9.

35. Peng H, Xue C, Shao Y, Chen K, Xiong J, Xie Z, et al. Semantic segmentation of litchi branches using DeepLabV3+ model. IEEE Access. (2020) 8:164546–55.

36. Alrfou K, Zhao T, and Kordijazi A. CS-UNet: a generalizable and flexible segmentation algorithm. Multimedia Tools Appl. (2025) 84:7807–34.

Keywords: CT images, liver hemangioma segmentation, local attention mechanism, patient safety, transfer learning, wavelet transformation

Citation: Zeng B, Zhang L, Peng L, Cao W, Fan X, Sun X and Gao X (2026) Wavelet-enhanced boundary adaptation network for liver hemangioma segmentation in non-contrast CT. Front. Oncol. 15:1725514. doi: 10.3389/fonc.2025.1725514

Received: 15 October 2025; Revised: 25 December 2025; Accepted: 26 December 2025;
Published: 27 January 2026.

Edited by:

Arif Engin Cetin, Dokuz Eylul University, Türkiye

Reviewed by:

Qinjun Qin, Changchun University of Science and Technology (CUST), China
Bindu Madhavi Tummala, Velagapudi Ramakrishna Siddhartha Engineering College, India

Copyright © 2026 Zeng, Zhang, Peng, Cao, Fan, Sun and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xinfeng Sun, kanyisheng@163.com; Xin Gao, gaoxin@uvclinic.cn

These authors have contributed equally to this work
