MDWC-Net: a multi-scale dynamic-weighting context network for precise spinal X-ray segmentation

Gu, Zhongzheng; Wang, Xuan; Chen, Baojun

doi:10.3389/fphys.2025.1651296

ORIGINAL RESEARCH article

Front. Physiol., 29 August 2025

Sec. Computational Physiology and Medicine

Volume 16 - 2025 | https://doi.org/10.3389/fphys.2025.1651296

MDWC-Net: a multi-scale dynamic-weighting context network for precise spinal X-ray segmentation

Zhongzheng Gu¹

Xuan Wang²

Baojun Chen¹*

¹Department of Spine and Spinal Cord Surgery, Henan Provincial People’s Hospital, Zhengzhou, Henan, China
²Department of Medical Imaging, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China

Purpose: Spinal X-ray image segmentation faces several challenges, such as complex anatomical structures, large variations in scale, and blurry or low-contrast boundaries between vertebrae and surrounding tissues. These factors make it difficult for traditional models to achieve accurate and robust segmentation. To address these issues, this study proposes MDWC-Net, a novel deep learning framework designed to improve the accuracy and efficiency of spinal structure identification in clinical settings.

Methods: MDWC-Net adopts an encoder–decoder architecture and introduces three modules—MSCAW, DFCB, and BIEB—to address key challenges in spinal X-ray image segmentation. The network is trained and evaluated on the Spine Dataset, which contains 280 X-ray images provided by Henan Provincial People’s Hospital and is randomly divided into training, validation, and test sets with a 7:1:2 ratio. In addition, to evaluate the model’s generalizability, further validation was conducted on the Chest X-ray dataset for lung field segmentation and the ISIC2016 dataset for melanoma boundary delineation.

Results: MDWC-Net outperformed other mainstream models overall. On the Spine Dataset, it achieved a Dice score of 89.86% ± 0.356, MIoU of 90.53% ± 0.315, GPA of 96.82% ± 0.289, and Sensitivity of 96.77% ± 0.212. A series of ablation experiments further confirmed the effectiveness of the MSCAW, DFCB, and BIEB modules.

Conclusion: MDWC-Net delivers accurate and efficient segmentation of spinal structures, showing strong potential for integration into clinical workflows. Its high performance and generalizability suggest broad applicability to other medical image segmentation tasks.

1 Introduction

The spine, as the structural support for the body and its organs, can develop deformities, cause back pain, or even lead to paralysis when affected by disease (Khalifeh et al., 2024). Accurate spinal segmentation plays a crucial role in the diagnosis and treatment of spinal disorders. It not only enables clinicians to more precisely locate and identify spinal structures, but also provides the foundation for measuring key spinal parameters, spinal registration, and scoliosis classification (Liebmann et al., 2024; Azampour et al., 2024; Sarwahi et al., 2021; Thibodeau-Antonacci et al., 2025; Kim et al., 2022). Such technology is essential for evaluating disease severity, monitoring progression, and planning surgical interventions. Among the various imaging modalities, X-ray technology has become a commonly used clinical tool for spinal disease diagnosis due to its advantages of low radiation exposure, rapid imaging, and cost-effectiveness (Zhang et al., 2020). Consequently, developing automatic segmentation algorithms tailored for spinal X-ray images holds substantial clinical value. In recent years, deep learning—an emerging branch of artificial intelligence—has achieved remarkable progress in image classification, semantic segmentation, and object detection, by learning high-level representations from data (Duan et al., 2025; J. Chen et al., 2025; Zhou et al., 2023; Gui et al., 2024). These advancements offer innovative solutions for accurate segmentation in spinal medical imaging. However, current deep learning-based methods for spinal X-ray image segmentation still face several limitations. First, spinal structures often exhibit complex multi-scale characteristics, and existing methods struggle to capture features at different scales effectively, resulting in suboptimal segmentation performance and loss of fine details. Additionally, many networks suffer from inadequate feature fusion mechanisms, which leads to redundancy and information loss. To address the above challenges, we propose a novel deep learning framework, the Multi-Scale Dynamic-Weighting Context Network (MDWC-Net), for spinal X-ray image segmentation. The network is designed to enhance the extraction of anatomical details across multiple scales, improve the fusion between low- and high-level features, and strengthen the modeling of global contextual information. The main contributions of this work are as follows:

1. A specialized segmentation framework tailored for spinal X-ray images is developed, aiming to provide a reliable and efficient tool to support automatic diagnosis and quantitative analysis in clinical settings.

2. Effective modules are designed to improve multi-scale representation, contextual awareness, and feature interaction within the encoder–decoder architecture.

3. Extensive experiments on spinal and cross-modality datasets demonstrate the superior performance, efficiency, and generalization ability of our proposed method.

2 Materials and methods

2.1 Application of deep learning techniques in medical image segmentation

In recent years, deep learning techniques have made significant progress in various fields. Deep learning automatically extracts features from data through multi-layer neural networks, eliminating the complex process of traditional feature engineering (Talaei Khoei et al., 2023). Medical image segmentation, as one of the key tasks in medical image processing, aims to separate the region of interest from the background in images, helping clinicians with disease diagnosis and treatment. Fully Convolutional Networks (FCNs) are among the earliest deep learning models to achieve significant progress in medical image segmentation tasks (Wang et al., 2022). By replacing the fully connected layers in traditional convolutional neural networks with convolutional layers, FCNs are capable of performing pixel-level classification on input images of any size. FCN-8s is a variant of FCN that fuses feature information from different layers to enhance segmentation accuracy. The DeepLab series is another classic segmentation network model, including DeepLabV1, DeepLabV2, and DeepLabV3 (Yang et al., 2024; Jeong et al., 2024; L. C. Chen et al., 2018). DeepLabV1 extends the receptive field of convolutions through dilated convolutions, effectively improving the segmentation capability for medical images with complex backgrounds or unclear edges. DeepLabV2 builds upon this by incorporating Conditional Random Fields (CRFs) for post-processing, refining the segmentation results. DeepLabV3 further improves the dilated convolution and combines it with an encoder-decoder architecture, enabling it to handle more complex medical image segmentation tasks.

Compared to the classic models mentioned above, U-Net, which is currently the most widely used model in the field of medical image segmentation, was first introduced in 2015 (Falk et al., 2019). It adopts an encoder-decoder architecture and combines low-level features with high-level features through skip connections, preserving the spatial information of the image. Due to its superior segmentation performance, U-Net has also been introduced into industrial fields such as defect detection (Xia et al., 2023; Tulbure et al., 2022) and remote sensing image segmentation (Bai et al., 2023; J. Li et al., 2022). With the deepening of research, more and more scholars have proposed various improvements to address the shortcomings of U-Net, resulting in numerous variant models (Das and Das, 2024; Zhou et al., 2024; Tang et al., 2024; Jisna et al., 2024; Chen et al., 2024; Li et al., 2023) for different segmentation tasks. These network models are widely applied to segmentation tasks in medical images such as those of the heart, liver, blood vessels, and cells (Carneiro et al., 2012; Khan et al., 2022; Gegundez-Arias et al., 2021; Greenwald et al., 2022; Le, 2023). Zhao et al. (2021) introduced a multi-scale up-sampling attention block to enhance feature representation and adopted a nested skip-connection pyramid architecture for feature extraction, applying it to the retinal vessel segmentation task. Li et al. (2023) integrated an attention context encoding module and dual segmentation branches, improving liver segmentation accuracy while keeping the parameter count reasonable. Zhu et al. (2023) used the Swin Transformer framework to extract semantic features and introduced a shift-block labeling strategy during training to achieve more precise brain tumor segmentation. Zhao et al. (2022) focused on two main aspects—sequence encoding and variational information bottlenecks—and proposed an improved model based on different deep learning architectures for peptide toxicity prediction. Although transformer-based models like Swin Transformer have achieved success in brain tumor and peptide segmentation, their application to spinal X-ray segmentation remains limited due to the modality’s lower contrast and structural complexity.

2.2 Application of deep learning techniques in spinal image segmentation tasks

The segmentation of spinal images aims to assist doctors in better understanding the patient’s condition. H. Li et al. (2021) improved model accuracy by embedding a dual-branch multi-scale attention module. This method achieves the segmentation of vertebrae, laminae, and the dural sac from lumbar MRI images, thereby providing assistance in diagnosing lumbar spinal stenosis. Shi et al. (2022) designed a novel dual-path network based on an attention gate (AGNet). This model consists of a context path and an edge path, aiming to extract semantic and boundary information from the spinal and vertebral regions. A multi-scale supervision mechanism is employed to explore comprehensive features, and an edge-aware fusion mechanism is used to combine the features extracted from both paths, enhancing segmentation performance. Chen et al. (2024) combined U-Net and Mask R-CNN to achieve automatic segmentation and labeling of vertebrae in lateral cervical and lumbar X-ray images, with accuracy improved through rule-based strategies. Deng et al. (2024) proposed a complementary network that integrates the advantages of U-Net and BiseNet for spinal segmentation in MRI images. The network uses strip pooling (SP) blocks to replace the spatial extraction path in the BiseNet framework and employs an attention refinement module to fuse the extracted features, thereby improving segmentation accuracy. While multi-scale and attention-based methods have shown success in MRI and CT segmentation, their direct application to spinal X-rays is limited due to lower soft-tissue contrast, overlapping anatomical structures, and higher noise. Although a few studies have begun exploring such strategies in X-ray contexts, their effectiveness remains constrained. To address the challenges in spinal X-ray image segmentation, we propose MDWC-Net, which integrates task-specific feature weighting and structure-aware fusion strategies. The model enhances both local detail capture and global contextual understanding.

2.3 Overall network architecture

The task of spinal image segmentation often faces numerous challenges, including the complexity of spinal structures, noise interference in images, and the inability to effectively fuse features of different scales. Due to the diverse presentation of the spine in X-ray images and the presence of similar backgrounds, the segmentation process struggles to accurately capture details and boundaries. Furthermore, the lack of effective utilization of features at different scales can lead to a decrease in segmentation accuracy.

To address these issues, Multi-Scale Dynamic-Weighting Context Network (MDWC-Net) is proposed. As shown in Figure 1, MDWC-Net mainly consists of four parts: the encoder structure, decoder structure, skip connections, and bottleneck structure. MDWC-Net utilizes a multi-scale convolutional adaptive weighting block to perform feature extraction and target reconstruction. By jointly learning different channels of multi-scale feature maps, it dynamically adjusts the importance of different regions. The developed dual feature complementarity block enables effective fusion of high-level semantic information from the encoder structure and low-level spatial information from the decoder structure, enhancing the network’s ability to capture spinal detail information. Furthermore, a bottleneck information enhancement block is designed at the bottleneck layer of the network, allowing the network to more fully capture and utilize global contextual information, thereby strengthening the representation of key information.

Figure 1

Flowchart of a neural network architecture for medical image processing. It starts with an X-ray image and progresses through layers labeled by numbers, with processes like 3x3 convolution, max pooling, and dual feature complementary blocks. The output is a segmented structure. A legend explains symbols and processes: arrows indicate operations such as upsampling and multi-scale convolution.

Figure 1. The overall network structure of MDWC-Net.

2.4 Multi-scale convolution adaptive weighting block

Figure 2 shows the structure of the multi-scale convolutional adaptive weighting block. Different scales of convolutional kernels are selected to extract multi-scale information from the input image, addressing the issue of insufficient multi-scale detail extraction capability in spinal image segmentation tasks. At the same time, an adaptive weighting block is constructed to dynamically adjust channel weights based on the feature information of different input images. Through the construction of multi-scale depthwise separable convolutions and the adaptive weighting block, the model is enabled to efficiently and thoroughly extract multi-scale features from the image while dynamically adjusting the weights of different features based on the input. This enables more accurate segmentation of the regions of interest in the spine.

Figure 2

Flowchart illustrating a neural network module. It begins with an input feature map, branching into four convolutional layers with different kernel sizes: 1x1, 3x3, 5x5, and 7x7. Each layer applies ReLU activation, followed by adaptive weighting. Outputs are concatenated into a final output feature map. The input size is $C \times H \times W$, and the parameters are calculated as $C^2 + 91C + 3C/4$.

Figure 2. The structure of the multi scale convolution adaptive weighting block.

Among them, the multi-scale convolution operation is the core method for achieving multi-scale feature extraction. To capture information at different spatial scales, we designed multiple convolution kernels of different sizes (1 × 1, 3 × 3, 5 × 5, 7 × 7) and performed the computations using depthwise separable convolutions. Unlike the fully connected convolutions in traditional convolutional neural networks, depthwise separable convolutions break down the convolution operation into two steps: first, performing a channel-wise convolution on each individual channel, and then performing a pointwise convolution across channels. This strategy not only effectively reduces computational complexity but also enables the simultaneous capture of small-scale features at the fine detail level and large-scale features at the global level. It provides a rich feature foundation for subsequent feature fusion and dynamic weighting.

To further optimize the utilization of features, this study designs an adaptive weighting block based on the extraction of multi-scale features. The block contains learnable weight parameters, which are optimized during the training process according to the specific feature requirements of the image. By dynamically adjusting the weights of features at different scales, MDWC-Net can more flexibly adapt to different input features, thus achieving higher accuracy in the spine region of interest segmentation task. The weighting process is shown in Figure 3, where the feature extraction with convolution kernels of sizes 1 × 1, 3 × 3, 5 × 5, and 7 × 7 corresponds to the dynamic weighted feature vectors M, N, R, and Q, respectively. Using a parallel approach, M, N, R, and Q are merged to form the multi-scale convolution adaptive weighting block.

Figure 3

Matrix multiplication illustration. Left side shows a matrix with elements M11, M12, etc. Right side depicts multiplication with weights W1, W2, etc., resulting in a new matrix with elements like W1M11. An arrow indicates transformation. Text below states:

Figure 3. The adaptive weighted feature map.

For the adaptive weighting layer shown in Figure 3, the weight $W_{m}$ in the parameter update is calculated as follows during the gradient computation (Equation 1):

\nabla_{W n} L (W_{n}) = \frac{\partial L}{\partial W_{n}} = \frac{\partial L}{\partial Y_{n}} \frac{\partial Y_{n}}{\partial W_{n}} = \sum_{j} X_{n j} \frac{\partial L}{\partial Y_{n}} (1)

Where L represents the loss function, and $Y_{n}$ denotes the output result after feature weighting of the n-th row.

The weight update can be expressed as (Equation 2):

W_{n}^{l + 1} = W_{n}^{l} - η \nabla W_{n}^{l} L (W_{n}^{l}) (2)

Where l represents the number of parameter updates, $η$ represents the learning rate, and $W_{n}^{l}$ denotes the parameter values at the $W_{n}$ iteration.

The entire module computation process is as follows (Equations 3–6):

F_{1} = R e l u [k_{1} (X)]; F_{2} = R e l u [k_{3} (X)] (3)

F_{3} = R e l u [k_{5} (X)]; F_{4} = R e l u [k_{7} (X)] (4)

{\tilde{F}}_{1} = δ (F_{1}); {\tilde{F}}_{2} = δ (F_{2}); {\tilde{F}}_{3} = δ (F_{3}); {\tilde{F}}_{4} = δ (F_{4}) (5)

F_{o u t} = C o n c a t ({\tilde{F}}_{1}, {\tilde{F}}_{2}, {\tilde{F}}_{3}, {\tilde{F}}_{4}) (6)

Where $k_{i} (\cdot)$ represents the convolution operation with a filter size of $i \times i$ , $δ$ represents the adaptive weighting layer, and $C o n c a t (\cdot)$ represents the concatenation along the channel dimension.

2.5 Dual feature complementary block

The traditional U-Net uses skip connections that directly concatenate the feature maps from the encoder and decoder parts to recover lost spatial details during the decoding process. However, this approach often leads to insufficient information fusion, especially when it comes to recovering fine details. To address this issue, an innovative dual feature complementary block has been designed, which optimizes the traditional skip connection method through a series of processing steps. This enables the network to more effectively complement and fuse the feature maps from the encoder and decoder. The structure of the dual feature complementary block is shown in Figure 4.

Figure 4

Diagram of a neural network architecture. It shows two separate paths. The top path begins with input $F_n$ of size $C \times H \times W$, going through a $1 \times 1$ convolution layer, batch normalization, and ReLU activation, leading to an output. The bottom path begins with input $F_i$ of size $2C \times H \times W$, undergoing a $1 \times 1$ convolution and sigmoid activation, followed by a multiplication operation. The paths merge before reaching the final output $F_{out}$ of size $C \times H \times W$. Input size and parameters are explained below the diagram.

Figure 4. The structure of the dual-feature complementary block.

The core idea of the dual feature complementary block is to enhance the interaction and information transfer between feature maps by progressively optimizing the feature map fusion process. Specifically, the dual feature complementary block independently processes each feature map from the skip connections, including operations such as convolution for dimensionality reduction, batch normalization, and nonlinear activation. Then, pixel-wise multiplication is applied to enhance the mutual influence between feature maps. Finally, the results of both feature maps are fused and concatenated to form a more refined feature map representation. When processing the encoder feature map, the first step is to reduce the number of channels of feature map $F_{n}$ from C to C//2 using a 1 × 1 convolution. Next, batch normalization is introduced to stabilize the training process, preventing issues like gradient vanishing or explosion. The Relu activation function is then applied to introduce nonlinearity, enhancing the model’s expressive power, resulting in the feature map $F_{n - 1}$ . The specific computation process is as follows (Equation 7):

F_{n - 1} = Re l u [B N (C o n v_{1 \times 1} (F_{1}))] (7)

Where M, N, and P represent the 1 × 1 convolution operation, the Relu activation function, and the exponential function, respectively.

When processing the decoder feature map $F_{i}$ , the operation is similar to that of the encoder feature map. The feature map is first reduced in dimensionality using a 1 × 1 convolution, decreasing the number of channels from 2C to C//2. It is worth noting that in the processing of the decoder feature map, the Sigmoid activation function is used instead of Relu. This design choice allows for smoother interaction between feature maps, avoiding information distortion caused by excessive activation. This is particularly effective in recovering fine details. The specific computation process is as follows (Equations 8 and 9):

F_{i - 1} = C o n v_{1 \times 1} (F_{i}) (8)

F_{i - 2} = S o f t m a x (F_{i - 1}) = \frac{\exp (F_{i - 1})}{\sum_{m = 1}^{c} \exp (F_{i - 1})} (9)

Where $C o n v$ and $\exp (\cdot)$ represent the 1 × 1 convolution operation and the exponential function, respectively, while m represents the index of each channel in the feature map $F_{i - 1}$ .

After independently processing the feature maps, the dual feature complementary block performs a pixel-wise multiplication operation between the processed encoder feature map $F_{n - 1}$ and the decoder feature map $F_{i - 2}$ , allowing for tighter integration of detail recovery and contextual fusion, resulting in feature map $F_{n - m}$ . At the same time, to enhance the information interaction between the feature maps, a pixel-wise multiplication is performed between feature map $F_{i - 1}$ and $F_{i - 2}$ , resulting in feature map $F_{i - 3}$ . Finally, the feature map $F_{n - m}$ and feature map $F_{i - 3}$ are concatenated along the channel dimension to strengthen the network’s use of multi-level information during the skip connection phase, resulting in feature map $F_{o u t}$ . By organically combining these two types of feature information, the network is able to leverage both low-level and high-level features, while avoiding issues such as information loss and redundancy. The specific computation process is as follows (Equations 10–12):

F_{n - m} = F_{n - 1} \otimes F_{i - 2} (10)

F_{i - 3} = F_{i - 1} \otimes F_{i - 2} (11)

F_{o u t} = C o n c a t [F_{n - m}; F_{i - 3}] (12)

Where $\otimes$ and $C o n c a t (\cdot)$ represent the pixel-wise multiplication and concatenation operations, respectively.

2.6 Bottleneck information enhancement block

Convolutional neural networks often encounter issues such as feature information loss and insufficient global information capture when processing high-dimensional feature maps. To address this problem, this study designs a bottleneck information enhancement block, as shown in Figure 5. This block is constructed using parallel dual branches.

Figure 5

Diagram of a neural network structure including components such as GAP, GMP, and MLP, with flow from input $F_{in}$ to output $F_{out}$. Features undergo convolution, batch normalization, and ReLU activation. Various paths show compression and adaptive weighting processes, with mathematical notes on the parameter number calculation and compression ratio details.

Figure 5. The structure of the bottleneck information enhancement block.

The upper branch first applies global average pooling and global max pooling operations to capture global average information and extract the maximum value from each local region. These operations provide a smooth representation of the entire image’s features and help capture prominent local features. The spatial dimensions of feature map $F_{5}$ are compressed to 1 × 1, retaining the global statistical features of each channel. The resulting features are then fed into a shared multilayer perceptron (MLP), which compresses and expands the input feature map along the channel dimension to extract potential high-level feature representations. Next, the outputs from both average pooling and max pooling are added together, and the Sigmoid activation function is applied to generate the weight coefficients, resulting in a normalized channel weight map $ζ$ . Finally, the original feature map $F_{5}$ is element-wise multiplied by the channel weight map $ζ$ , achieving weighted fusion of the feature maps to enhance their effectiveness during the decoding process. The entire operation of the upper branch is as follows (Equations 13 and 14):

ζ = S i g m o i d [γ [P_{a v g} (F_{5})] \oplus γ [P_{\max} (F_{5})]] (13)

F_{5 - 1} = ζ \otimes F_{5} (14)

Where $P_{a v g}$ and $P_{\max}$ represent the global average pooling and global max pooling operations, respectively. $γ (\cdot)$ denotes the MLP operation, and $\oplus$ and $\otimes$ represent addition and element-wise multiplication, respectively.

The lower branch utilizes two consecutive 3 × 3 convolutional layers to extract feature information. The output fused after the first convolution is fed into the second convolutional layer. Through residual connections, more information is retained during feature propagation, which helps avoid information loss in deep networks and facilitates the stable transmission of information flow. Subsequently, an adaptive weighting mechanism is introduced to enhance the key information in the feature map, resulting in feature map ${\tilde{F}}_{5 - 2}$ . The computation process of the lower branch is as follows (Equations 15–18):

{\tilde{F}}_{5} = Re l u [B N (C o n v_{3 \times 3} (F_{5}))] (15)

{\tilde{F}}_{5 - 1} = {\tilde{F}}_{5} + F_{5} (16)

F_{5 - 2} = Re l u [B N (C o n v_{3 \times 3} ({\tilde{F}}_{5 - 1}))] (17)

{\tilde{F}}_{5 - 2} = δ (F_{5 - 2} + {\tilde{F}}_{5 - 1}) (18)

where $C o n v_{3 \times 3} (\cdot)$ and $B N (\cdot)$ represent 3 × 3 convolution operation and batch normalization respectively, and $δ$ represents adaptive weighting layer.

Finally, the output results from the upper and lower branches are fused to obtain the final output feature map $F_{o u t p u t}$ . This process integrates multi-level feature information and the weighting mechanism, effectively enhancing the network’s expressive capability (Equation 19).

F_{o u t p u t} = F_{5 - 1} \oplus {\tilde{F}}_{5 - 2} (19)

Where, $\oplus$ represents the pixel-wise addition operation.

3 Experiment and results

3.1 Dataset

The spine X-ray dataset used in this study was provided by the Department of Spine Surgery at Henan Provincial People’s Hospital, consisting of 280 PNG-format images. Prior to the study, all images underwent de-identification processing to protect patient privacy and were named the “Spine Dataset.” The spine regions in the images were precisely annotated by two spine surgeons using the Labelme annotation tool. To ensure annotation consistency, two spine surgeons jointly defined annotation standards and performed quality cross-checks on randomly selected samples, confirming a high level of agreement. The annotated dataset was randomly divided into training, validation, and test sets in a 7:1:2 ratio. In addition, to validate the generalization ability of MDWC-Net across different segmentation tasks, experiments were conducted using the ISIC 2016 (Gutman et al., 2016) dataset for skin lesion segmentation and the Chest X-ray dataset (Jaeger et al., 2014; Candemir et al., 2014) for lung field segmentation. Both datasets were randomly split into training, validation, and test sets using a 7:1:2 ratio. All images were uniformly resized to 256 × 256 pixels according to a proportional scaling principle.

3.2 Experiment setup and evaluation metrics

3.2.1 Experiment setup

All models in this study were implemented using the PyTorch deep learning framework and Python 3.7, with computations performed on an NVIDIA RTX 2080Ti GPU. The batch size was set to 8, and the cross-entropy loss function was adopted. Random horizontal flipping was applied as a data augmentation technique to the training dataset. The SGD optimizer was used for model training, with an initial learning rate of 0.001. The total number of training epochs was set to 100, and the learning rate was reduced by a factor of 10 every 20 epochs.

3.2.2 Evaluation metrics

To better assess the network performance, this study employs four commonly used image segmentation evaluation metrics: Global Pixel Accuracy (GPA), Dice Coefficient (Dice), Mean Intersection over Union (MIoU), and Sensitivity. Global Pixel Accuracy measures the proportion of correctly classified pixels overall. The Dice Similarity Coefficient provides a comprehensive evaluation of the overlap between the segmentation results and the ground truth labels. Mean Intersection over Union considers the degree of overlap between the predicted and ground truth labels for each class, while Sensitivity reflects the model’s ability to recognize positive class regions. In addition, this study also reports the number of parameters (Params), floating-point operations (FLOPs), and Training_time for each model to evaluate their computational efficiency. The specific expressions for these metrics are as follows (Equations 20–23):

G l o b a l P i x e l A c c u r a c y = \frac{T P + T N}{T P + F P + F N + T N} (20)

M I o U = \frac{1}{m} \sum_{i = 0}^{m} \frac{T P_{i}}{F N_{i} + F P_{i} + T P_{i}} (21)

D i c e = \frac{2 T P}{2 T P + F P + F N} (22)

S e n s i t i v i t y = \frac{T P}{T P + F N} (23)

Where TP refers to true positives, FP represents false positives, TN denotes true negatives, FN refers to false negatives, and m stands for the total number of different classes.

3.3 Experimental results

In this study, MDWC-Net was compared with other deep learning-based segmentation algorithms, including FCN-8S, DeeplabV3+ (Baban and Chaari, 2023), PSPNet (H. S. Zhao et al., 2017), U-Net, ResU-Net (Tang et al., 2024), Attention U-Net (Falk et al., 2019), TransU-Net (Chen et al., 2024), and PLU-Net (Song et al., 2024), through extensive experiments on the Spine Dataset. The experimental results were thoroughly analyzed, and a series of ablation experiments were conducted on the proposed blocks to validate their effectiveness.

3.3.1 Experimental results on the spine dataset

The performance of MDWC-Net on the Spine Dataset is shown in Table 1. From the experimental results in Table 1, it can be observed that MDWC-Net demonstrates superior performance across multiple evaluation metrics. Specifically, the Dice and MIoU scores of MDWC-Net reach 89.86% and 90.53%, respectively. The Dice score is 7.58, 4.34, and 3.58 percentage points higher than those of FCN-8S, DeeplabV3+, and U-Net, indicating that MDWC-Net exhibits a stronger capability in distinguishing between classes. Furthermore, compared to the second-best model in Table 1, MDWC-Net also demonstrates its superiority, achieving a 1.65% and 2.71% increase in Dice and MIoU, respectively. In terms of two key metrics, GPA and Sensitivity, MDWC-Net also demonstrates remarkable performance, achieving excellent scores of 96.82% and 96.77%, respectively. Moreover, MDWC-Net achieves these results with fewer Params, lower FLOPs, and shorter Training_time. This fully demonstrates the efficiency and accuracy of MDWC-Net in segmentation tasks. The experimental data indicate that MDWC-Net not only identifies target regions more accurately in the spine segmentation task but also delineates the edges of the targets more precisely, effectively reducing instances of missed and incorrect segmentations.

Table 1

Table 1. Quantitative performance of different models on spine dataset (highest score indicated in bold font).

As shown in Figure 6, a comparison of the training loss convergence and Dice scores on the test set for each model is presented. After 100 training epochs, the MDWC-Net model demonstrates a more stable and efficient convergence speed. Moreover, as the number of iterations increases, the efficient convergence of MDWC-Net further proves that effective parameter optimization can be achieved even without pre-training. This achievement is attributed to its network design, which integrates multi-scale and information-complementary feature representations. Figure 7 shows the detailed training and validation curves of MDWC-Net. The training loss decreases steadily from approximately 0.2667 to 0.096, while the validation loss follows a similar trend, stabilizing around 0.112 after 61 epochs. The close alignment between training and validation curves indicates that MDWC-Net achieves good generalization without significant overfitting issues.

Figure 6

A graph and bar chart compare the performance of deep learning models. The graph on the left shows loss versus epochs for models like FCN-8S, DeepLabV3+, PSPNet, U-Net, and more, with varying loss rates over 100 epochs. On the right, a bar chart displays Dice scores for these models, with scores ranging from 82.28 to 89.86. MDWC-Net achieves the highest Dice score of 89.86.

Figure 6. Results of different models on the spine dataset.

Figure 7

Line graph showing training and validation loss over 100 epochs. The red line represents training loss, which decreases and stabilizes around 0.1. The cyan line represents validation loss, fluctuating but generally aligning with training loss trends.

Figure 7. The loss curve of MDWC-Net on the spine dataset.

Additionally, to visually assess the accuracy of spine region segmentation, Figure 8 presents a comparative visualization of the segmentation results from different algorithms on the Spine Dataset. As shown in Figure 8, more accurate segmentation results are achieved by MDWC-Net. Specifically, other algorithms generally exhibit significant loss of the spine region in the segmentation output, particularly at the edges and finer structures of the spine. This issue is primarily attributed to the limitations of these algorithms in capturing spine location information beyond the receptive field and in handling finer details within the images.

Figure 8

Two sets of medical images labeled Image1 and Image2, each displaying a sequence from (a-1) to (j-1) and (a-2) to (j-2). Each set begins with a spinal X-ray followed by segmented spinal column images. Subsequent images highlight various sections with orange rectangles, emphasizing different vertebrae details and alignments. The images are used for spinal analysis or diagnostics.

Figure 8. Comparison of segmentation results. The rectangular boxes highlight significant segmentation differences among models. (a) Original image; (b) Ground Truth (GT); (c–j) Segmentation results of FCN-8s, DeepLabV3+, PSPNet, U-Net, ResU-Net, Attention U-Net, TransU-Net, PLU-Net, and MDWC-Net, respectively, on the Spine dataset.

In contrast, the MDWC-Net algorithm effectively alleviates this problem by integrating complementary high-level semantic features and low-level texture features, resulting in more refined and accurate spine region segmentation. In Figure 8, compared to MDWC-Net, other algorithms demonstrate more pronounced over-segmentation and under-segmentation issues during the segmentation process. Over-segmentation occurs when unnecessary details are incorrectly labeled as part of the spine region, while under-segmentation leads to the omission of key spinal structures. The network’s ability to focus on relevant features and utilize global information is enhanced by MDWC-Net through the innovative introduction of a multi-scale adaptive weighting block, a dual-feature complementary block, and a bottleneck information enhancement block. As a result, MDWC-Net outperforms other algorithms in terms of both segmentation accuracy and completeness, demonstrating exceptional performance in the spine X-ray image segmentation task.

3.3.2 Significance testing of segmentation performance

To verify whether the performance improvement of the proposed MDWC-Net over other models (such as DeeplabV3+, PSPNet, U-Net, Attention U-Net, TransU-Net, and PLU-Net) is statistically significant, we conducted paired t-tests on the Dice and MIoU metrics across these models on the Spine dataset. The significance level was set to α = 0.05. As shown in Table 2, all p-values are much smaller than 0.01, indicating that the improvements of MDWC-Net in both Dice and MIoU are statistically significant. These results further demonstrate the effectiveness and robustness of the proposed method.

Table 2

Table 2. Paired t-test results between MDWC-Net and other models.

3.3.3 Ablation experiment

To further explore the contribution of each block in this study, we conducted a series of ablation experiments using the Spine Dataset. First, the U-Net model was used as the baseline. Then, the multi-scale convolution block with the adaptive weighting mechanism removed was integrated into the baseline, referred to as “+Multi-scale Conv only.” The complete Multi-scale Convolution Adaptive Weighting (MSCAW) block was then integrated into the baseline, named “+MSCAW only.” Next, to verify the effectiveness of different branches in the Bottleneck Information Enhancement Block (BIEB), we tested configurations with “+MSCAW+BIEB (upper)” using only the upper branch of BIEB, and “+MSCAW+BIEB (lower)” using only the lower branch. The “+MSCAW+BIEB (full)” configuration integrates both the MSCAW block and the complete BIEB. Finally, “+MSCAW+DFCB” combines the MSCAW block with the Dual Feature Complementary Block. In addition, to further validate the effectiveness and advantages of the proposed modules, we designed replacement experiments: “w/ASPP” replaces the MSCAW module with the classical ASPP module, and “w/CBAM” replaces the BIEB module with the CBAM module. The relevant experimental results are shown in Table 3.

Table 3

Table 3. Ablation experiments of each component block (highest score indicated in bold font).

From the experimental results in Table 3, it can be observed that after integrating the multi-scale convolution adaptive weighting block, the model’s performance was significantly improved, with Dice and MIoU rising to 88.65% and 88.28%, respectively. This improvement is attributed to the key role of this block in feature weighting and decision-making. The block fuses features from different scales and dynamically adjusts the scale weights of each channel based on the regional characteristics of the input image, enabling the model to more accurately capture multi-scale detail information. In addition, the combination of different branches of the Bottleneck Information Enhancement Block with the multi-scale convolution adaptive weighting block led to improvements in the network’s performance across all evaluation metrics. This enhancement is attributed to the bottleneck block’s ability to extract and utilize global contextual feature information.

Furthermore, the integration of the Dual Feature Complementary Block also contributed to the improvement in network performance. Specifically, the embedding of the DFCB resulted in the model’s GPA, Dice, MIoU, and Sensitivity increasing to 96.65%, 89.55%, 89.68%, and 96.31%, respectively. The results show that this block effectively utilizes high-level features to guide low-level features in selecting key information, reducing the loss of important information and interference from irrelevant data. And the proposed modules also outperformed classical counterparts in segmentation performance, further supporting their design rationality and task-specific effectiveness. Finally, through the integration of all blocks, MDWC-Net achieved optimal segmentation performance, and the experimental results strongly validate the effectiveness and practicality of the designed blocks.

4 Discussion

To further evaluate the generalization capability of the proposed MDWC-Net beyond spinal X-ray segmentation, we conducted additional experiments on two publicly available datasets: Chest X-ray dataset for lung field segmentation and ISIC2016 dataset for skin lesion segmentation. These datasets represent two distinct directions of generalization: Chest X-rays are anatomically and radiologically similar to spinal X-rays, while ISIC2016 features highly heterogeneous textures and modalities.

4.1 Experimental results on the chest X-ray dataset

The Chest X-ray dataset provides pixel-level annotations of lung fields. Chest radiographs share similar grayscale distribution and imaging characteristics with spinal X-rays. As shown in Table 4, it can be observed that MDWC-Net achieved a Dice coefficient of 85.32% and MIoU of 86.09%, surpassing baseline methods including U-Net and DeepLabV3+. These results demonstrate that the proposed architecture generalizes effectively not only within the spinal domain but also to other thoracic structures captured by similar imaging modalities. Such findings highlight the potential of MDWC-Net for broader applications in skeletal and soft-tissue segmentation tasks within the domain of radiography.

Table 4

Table 4. Quantitative performance of each model on the chest X-ray dataset (highest score indicated in bold font, GPA represents Global Pixel Accuracy).

4.2 Experimental results on the ISIC2016 dataset

The ISIC2016 dataset contains various types of skin lesions, including melanoma and basal cell carcinoma, with high-quality pixel-level annotations. Unlike spinal X-rays, skin lesion images exhibit irregular shapes, blurry boundaries, and strong variability in texture and contrast, posing distinct challenges to segmentation models. Applying MDWC-Net to this domain allows us to evaluate its robustness across structurally unrelated medical tasks. As shown in Table 5, MDWC-Net achieved excellent results on the ISIC2016 dataset, with a GPA of 95.96%, Dice coefficient of 87.25%, MIoU of 86.75%, and Sensitivity of 94.61%, outperforming several state-of-the-art models—including a 4.83% and 5.28% improvement in GPA and Sensitivity over FCN-8s, and a 3.28% and 3.33% gain in Dice and MIoU over U-Net. Moreover, compared with the Transformer-based TransUNet, MDWC-Net achieved consistent improvements of 1.58%, 2.12%, and 2.69% in GPA, Dice, and MIoU, respectively.

Table 5

Table 5. Quantitative performance of each model on the ISIC2016 dataset (highest score indicated in bold font, GPA represents Global Pixel Accuracy).

These findings demonstrate that the multi-scale feature modeling and dynamic information fusion mechanisms in MDWC-Net are effective not only for structured anatomical regions like the spine but also for unstructured lesion segmentation tasks. To better visualize model performance, Figure 9 shows that MDWC-Net consistently appears in the top-right region of the GPA–Sensitivity and Dice–MIoU scatter plots, indicating a strong trade-off between accuracy and robustness, and confirming its generalizability across diverse medical image modalities.

Figure 9

Two scatter plots compare different network models. The left plot shows sensitivity versus GPA, with MDWC-Net achieving the highest sensitivity around 95 and GPA around 97. Other models like U-Net and AttentionU-Net have lower sensitivity and GPA values. The right plot shows MIoU versus Dice, again highlighting MDWC-Net with the highest values around 87 and 86, respectively. Other models like PLU-Net and ResU-Net are grouped with lower values.

Figure 9. Distribution of scores for different models.

5 Conclusion

This study proposes MDWC-Net, an efficient deep network designed for spinal X-ray image segmentation. By incorporating multi-scale convolution adaptive weighting, dual feature complementary block, and bottleneck information enhancement block, the model demonstrates outstanding segmentation performance with strong generalization capability and deployment potential. Although this work primarily focuses on improving segmentation accuracy, its high-quality structural boundary extraction also provides a reliable basis for downstream clinical tasks such as spinal parameter measurement and preoperative path planning. In addition, the lightweight design and low computational cost make it suitable for integration into radiology-assisted reading systems or surgical planning platforms. While MDWC-Net demonstrates robust performance across diverse imaging conditions, further optimization could enhance its effectiveness in extremely challenging scenarios such as very low-contrast or heavily degraded X-ray images. Future work will incorporate clinical user feedback for prospective validation, optimize model deployment through techniques such as model pruning and knowledge distillation, and focus on enhancing robustness under challenging imaging conditions to meet diverse clinical requirements.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Ethics Committee of the Henan Provincial People’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

ZG: Data curation, Formal Analysis, Validation, Visualization, Writing – original draft, Writing – review and editing. XW: Formal Analysis, Methodology, Visualization, Writing – original draft. BC: Funding acquisition, Resources, Supervision, Validation, Writing – review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This project was financially supported by the Henan Provincial Medical Science and Technology Research Plan Provincial-Ministry Joint Project (Grant no. SBGJ202403002).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Azampour M. F., Tirindelli M., Lameski J., Gafencu M., Tagliabue E., Fatemizadeh E., et al. (2024). Anatomy-aware computed tomography-to-ultrasound spine registration. Med. Phys. 51 (3), 2044–2056. doi:10.1002/mp.16731

PubMed Abstract | CrossRef Full Text | Google Scholar

Baban A. E. T. R., Chaari L. (2023). Mid-DeepLabv3+: a novel approach for image semantic segmentation applied to African food dietary assessments. Sensors (Basel) 24 (1), 209. doi:10.3390/s24010209

PubMed Abstract | CrossRef Full Text | Google Scholar

Bai S., Liang J., Long T., Liang C., Zhou J., Ge W., et al. (2023). An efficient approach to detect and track winter flush growth of litchi tree based on UAV remote sensing and semantic segmentation. Front. Plant Sci. 14, 1307492. doi:10.3389/fpls.2023.1307492

PubMed Abstract | CrossRef Full Text | Google Scholar

Candemir S., Jaeger S., Palaniappan K., Musco J. P., Singh R. K., Zhiyun X., et al. (2014). Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Trans. Med. Imaging 33 (2), 577–590. doi:10.1109/tmi.2013.2290491

PubMed Abstract | CrossRef Full Text | Google Scholar

Carneiro G., Nascimento J. C., Freitas A. (2012). The segmentation of the left ventricle of the heart from ultrasound data using deep learning architectures and derivative-based search methods. IEEE Trans. Image Process 21 (3), 968–982. doi:10.1109/tip.2011.2169273

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen L. C., Papandreou G., Kokkinos I., Murphy K., Yuille A. L. (2018). DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40 (4), 834–848. doi:10.1109/tpami.2017.2699184

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen S., Luo C., Liu S., Li H., Liu Y., Zhou H., et al. (2024). LD-UNet: a long-distance perceptual model for segmentation of blurred boundaries in medical images. Comput. Biol. Med. 171, 108120. doi:10.1016/j.compbiomed.2024.108120

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen Y., Mo Y., Readie A., Ligozio G., Mandal I., Jabbar F., et al. (2024). VertXNet: an ensemble method for vertebral body segmentation and identification from cervical and lumbar spinal X-rays. Sci. Rep. 14 (1), 3341. doi:10.1038/s41598-023-49923-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen J., Mei J., Li X., Lu Y., Yu Q., Wei Q., et al. (2024). TransUNet: rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med. Image Anal. 97, 103280. doi:10.1016/j.media.2024.103280

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen J., Liu Y., Wei S., Bian Z., Subramanian S., Carass A., et al. (2025). A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond. Med. Image Anal. 100, 103385. doi:10.1016/j.media.2024.103385

PubMed Abstract | CrossRef Full Text | Google Scholar

Das N., Das S. (2024). Attention-UNet architectures with pretrained backbones for multi-class cardiac MR image segmentation. Curr. Probl. Cardiol. 49 (1 Pt C), 102129. doi:10.1016/j.cpcardiol.2023.102129

PubMed Abstract | CrossRef Full Text | Google Scholar

Deng Y. J., Gu F., Zeng D. X., Lu J. Y., Liu H. T., Hou Y. L., et al. (2024). An effective U-Net and BiSeNet complementary network for spine segmentation. Biomed. Signal Process. Control 89, 105682. doi:10.1016/j.bspc.2023.105682

CrossRef Full Text | Google Scholar

Duan X., Ma X., Zhu M., Wang L., You D., Deng L., et al. (2025). Deep learning-assisted screening and diagnosis of scoliosis: segmentation of bare-back images via an attention-enhanced convolutional neural network. J. Orthop. Surg. Res. 20 (1), 161. doi:10.1186/s13018-025-05564-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Falk T., Mai D., Bensch R., Çiçek Ö., Abdulkadir A., Marrakchi Y., et al. (2019). U-Net: deep learning for cell counting, detection, and morphometry. Nat. Methods 16 (1), 67–70. doi:10.1038/s41592-018-0261-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Gegundez-Arias M. E., Marin-Santos D., Perez-Borrero I., Vasallo-Vazquez M. J. (2021). A new deep learning method for blood vessel segmentation in retinal images based on convolutional kernels and modified U-Net model. Comput. Methods Programs Biomed. 205, 106081. doi:10.1016/j.cmpb.2021.106081

PubMed Abstract | CrossRef Full Text | Google Scholar

Greenwald N. F., Miller G., Moen E., Kong A., Kagel A., Dougherty T., et al. (2022). Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat. Biotechnol. 40 (4), 555–565. doi:10.1038/s41587-021-01094-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Gui S. X., Song S., Qin R. J., Tang Y. (2024). Remote sensing object detection in the deep learning Era-A review. Remote Sens. 16 (2), 327. doi:10.3390/rs16020327

CrossRef Full Text | Google Scholar

Gutman D., Codella N. C. F., Celebi E., Helba B., Marchetti M., Mishra N., et al. (2016). Skin lesion analysis toward melanoma detection: a challenge at the international symposium on biomedical imaging (ISBI) 2016, hosted by the international skin imaging collaboration (ISIC).

Google Scholar

Jaeger S., Karargyris A., Candemir S., Folio L., Siegelman J., Callaghan F., et al. (2014). Automatic tuberculosis screening using chest radiographs. IEEE Trans. Med. Imaging 33 (2), 233–245. doi:10.1109/tmi.2013.2284099

PubMed Abstract | CrossRef Full Text | Google Scholar

Jeong S., Kim J., Kim S., Min D. (2024). Revisiting domain-adaptive semantic segmentation via knowledge distillation. IEEE Trans. Image Process 33, 6761–6773. doi:10.1109/tip.2024.3501076

PubMed Abstract | CrossRef Full Text | Google Scholar

Jisna V. A., Ajay A. P., Jayaraj P. B. (2024). Using attention-UNet models to predict protein contact maps. J. Comput. Biol. 31 (7), 691–702. doi:10.1089/cmb.2023.0102

PubMed Abstract | CrossRef Full Text | Google Scholar

Khalifeh K., Brown N. J., Pennington Z., Pham M. H. (2024). Spinal robotics in adult spinal deformity surgery: a systematic review. Neurospine 21 (1), 20–29. doi:10.14245/ns.2347138.569

PubMed Abstract | CrossRef Full Text | Google Scholar

Khan R. A., Luo Y., Wu F. X. (2022). RMS-UNet: residual multi-scale UNet for liver and lesion segmentation. Artif. Intell. Med. 124, 102231. doi:10.1016/j.artmed.2021.102231

PubMed Abstract | CrossRef Full Text | Google Scholar

Kim H. J., Yang J. H., Chang D. G., Lenke L. G., Suh S. W., Nam Y., et al. (2022). Adult spinal deformity: a comprehensive review of current advances and future directions. Asian Spine J. 16 (5), 776–788. doi:10.31616/asj.2022.0376

PubMed Abstract | CrossRef Full Text | Google Scholar

Le N. Q. K. (2023). Predicting emerging drug interactions using GNNs. Nat. Comput. Sci. 3 (12), 1007–1008. doi:10.1038/s43588-023-00555-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Li H., Luo H., Huan W., Shi Z., Yan C., Wang L., et al. (2021). Automatic lumbar spinal MRI image segmentation with a multi-scale attention network. Neural Comput. Appl. 33 (18), 11589–11602. doi:10.1007/s00521-021-05856-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Li J., Wang H., Zhang A., Liu Y. (2022). Semantic segmentation of hyperspectral remote sensing images based on PSE-UNet model. Sensors (Basel) 22 (24), 9678. doi:10.3390/s22249678

PubMed Abstract | CrossRef Full Text | Google Scholar

Li X., Qin X., Huang C., Lu Y., Cheng J., Wang L., et al. (2023). SUnet: a multi-organ segmentation network based on multiple attention. Comput. Biol. Med. 167, 107596. doi:10.1016/j.compbiomed.2023.107596

PubMed Abstract | CrossRef Full Text | Google Scholar

Li Y., Zou B., Dai P., Liao M., Bai H. X., Jiao Z. (2023). AC-E network: attentive context-enhanced network for liver segmentation. IEEE J. Biomed. Health Inf. 27 (8), 4052–4061. doi:10.1109/jbhi.2023.3278079

PubMed Abstract | CrossRef Full Text | Google Scholar

Liebmann F., von Atzigen M., Stütz D., Wolf J., Zingg L., Suter D., et al. (2024). Automatic registration with continuous pose updates for marker-less surgical navigation in spine surgery. Med. Image Anal. 91, 103027. doi:10.1016/j.media.2023.103027

PubMed Abstract | CrossRef Full Text | Google Scholar

Sarwahi V., Galina J., Atlas A., Gecelter R., Hasan S., Amaral T. D., et al. (2021). Scoliosis surgery normalizes cardiac function in adolescent idiopathic scoliosis patients. Spine (Phila Pa 1976) 46 (21), e1161–e1167. doi:10.1097/brs.0000000000004060

PubMed Abstract | CrossRef Full Text | Google Scholar

Shi W., Xu T., Yang H., Xi Y., Du Y., Li J., et al. (2022). Attention gate based dual-pathway network for vertebra segmentation of X-ray spine images. IEEE J. Biomed. Health Inf. 26 (8), 3976–3987. doi:10.1109/jbhi.2022.3158968

PubMed Abstract | CrossRef Full Text | Google Scholar

Song W., Yu H., Wu J. (2024). PLU-Net: extraction of multiscale feature fusion. Med. Phys. 51 (4), 2733–2740. doi:10.1002/mp.16840

PubMed Abstract | CrossRef Full Text | Google Scholar

Talaei Khoei T., Ould Slimane H., Kaabouch N. (2023). Deep learning: systematic review, models, challenges, and research directions. Neural Comput. Appl. 35 (31), 23103–23124. doi:10.1007/s00521-023-08957-4

CrossRef Full Text | Google Scholar

Tang Y., Cao Z., Guo N., Jiang M. (2024). A Siamese Swin-Unet for image change detection. Sci. Rep. 14 (1), 4577. doi:10.1038/s41598-024-54096-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Thibodeau-Antonacci A., Popovic M., Ates O., Hua C. H., Schneider J., Skamene S., et al. (2025). Trade-off of different deep learning-based auto-segmentation approaches for treatment planning of pediatric craniospinal irradiation autocontouring of OARs for pediatric CSI. Med. Phys. 52 (6), 3541–3556. doi:10.1002/mp.17782

PubMed Abstract | CrossRef Full Text | Google Scholar

Tulbure A. A., Tulbure A. A., Dulf E. H. (2022). A review on modern defect detection models using DCNNs - deep convolutional neural networks. J. Adv. Res. 35, 33–48. doi:10.1016/j.jare.2021.03.015

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang S., Liu C., Zhang Y. H. (2022). Fully convolution network architecture for steel-beam crack detection in fast-stitching images. Mech. Syst. Signal Process. 165, 108377. doi:10.1016/j.ymssp.2021.108377

CrossRef Full Text | Google Scholar

Xia Y., Han S. W., Kwon H. J. (2023). Image generation and recognition for railway surface defect detection. Sensors (Basel) 23 (10), 4793. doi:10.3390/s23104793

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang T., Wei J., Xiao Y., Wang S., Tan J., Niu Y., et al. (2024). LT-DeepLab: an improved DeepLabV3+ cross-scale segmentation algorithm for Zanthoxylum bungeanum Maxim leaf-trunk diseases in real-world environments. Front. Plant Sci. 15, 1423238. doi:10.3389/fpls.2024.1423238

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang B., Yu K., Ning Z., Wang K., Dong Y., Liu X., et al. (2020). Deep learning of lumbar spine X-ray for osteopenia and osteoporosis screening: a multicenter retrospective cohort study. Bone 140, 115561. doi:10.1016/j.bone.2020.115561

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhao H. S., Shi J. P., Qi X. J., Wang X. G., Jia J. Y. (2017). “Pyramid scene parsing network,” in Paper presented at the 30th IEEE/CVF conference on computer vision and pattern recognition (CVPR), Honolulu, HI.

Google Scholar

Zhao R. H., Li Q., Wu J. R., You J. N. (2021). A nested U-shape network with multi-scale upsample attention for robust retinal vascular segmentation. Pattern Recognit. 120, 107998. doi:10.1016/j.patcog.2021.107998

CrossRef Full Text | Google Scholar

Zhao Z., Gui J., Yao A., Le N. Q. K., Chua M. C. H. (2022). Improved prediction model of protein and peptide toxicity by integrating channel attention into a convolutional neural network and gated recurrent units. ACS Omega 7 (44), 40569–40577. doi:10.1021/acsomega.2c05881

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou H., Sun C., Huang H., Fan M., Yang X., Zhou L. (2023). Feature-guided attention network for medical image segmentation. Med. Phys. 50 (8), 4871–4886. doi:10.1002/mp.16253

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou H., Leung H., Balaji B. (2024). AR-UNet: a deformable image registration network with cyclic training. IEEE/ACM Trans. Comput. Biol. Bioinform 21 (4), 692–700. doi:10.1109/tcbb.2023.3284215

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhu Z. Q., He X. Y., Qi G. Q., Li Y. Y., Cong B. S., Liu Y. (2023). Brain tumor segmentation based on the fusion of deep semantics and edge information in multimodal MRI. Inf. Fusion 91, 376–387. doi:10.1016/j.inffus.2022.10.022

CrossRef Full Text | Google Scholar

Keywords: convolutional neural networks, spinal image segmentation, multi-scale convolutional adaptive weighting, dual feature complementary block, bottleneck information enhancement block

Citation: Gu Z, Wang X and Chen B (2025) MDWC-Net: a multi-scale dynamic-weighting context network for precise spinal X-ray segmentation. Front. Physiol. 16:1651296. doi: 10.3389/fphys.2025.1651296

Received: 21 June 2025; Accepted: 17 August 2025;
Published: 29 August 2025.

Edited by:

Rajesh Kumar Tripathy, Birla Institute of Technology and Science, India

Reviewed by:

Nguyen Quoc Khanh Le, Taipei Medical University, Taiwan
Hongwei Ning, Anhui Science and Technology University, China
Swapnil Singh, MicroStrategy Incorporated, United States

Copyright © 2025 Gu, Wang and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Baojun Chen, enp1NjE1QDEyNi5jb20=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.