- 1University School of Information, Communication & Technology, Guru Gobind Singh Indraprastha University, New Delhi, India
- 2Department of Computer Applications, Manipal University Jaipur, Jaipur, India
Accurate detection and segmentation of multiple sclerosis (MS) lesions in brain Magnetic Resonance Imaging (MRI) is a challenging task due to their small size, irregular shape, and variability in different imaging modalities. Precise segmentation of MS lesions from brain MRI is vital for early diagnosis, disease progression monitoring, and treatment planning. We introduce MS-DASPNet, a Dual Attention Guided Deep Neural Network specifically designed to address the challenges of MS lesion detection, including small lesion sizes, low contrast, and heterogeneous appearance. MS-DASPNet employs a VGG-16-based encoder, an Atrous Spatial Pyramid Pooling (ASPP) bottleneck for multi-scale context learning, and dual attention modules in each skip connection to simultaneously refine spatial details and enhance channel-wise feature representation. Evaluations on four publicly available datasets, namely ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021, demonstrate that MS-DASPNet achieves superior Precision, Dice, Sensitivity, and Jaccard scores compared to state-of-the-art methods. MS-DASPNet attains a Dice score of 0.8736 on the MICCAI-2016 dataset and 0.8706 on the MICCAI-2021 dataset, both outperforming existing segmentation techniques, highlighting its robustness and effectiveness in accurate MS lesion segmentation.
1 Introduction
Multiple sclerosis (MS) is a long-term autoimmune disease that targets the central nervous system (CNS), characterized by damage to the myelin sheath, degeneration of nerve fibers, and inflammation within neural tissues (Compston and Coles, 2008). The disease disrupts communication between the brain and body, causing motor, sensory, and cognitive impairments. MS lesions develop in different brain regions, forming sclerosis that appears in multiple locations, thus giving the disease its name, MS (Filippi et al., 2016). While the exact origin of MS remains unclear, genetic predisposition, environmental factors, infections, and immune system dysfunction are believed to contribute to disease onset (Olsson et al., 2017). Magnetic Resonance Imaging (MRI) is the most effective imaging modality for detecting MS lesions, as it provides detailed visualization of white matter abnormalities across multiple sequences, including T1-weighted, T2-weighted, Fluid-Attenuated Inversion Recovery (FLAIR), and Proton Density (PD) scans (Zivadinov et al., 2008). However, accurately segmenting MS lesions remains a difficult task due to variability in lesion appearance, image artifacts, and the complexity of brain anatomy. Various approaches, including traditional techniques, machine learning, and deep learning (DL) methods, have been investigated for MS lesion segmentation. These approaches are generally classified into supervised, unsupervised, and DL-based categories.
Supervised machine learning (ML) techniques require labeled training data, where models learn to distinguish lesions from healthy tissue. Traditional approaches such as thresholding, region growth, and statistical models (e.g., Gaussian Mixture Models and Bayesian classifiers) were initially applied to segmentation (Sajja et al., 2006; Anbeek et al., 2005). Later, advanced classifiers such as Support Vector Machines (SVMs), the Hidden Markov Model, and the expectation-maximization algorithm demonstrated improved lesion detection (Zhang et al., 2001; Lao et al., 2008). More recently, supervised learning models have leveraged hand-crafted features, including intensity histograms, texture features, and spatial priors, to refine segmentation accuracy (Jain et al., 2020). However, supervised models often require extensive manual annotation, which is labor-intensive and prone to inter-rater variability.
Unsupervised algorithms identify hidden patterns in data to facilitate classification and segmentation tasks. Various unsupervised techniques have been explored for MS lesion segmentation. A feature vector-based approach has been utilized to segment MS lesions from skull-stripped MRI images (Akbarpour et al., 2017). Robust partial-volume tissue segmentation, which integrates intensity-based probabilistic and morphological prior maps, incorporating outlier rejection and filling, has also been proposed (Valverde et al., 2017b). Atlason et al. (2019) applied CNNs for tissue and white matter hyperintensity (WMH) segmentation in brain MRI scans. A Euclidean distance-based clustering method has also been employed to detect MS lesions in MRI images (Cetin et al., 2020). An unsupervised approach has been developed to quantitatively assess MS lesion progression to extract brain tissue distortions from MRI scans (Rachmadi et al., 2020). Furthermore, Seg-JDOT, a domain adaptation-based MS lesion segmentation framework, has demonstrated promising results (Ackaouy et al., 2020). However, unsupervised models generally exhibit lower accuracy and often require human intervention to align with domain-specific knowledge.
DL has significantly advanced the extraction of MS lesions by enabling automated feature learning directly from raw MRI scans. Convolutional neural networks (CNNs), especially the UNet architecture and its derivatives, have demonstrated exceptional performance in segmenting MS lesions (Ronneberger et al., 2015; Isensee et al., 2018). Attention-UNet enhances segmentation precision by focusing on lesion regions (Aslani et al., 2019), while DenseUNet and ResUNet improve feature propagation and gradient flow (Oktay et al., 2018; Jha et al., 2019; Long et al., 2015). Fully Convolutional Networks (FCNs) and encoder-decoder architectures have further refined lesion identification (Milletari et al., 2016; Chen et al., 2017). Furthermore, hybrid models incorporating Atrous Spatial Pyramid Pooling (ASPP) and Vision Transformers (ViTs) have demonstrated superior lesion detection capabilities by capturing long-range dependencies and multi-scale contextual information (He et al., 2023; Dosovitskiy et al., 2021; Azad et al., 2024). A hybrid architecture combining the Swin Transformer and UNet models has been used effectively for the identification of MS lesions (Nouman et al., 2024). Similarly, Davarani et al. (2025) have designed a hybrid architecture of transformers and autoencoders to segment MS lesions. A detailed comparative analysis of various techniques for identifying MS lesions from brain MRI images is presented in Table 1. The performance of all methods presented in Table 1 is evaluated using the Dice-score metric, which mitigates the effect of class imbalance in MS lesion datasets. Despite their success, DL models face challenges such as data scarcity, high computational costs, and overfitting on small training datasets.
Table 1. Overview of some of the well-known techniques of MS lesion segmentation from brain MRI images.
Given the limitations of traditional supervised and unsupervised methods, recent research has focused on designing advanced DL architectures to improve MS lesion segmentation. To address the limitations of the existing MS lesion segmentation methods discussed above, an efficient architecture, MS-DASPNet, is designed to extract lesions from brain MRI images by fusing low-level and high-level features through a dual-headed attention mechanism, which incrementally weights each channel to enhance feature representation. The encoder of the proposed model is initialized with VGG-16, which provides a strong hierarchical feature extraction capability due to its deep convolutional layers and pre-trained weights, enabling better generalization and improved feature reuse. Additionally, an ASPP section is integrated into the bottleneck to efficiently model contextual information across multiple spatial scales and improve lesion delineation across different resolutions. By freezing the pre-trained encoder weights during initial training, the network avoids redundant weight updates and further improves its feature extraction efficiency. MS-DASPNet is computationally efficient, requiring fewer parameters, while the combination of VGG-16's hierarchical feature learning and ASPP's multi-scale context aggregation leads to substantial improvements in MS lesion delineation performance. The major contributions of the proposed architecture are as follows:
1. In this study, a DL based model, MS-DASPNet, is designed to segment MS lesions from brain MRI images. The proposed method is computationally efficient and enhances multi-scale contextual understanding using dilated convolutions, improving localization and global perception while mitigating spatial information loss.
2. The introduction of a dual-headed attention block in the skip connections of MS-DASPNet enhances feature refinement by enabling the model to focus on both spatial and channel-wise dependencies, leading to improved accuracy and boundary precision.
3. To assess the generalizability and robustness of MS-DASPNet, experiments are conducted on four publicly available datasets (ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021), and performance is evaluated using quantitative metrics.
4. The performance of MS-DASPNet is compared with existing DL architectures, including UNet, Attention UNet, DenseUNet, and Res-UNet, for MS lesion extraction in brain MRI scans.
The remainder of the paper is structured as follows: Section 2 outlines the four datasets utilized in this study. Section 3 describes the architectural design and methodological framework of the proposed approach. Section 4 details the experimental setup and presents the quantitative and qualitative segmentation outcomes. Section 5 provides an in-depth evaluation of the proposed model against various state-of-the-art DL frameworks. Finally, Section 6 summarizes the key findings and outlines potential directions for future work.
2 Dataset description
This section outlines the datasets used for the segmentation of MS lesions from brain MRI scans as presented in Table 2.
2.1 ISBI-2015 dataset
In 2015, the International Symposium on Biomedical Imaging hosted the Longitudinal MS Lesion Segmentation Challenge, providing training and test data for brain MRI images acquired using a 3T Philips MRI scanner (Carass et al., 2017). The dataset includes 3D (NIfTI) brain MRI images from five patients, acquired at four time points across multiple modalities, including T2-weighted, FLAIR, MPRAGE, and proton-density-weighted scans. Although MS lesions are present in all modalities, the FLAIR modality has been chosen for MS lesion segmentation due to its clear visibility, as illustrated in Figure 1.
Figure 1. ISBI-2015 sample brain MRI images in different image modalities when slice number is 100 for the first patient. (a) T2-weighted, (b) MPRAGE, (c) PD, (d) FLAIR.
2.2 Mendeley dataset
The Brain MRI Dataset of MS with Consensus Manual Lesion Segmentation and Patient Meta Information (Muslim et al., 2022) is employed for extracting MS lesions from MRI scans. This dataset includes 3D brain MRI volumes in NIfTI format from 60 patients, acquired using multiple imaging modalities, including T1-weighted, T2-weighted, and FLAIR. Each 3D volume represents a unique patient and varies in spatial dimensions. Although MS lesions are observable across all three modalities, the FLAIR modality is selected for segmentation due to its superior lesion visibility, as demonstrated in Figure 2.
Figure 2. Mendeley dataset sample brain MRI in different image modalities when slice number is 15 for the second patient in the dataset. (a) T1-weighted, (b) T2-weighted. (c) FLAIR.
2.3 MICCAI-2016/MSSEG-2016
This dataset includes three-dimensional (NIfTI) MRI scans in various modalities, including:
1. 3D FLAIR Image
2. 3D T1 Image
3. T2 Image
4. DP Image (proton density)
5. 3D T1 Gd Image (Post-contrast agent image).
The training set includes 15 images, and the testing set includes 34 images, with their corresponding true labels. After slicing each 3D image, we generated a total of 2,432 images and their corresponding masks. The true lesion masks were provided by seven experts, and a consensus mask was used to evaluate the results. We used preprocessed 3D images for slicing in the FLAIR modality.
The images of the MSSEG-2016 dataset (Commowick et al., 2021a) are shown in Figure 3.
Figure 3. MICCAI-2016 dataset sample brain MRI in different image modalities when slice number is 282 for the third patient in the dataset. (a) T1-weighted. (b) T2-weighted. (c) FLAIR. (d) T1-Gd (gadolinium contrast). (e) DP-proton density.
2.4 MICCAI-2021/MSSEG-2
This dataset comprises MR neuroimaging data from 40 patients, each having undergone 3D FLAIR acquisitions at two distinct time points, with variable intervals between scans. MS lesions were segmented by four experts, and a majority vote was applied voxel-by-voxel to generate the final consensus masks, which serve as the basis for MS lesion segmentation. This study used only second-time-point images for lesion segmentation. The dataset, referred to as MSSEG-2, is publicly available at Commowick et al. (2021b). Representative images of the data set are shown in Figure 4.
Figure 4. MICCAI-2021 dataset sample brain MRI in FLAIR modalities when slice number is 241 for the thirty-fifth patient in the dataset. (a) Time point-1 image. (b) Time point-2 image.
3 Proposed method
This work proposes and evaluates MS-DASPNet, a novel deep-learning framework, on four distinct datasets: ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021. For comparative analysis, pixel-wise segmentation is performed using DL techniques such as UNet, DenseUNet, Attention UNet, ResUNet, and the proposed MS-DASPNet. First, the three-dimensional brain MRI scans are sliced to produce two-dimensional MRI images. Preprocessing then applies a sequence of steps for skull removal: (i) contrast stretching, (ii) histogram equalization, (iii) Otsu thresholding, and (iv) morphological operations. The flowchart depicting the sequence of steps involved in evaluating the proposed architecture is shown in Figure 5. The architecture of the MS-DASPNet model is presented in Figure 6 and described in Algorithm 1, and the notations used are presented in Table 3.
3.1 Preprocessing
In this step, the three-dimensional brain MRI volumes were sliced to obtain a two-dimensional representation suitable for model training. Each MRI volume and its corresponding ground-truth mask consist of approximately 300–400 slices in the FLAIR sequence. Since the proposed MS-DASPNet operates on 2D inputs, the 3D brain MRI volumes were first decomposed into axial slices, as the axial plane provides superior visualization of MS lesions. To ensure that the selected slices contained meaningful information, only slices exhibiting MS lesions with a minimum lesion size of five pixels in the corresponding ground-truth masks were retained. Furthermore, instead of using all slices from each volume, approximately 65%–70% of the axial slices were utilized for the experiments, as the initial and terminal slices predominantly correspond to peripheral brain regions and generally do not contain visible lesions. This strategy enabled the selection of slices containing the maximum lesion information. The resulting 2D slices are affected by noise and require skull stripping and brain tissue extraction during preprocessing: each image is denoised using the NL-means algorithm (Coupé et al., 2008), the brain is extracted using the volBrain platform (Manjón and Coupé, 2015), and bias correction is performed using the N4 algorithm (Tustison et al., 2010). Finally, the 2D images of all datasets, along with their masks, are resized to 256 × 256.
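The slice-selection strategy above can be sketched in a few lines of NumPy. The function name, the array layout (axial slices along the last axis), and the central-fraction parameter are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def select_axial_slices(volume, masks, min_lesion_px=5, keep_frac=0.70):
    """Keep axial slices whose ground-truth mask contains at least
    `min_lesion_px` lesion pixels, restricted to the central `keep_frac`
    of the volume (peripheral slices rarely contain lesions)."""
    n = volume.shape[2]                    # axial slices along the last axis
    margin = int(n * (1 - keep_frac) / 2)  # initial/terminal slices to drop
    kept = []
    for z in range(margin, n - margin):
        if (masks[:, :, z] > 0).sum() >= min_lesion_px:
            kept.append((volume[:, :, z], masks[:, :, z]))
    return kept
```

The retained slice/mask pairs would then be resized to 256 × 256 before training.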
3.2 Data augmentation
Deep learning models generally suffer from overfitting when the dataset is small. Data augmentation mitigates this issue, improving the model's generalization ability and increasing its robustness to complex features. In this work, the following operations are performed to expand the dataset size.
1. Rotation—Rotation has been performed at the angles of 45, 90, and 125 degrees on both the brain MRI image and the mask image.
2. Scaling—Scaling has been performed with scale factors of 1.5 and 2 on both the brain MRI image and the mask image.
3. Translation—Horizontal translation has been performed on both the brain MRI image and the mask image.
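The three operations above can be sketched with NumPy; exact 45- and 125-degree rotations require interpolation (e.g. scipy.ndimage.rotate), so this illustrative sketch uses a 90-degree rotation, nearest-neighbour scaling, and horizontal translation, applied identically to the image and its mask. All function names and parameter defaults are assumptions for illustration:

```python
import numpy as np

def rotate90(img):
    """Exact 90-degree rotation (arbitrary angles need interpolation)."""
    return np.rot90(img)

def scale_nn(img, factor):
    """Nearest-neighbour scaling, cropped back to the input size."""
    h, w = img.shape
    rows = (np.arange(h) / factor).astype(int).clip(0, h - 1)
    cols = (np.arange(w) / factor).astype(int).clip(0, w - 1)
    return img[np.ix_(rows, cols)]

def translate_h(img, dx):
    """Horizontal translation with zero fill at the vacated border."""
    out = np.zeros_like(img)
    if dx >= 0:
        out[:, dx:] = img[:, : img.shape[1] - dx]
    else:
        out[:, :dx] = img[:, -dx:]
    return out

def augment_pair(image, mask, scale=1.5, dx=10):
    """Apply each transform identically to the MRI slice and its mask."""
    pairs = [(image, mask)]
    for op in (rotate90,
               lambda x: scale_nn(x, scale),
               lambda x: translate_h(x, dx)):
        pairs.append((op(image), op(mask)))
    return pairs
```

Applying the same transform to image and mask keeps the lesion annotation aligned with the augmented scan.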
3.3 Overview of the MS-DASPNet architecture
This study proposes MS-DASPNet, a transfer learning-based UNet model incorporating ASPP and dual-headed attention, as an effective DL design for MS lesion identification in brain MRI. The proposed model builds upon the standard UNet by integrating dual-headed attention mechanisms in skip connections and utilizing a pre-trained VGG-16 encoder for feature extraction. Additionally, ASPP is embedded in the bottleneck layer to capture multi-scale contextual information from the deepest encoder feature map. These enhancements address key limitations of conventional UNet models, including feature loss and poor boundary depiction, which are critical in medical image segmentation tasks. The architectural modifications introduced in our model aim to improve feature representation, spatial attention, and multi-scale feature learning, thereby enhancing segmentation accuracy. Figure 7 illustrates the overall network architecture, which consists of:
1. An encoder initialized with VGG-16, employing pre-trained feature representations for robust hierarchical feature extraction.
2. Dual-headed attention modules integrated within skip connections to enhance feature fusion and suppress irrelevant activations.
3. An Atrous Spatial Pyramid Pooling (ASPP) component embedded in the bottleneck to extract multi-scale contextual features from deep encoder representations.
By incorporating these improvements, the proposed model outperforms the UNet model variants, especially in detecting complex lesion boundaries.
In the preprocessing stage, each 3D MRI volume (I3D) is converted into 2D slices, denoised using NL-means filtering, corrected for intensity inhomogeneity via N4 bias field correction, and resized to 256 × 256. Encoder features (E1–E4) are extracted from a pre-trained VGG-16 network. The ASPP module aggregates multi-scale context using dilation rates 1, 6, 12, 18, followed by 1 × 1 convolution for feature refinement. In the decoder, both channel and spatial attention mechanisms are applied to enhance lesion-relevant regions before skip connections are fused. The final segmentation mask (Ŝ) is produced using a 1 × 1 convolution followed by a sigmoid activation.
3.4 Network architecture
The proposed model follows a U-shaped network architecture, where the encoder is initialized with VGG-16, the ASPP module is integrated into the bottleneck, and double-headed attention is applied to each skip connection to enhance feature extraction and preserve critical details.
3.4.1 Encoder with VGG-16
The encoder in our proposed Transfer Learning-based UNet is initialized with VGG-16, a deep CNN originally designed for image classification, proposed by the Visual Geometry Group (VGG). VGG-16 is defined by its depth, comprising 16 layers: 13 convolutional layers and three fully connected layers. Instead of training the encoder from scratch, the proposed model uses pre-trained weights from the ImageNet dataset, enabling it to efficiently capture hierarchical feature representations. VGG-16 is chosen as the feature extraction network due to its strong ability to capture rich hierarchical features through its deep, pretrained architecture. It provides robust and transferable representations that generalize well across diverse medical imaging modalities, making it highly effective for image segmentation tasks. The VGG-based encoder extracts multi-scale features at different depths, which are later utilized in the UNet decoder through skip connections.
The encoder follows the architecture of VGG-16, which consists of five convolutional blocks, each containing multiple convolutional layers followed by ReLU activation and max pooling for downsampling. Let I∈ℝH×W×C be the input, where H is the height, W is the width, and C is the number of channels of the input MRI image. The feature extraction process through a convolutional layer is described in Equation 1:

Fl = σ(Wl * Fl−1 + bl)     (1)
where:
• Fl represents the feature map at layer l,
• Wl and bl are the pretrained weights and biases,
• * denotes the convolution operation,
• σ is the ReLU activation function, defined as σ(x) = max(0, x).
The max pooling operation reduces the spatial dimensions by a factor of 2 at each block, as shown in Equation 2:

Flpool(i, j) = max m,n∈{0,1} Fl(2i + m, 2j + n)     (2)
Since VGG-16 was pre-trained on ImageNet, we freeze its layers during the initial training to retain the learnt feature representations while preventing unnecessary weight updates. Using VGG-16 as the encoder effectively captures both low-level and high-level features, as the model benefits from being pre-trained on the large and diverse ImageNet dataset. The complete feature extraction process through VGG-16 on a sample brain MRI image has been demonstrated in Figure 8.
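Equations 1 and 2 can be illustrated with a minimal single-channel NumPy sketch; the real encoder operates on multi-channel tensors with pre-trained ImageNet weights, so the kernel and bias below are placeholders:

```python
import numpy as np

def conv2d_relu(F, W, b):
    """One encoder step (Equation 1): F_l = ReLU(W_l * F_{l-1} + b_l).
    `F` is (H, W), `W` a (k, k) kernel, with 'same' zero padding."""
    k = W.shape[0]
    p = k // 2
    Fp = np.pad(F, p)
    H, Wd = F.shape
    out = np.zeros((H, Wd))
    for i in range(H):
        for j in range(Wd):
            out[i, j] = (Fp[i:i + k, j:j + k] * W).sum() + b
    return np.maximum(out, 0.0)          # sigma(x) = max(0, x)

def max_pool2(F):
    """2x2 max pooling (Equation 2): halves each spatial dimension."""
    H, W = F.shape
    return F[: H - H % 2, : W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```

Five such conv+pool blocks reduce a 256 × 256 input to the bottleneck resolution while deepening the feature representation.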
3.4.2 Atrous Spatial Pyramid Pooling (ASPP)
ASPP is an advanced feature extraction method that captures multi-scale contextual information by employing multiple dilated convolutions with varying dilation rates. Integrating ASPP into the bottleneck of the architecture enhances the model's capacity to aggregate features across multiple receptive field scales while maintaining computational efficiency. The bottleneck layer represents the deepest feature-extraction stage, where spatial resolution is substantially reduced, resulting in the loss of fine-grained details and contextual information. Incorporating ASPP into the bottleneck addresses these challenges by (i) expanding the receptive field without increasing the parameter count, (ii) capturing both local and global contextual features, and (iii) retaining fine details essential for accurate lesion extraction.
ASPP applies parallel atrous convolutions with different dilation rates to the same feature map, enabling multi-scale feature extraction. Let FASPP denote the output feature map after applying ASPP, Fbottleneck be the input feature map from the VGG-16 encoder, and Wi, bi be the weights and biases of the atrous convolution filters at different scales. The convolution with an atrous rate ri is denoted as *ri. The number of atrous convolutions with different dilation rates is represented by n. The feature map computed by ASPP is represented in Equation 3:

Fri = σ(Wi *ri Fbottleneck + bi), i = 1, …, n     (3)
ASPP comprises:
1. A 1 × 1 convolution (r = 1) for local feature extraction.
2. Three atrous convolutions with dilation rates of r = 6, 12, and 18 for multi-scale context.
3. Global average pooling to incorporate global image context.
4. Concatenation of all resulting feature maps, followed by a 1 × 1 convolution to fuse the information.
The final output of the ASPP module is computed as in Equation 4:

FASPP = σ(Wc * [F1, F6, F12, F18, FGAP] + bc)     (4)
Where F1, F6, F12, and F18 denote the outputs of convolutions with dilation rates of 1, 6, 12, and 18, respectively, FGAP represents the global average pooled feature map, [·] indicates channel-wise concatenation, Wc and bc are the learnable weights and biases for feature fusion, and σ denotes the ReLU activation function.
ASPP enhances multi-scale feature learning across different spatial resolutions, improves boundary distinction, and maintains computational efficiency because it does not significantly increase the number of model parameters.
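A single-channel NumPy sketch of one atrous branch and the ASPP concatenation is given below; the 1 × 1 fusion convolution of Equation 4 is omitted for brevity, and all weights are placeholders rather than trained parameters:

```python
import numpy as np

def atrous_conv(F, W, b, rate):
    """Dilated convolution: kernel taps are spaced `rate` pixels apart,
    enlarging the receptive field without adding parameters."""
    k = W.shape[0]
    eff = rate * (k - 1) + 1            # effective receptive field size
    p = eff // 2
    Fp = np.pad(F, p)
    H, Wd = F.shape
    out = np.zeros((H, Wd))
    for i in range(H):
        for j in range(Wd):
            patch = Fp[i:i + eff:rate, j:j + eff:rate]  # dilated sampling
            out[i, j] = (patch * W).sum() + b
    return np.maximum(out, 0.0)

def aspp(F, W, b, rates=(1, 6, 12, 18)):
    """Parallel atrous branches plus a global-average-pooling branch,
    stacked along the channel axis (the bracket operator of Equation 4)."""
    branches = [atrous_conv(F, W, b, r) for r in rates]
    branches.append(np.full_like(F, F.mean()))   # global context branch
    return np.stack(branches, axis=-1)
```

A 3 × 3 kernel at rate 18 covers a 37 × 37 region, which is how the bottleneck gathers near-global context without extra parameters.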
3.4.3 Double-headed attention (DHA)
In the proposed architecture, the DHA block refines skip connections, effectively integrating channel and spatial attention for enhanced feature representation. Spatial attention highlights important regions in the feature map, whereas channel attention prioritizes the most relevant feature channels. This dual attention mechanism improves feature selection and preservation, thereby enhancing lesion identification accuracy. The architecture of DHA is described in Figure 9.
The spatial attention mechanism highlights regions of interest by applying average pooling and max pooling along the channel axis, producing two spatial maps. The resulting attention map is then used to refine the input feature map, emphasizing spatially significant areas.
Given an input feature map X∈ℝH×W×C, spatial attention is computed as in Equation 5:

Favg = GAP(X), Fmax = GMP(X)     (5)

where Favg and Fmax are the global average pooled and global max pooled feature maps, respectively. These maps are concatenated and passed through a 7 × 7 convolution followed by a sigmoid activation to obtain the spatial attention mask As, as defined in Equation 6:

As = σ(f7×7([Favg, Fmax]))     (6)

where σ denotes the sigmoid activation function and f7×7 is the 7 × 7 convolution. The final attention-weighted feature map Xs is obtained as in Equation 7:

Xs = As ⊙ X     (7)
where ⊙ represents element-wise multiplication. This mechanism enhances important spatial regions in the feature map, improving the model's focus on relevant areas during extraction.
The channel attention mechanism provides importance to each feature channel using Global Average Pooling (GAP) and Global Max Pooling (GMP) across spatial dimensions.
Given an input Xs (from the spatial attention module), channel descriptors are computed as in Equations 8, 9:

Gavg = GAP(Xs)     (8)

Gmax = GMP(Xs)     (9)

where Gavg and Gmax represent the globally average pooled and max pooled channel features, respectively. These descriptors are concatenated and passed through a two-layer dense network, yielding the channel attention vector Ac as in Equation 10:

Ac = σ(W2 δ(W1 [Gavg, Gmax]) + b2)     (10)

where W1 and W2 are learnable weight matrices, b2 is the bias term, δ is the ReLU activation applied after the first dense layer, and σ denotes the sigmoid activation function. The final channel-wise attention-weighted feature map Xc is obtained as in Equation 11:

Xc = Ac ⊙ Xs     (11)
where ⊙ denotes element-wise multiplication, applying the attention weights to refine the features along the channel dimension.
After Dual-Headed Attention (DHA) is applied to each skip connection, the enhanced feature maps are concatenated with the upsampled decoder features. This process is represented in Equation 12, where Si denotes the DHA-enhanced skip connection feature map, and Ui is the upsampled decoder feature map at level i:

Di = [Si, Ui]     (12)
This fusion of encoder and decoder features enables richer contextual representation and more precise localization, which is crucial for accurate detection.
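The DHA pipeline (spatial gating followed by channel gating) can be sketched in NumPy as follows. This is a structural sketch under stated assumptions: the 7 × 7 convolution is replaced by a per-map scalar weighting, and all weights are placeholders rather than trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(X, ws_avg=1.0, ws_max=1.0):
    """Pool across the channel axis, combine the two maps (scalar weights
    stand in for the 7x7 convolution), and gate each spatial location."""
    favg = X.mean(axis=-1, keepdims=True)            # (H, W, 1)
    fmax = X.max(axis=-1, keepdims=True)             # (H, W, 1)
    A = sigmoid(ws_avg * favg + ws_max * fmax)       # spatial attention mask
    return X * A

def channel_attention(X, W1, W2, b2):
    """Pool across spatial dims, run a two-layer dense net, gate channels."""
    gavg = X.mean(axis=(0, 1))                       # (C,)
    gmax = X.max(axis=(0, 1))                        # (C,)
    z = np.concatenate([gavg, gmax])                 # (2C,)
    A = sigmoid(W2 @ np.maximum(W1 @ z, 0.0) + b2)   # (C,) channel weights
    return X * A

def dha(X, W1, W2, b2):
    """Dual-headed attention: spatial gating followed by channel gating."""
    return channel_attention(spatial_attention(X), W1, W2, b2)
```

In the full network, the output of `dha` would be concatenated with the upsampled decoder features at the corresponding level.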
3.5 Loss function
Binary Cross-Entropy (BCE) serves as the primary training objective, quantifying the divergence between the predicted probability map and the ground-truth labels. Given a ground-truth mask yi and its corresponding predicted output oi for each of the N pixels, the BCE loss is mathematically expressed in Equation 13:

LBCE = −(1/N) ∑i=1..N [yi log(oi) + (1 − yi) log(1 − oi)]     (13)
BCE loss is particularly advantageous for tasks that require classifying each pixel as belonging to either the object of interest (foreground) or the background. Since binary segmentation is inherently a pixel-wise classification problem, BCE is well-suited for optimizing the model's predictive capability. It treats each pixel independently, ensuring the network learns to differentiate effectively between the two classes. By minimizing the BCE loss during training, the model enhances its pixel-wise accuracy, making it a widely used loss function for binary segmentation applications.
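A minimal NumPy version of the pixel-wise BCE loss; the clipping constant is a standard numerical safeguard against log(0), an implementation detail not discussed in the paper:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Pixel-wise binary cross-entropy:
    L = -(1/N) * sum( y_i*log(o_i) + (1 - y_i)*log(1 - o_i) )."""
    o = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0) at saturation
    y = y_true.astype(float)
    return float(-np.mean(y * np.log(o) + (1 - y) * np.log(1 - o)))
```

Framework implementations (e.g. `keras.losses.BinaryCrossentropy`) apply the same clipping internally.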
3.6 Evaluation metrics
To evaluate performance, several quantitative metrics are utilized, as defined in Equations 14–17, based on the following pixel-level outcomes:
• True positive (TP): MS pixel identified as MS
• True negative (TN): non-MS pixel identified as non-MS
• False negative (FN): MS pixel identified as non-MS
• False positive (FP): non-MS pixel identified as MS
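The metrics can be computed directly from these four pixel-level counts. This NumPy sketch assumes non-degenerate binary masks (empty masks would zero some denominators):

```python
import numpy as np

def segmentation_metrics(gt, pred):
    """Overlap metrics from pixel-wise TP/TN/FP/FN counts.
    `gt` and `pred` are binary masks of equal shape."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    tp = np.sum(gt & pred)      # MS pixel identified as MS
    tn = np.sum(~gt & ~pred)    # non-MS pixel identified as non-MS
    fp = np.sum(~gt & pred)     # non-MS pixel identified as MS
    fn = np.sum(gt & ~pred)     # MS pixel identified as non-MS
    return {
        "dice":        2 * tp / (2 * tp + fp + fn),
        "jaccard":     tp / (tp + fp + fn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr":         fp / (fp + tn),
    }
```

Dice weights true positives twice, which is why it is less distorted by the overwhelming non-lesion background than plain accuracy.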
4 Experimental results and discussion
This section outlines the experimental settings employed to simulate MS-DASPNet architectures, followed by a comprehensive analysis and discussion of the resulting experimental outcomes.
4.1 Experimental settings
The experiments are conducted on a GPU-equipped system featuring an Intel Core i5 processor at a 2.5 GHz clock speed, 16 GB RAM, and an NVIDIA RTX graphics card. The details about the hyperparameters used for training the proposed model are presented in Table 4. All datasets are divided into training and testing subsets in a 90:10 ratio, ensuring that no patient data overlaps between these sets to avoid data leakage. To prevent overfitting, a dropout rate of 0.2 is incorporated as a regularization technique, while a learning rate of 0.001 is used to ensure stable and efficient convergence. The model uses a progressively increasing number of convolutional filters (16, 32, 64, 128, and 256) to capture multi-scale features, along with a kernel size of 7 for enhanced feature extraction. A batch size of 4 was used, and the BCE loss was selected for the lesion extraction task. The Adam optimizer, with an adaptive learning rate, is used over 100 epochs to fine-tune the model parameters, resulting in robust training and precise outcomes. The details regarding the number of subjects used for conducting experiments on each dataset are provided in Table 5.
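The hyperparameters above can be collected into a single configuration object; the values restate what is reported in this section, while the dictionary layout and key names are illustrative:

```python
# Training configuration as reported in Section 4.1; the dict layout
# and key names are illustrative, not taken from the authors' code.
config = {
    "input_size": (256, 256),            # resized 2D FLAIR slices
    "filters": [16, 32, 64, 128, 256],   # filters per encoder depth
    "kernel_size": 7,
    "dropout_rate": 0.2,
    "learning_rate": 1e-3,
    "batch_size": 4,
    "epochs": 100,
    "optimizer": "Adam",
    "loss": "binary_crossentropy",
    "train_test_split": (0.9, 0.1),      # patient-wise, no leakage
}
```

Keeping the split patient-wise (rather than slice-wise) is what prevents slices from the same subject leaking across the train/test boundary.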
4.2 Experimental results and analysis
The proposed experiments are conducted on four datasets, namely ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021.
4.2.1 On ISBI-2015 dataset
The segmentation results produced by MS-DASPNet on the ISBI-2015 dataset are illustrated in Figure 10. Specifically, Figures 10a1–a4 display four sample brain MRI images from the dataset, while Figures 10b1–b4 present the corresponding ground truth masks used for performance evaluation. Figures 10c1–c4 show the segmentation outputs generated by the MS-DASPNet architecture. Based on visual inspection, MS-DASPNet effectively identifies the MS lesions. The performance is further evaluated using various quantitative parameters, and the results are shown in Table 6. The experimental results reveal that the MS-DASPNet model demonstrates robust performance in MS lesion segmentation on the ISBI-2015 dataset, obtaining a Dice score of 0.8329 and a Jaccard index of 0.733, which reflect strong spatial overlap between the predicted and reference lesion masks. The precision of 0.9252 indicates high confidence in positive lesion predictions, with minimal over-segmentation. A sensitivity of 0.7804 suggests effective lesion detection, capturing a majority of true lesion voxels, although with some under-segmentation in more diffuse or low-contrast regions. The specificity of 0.9987 and a false positive rate (FPR) of just 0.0013 confirm that the model maintains strong background suppression, crucial for minimizing false detections. These results highlight MS-DASPNet's ability to balance lesion sensitivity and anatomical specificity, making it well-suited for automated quantification of MS lesion load in clinical MRI scans.
Figure 10. Results obtained on ISBI-2015 dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
4.2.2 On MICCAI-2016 dataset
The MS lesions segmented by MS-DASPNet on images from the MICCAI-2016 dataset (Figures 11a1–a4) are visualized in Figures 11c1–c4, with the reference MS lesions shown in Figures 11b1–b4. The experimental results on the test data of the MICCAI-2016 dataset for different quantitative parameters are presented in Table 6. On the MICCAI-2016 dataset, MS-DASPNet achieves a Dice score of 0.8736 and a Jaccard index of 0.7982, indicating efficient segmentation of the MS lesions. The precision of 0.8746 and sensitivity of 0.8834 indicate that the model detects MS lesions accurately while minimizing the FPR. The high sensitivity (0.8834) shows that MS-DASPNet successfully identifies even subtle lesion regions, which is critical in medical image analysis, where overlooking affected regions can lead to serious diagnostic implications.
Figure 11. Results obtained on the MICCAI-2016 dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
4.2.3 On MICCAI-2021 dataset
The effectiveness of the proposed MS-DASPNet architecture on the MICCAI-2021 dataset is illustrated in Figure 12 and summarized in Table 6. The segmented MS lesions shown in Figures 12c1–c4 correspond to the input images in Figures 12a1–a4, revealing that MS-DASPNet segments MS lesions from MRI scans efficiently. On the MICCAI 2021 dataset, MS-DASPNet maintains consistently high segmentation accuracy, showcasing its robustness in detecting MS lesions under varying imaging conditions. The model achieves a Dice coefficient of 0.8706 and a Jaccard index of 0.7719, indicating strong agreement with expert-labeled lesion masks. The high specificity (0.9996) and low FPR (0.0004) values highlight its excellent background discrimination capability, which is critical for accurate MS lesion segmentation.
Figure 12. Segmentation results obtained on the MICCAI-2021 dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
4.2.4 On Mendeley dataset
The experimental results of the proposed MS-DASPNet model are presented in Figure 13 and evaluated on the metrics reported in Table 6. On the Mendeley dataset, MS-DASPNet performs comparatively poorly relative to the other datasets, indicating challenges in accurately identifying MS lesions. The model achieves a Dice coefficient of 0.5076 and a Jaccard index of 0.3519, reflecting its difficulty in handling the artifacts present in these MRI scans. Despite these limitations, the model maintains a high specificity (0.9985) and a low false positive rate (0.0015), implying that it effectively distinguishes non-lesion regions and avoids false detections.
Figure 13. Segmentation results obtained on the Mendeley dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
MS-DASPNet was evaluated on four publicly available FLAIR MRI datasets; FLAIR sequences emphasize the hyperintense white matter lesions that are characteristic of MS. The model demonstrated superior performance on MICCAI-2016 and MICCAI-2021, with Dice scores of 0.8736 and 0.8706, respectively, indicating strong spatial overlap with clinically annotated lesion masks. These results indicate that the model precisely demarcates periventricular and juxtacortical MS lesions, even in the presence of complex morphology. Sensitivities above 0.85 and FPRs below 0.0005 on these datasets indicate that MS-DASPNet can identify fine lesion patterns without generating false alarms in normal-appearing white matter.
On the ISBI-2015 dataset, the model achieved high precision (0.9252) and balanced Dice and recall scores, reflecting a consistent trade-off between detection and false-positive control. Performance was comparatively lower on the Mendeley dataset, with a Dice score of 0.5076 and a sensitivity of 0.4357, likely due to differences in lesion presentation or poorer image quality. Notwithstanding this, the model consistently achieved a specificity above 0.998 on all datasets, affirming its reliability in suppressing non-lesion brain areas. Overall, MS-DASPNet holds great promise for clinical utility in automated quantification of MS lesion load, assessment of disease progression, and tracking of treatment response.
5 Comparative analysis
In this section, the performance of the proposed MS-DASPNet is compared with well-known DL-based architectures designed for the segmentation of MS lesions, as presented in Tables 7–10. The Dice score is selected as the standard parameter for assessing MS lesion segmentation performance because it addresses the issue of class imbalance. MS lesion regions are typically small compared to the background, so the accuracy metric can yield misleading results; the Dice score, by contrast, emphasizes the correct classification of lesion pixels, is not dominated by background pixels, and directly penalizes false positives.
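The class-imbalance point can be made concrete with a toy example (synthetic masks, not drawn from any of the datasets used here): on a slice where lesions cover only 25 of 10,000 pixels, a trivial all-background prediction scores near-perfect pixel accuracy yet zero Dice.

```python
import numpy as np

# Synthetic 100x100 slice with a 5x5 lesion: 25 lesion pixels out of 10,000.
gt = np.zeros((100, 100), dtype=bool)
gt[40:45, 40:45] = True

pred = np.zeros_like(gt)                 # degenerate "all background" prediction
accuracy = (pred == gt).mean()           # 0.9975: misleadingly high
tp = np.sum(pred & gt)
dice = 2 * tp / (pred.sum() + gt.sum())  # 0.0: correctly flags the total failure
```

Because every lesion pixel is missed, sensitivity and Dice are both zero, even though 99.75% of pixels are classified correctly.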
Table 7. Comparative evaluation of the proposed method with state-of-the-art methods on the ISBI-2015 dataset using the Dice coefficient.
5.1 On ISBI-2015 dataset
On the ISBI-2015 dataset, the performance of the proposed technique is compared with DL architectures based on UNet and its variants, such as UNet (Ronneberger et al., 2015), DenseUNet (Cao et al., 2020), Res-UNet (Zhang et al., 2018), and Attention-UNet (Oktay et al., 2018), along with other DL architectures including transformer-based models (Azad et al., 2024), the patch-based multi-view CNN of Birenbaum and Greenspan (2016), a CNN with three Inception modules (Ansari et al., 2021), and a cascaded 3D CNN (Valverde et al., 2017a). A detailed comparison on the ISBI-2015 dataset is presented in Table 7. MS-DASPNet achieves a Dice score of 0.8329, outperforming well-established models such as UNet, Res-UNet, and Attention-UNet. It also significantly surpasses CNN-based techniques such as those proposed by Birenbaum and Greenspan (2016), Ansari et al. (2021), and Valverde et al. (2017a), whose performance ranges from 0.62 to 0.63, partly due to their limited contextual understanding. Although DenseUNet obtained the highest Dice score on ISBI-2015 (0.8808, about 5% higher than the proposed method), this gain comes at the cost of the increased computational complexity of its densely connected architecture. Among all the techniques in Table 7, MS-DASPNet obtains the second-best and promising results in the segmentation of small, scattered lesions, even in areas affected by low tissue contrast, intensity inhomogeneity, or partial volume effects, which are common challenges in brain MRI segmentation.
5.2 On MICCAI-2016 dataset
On the MICCAI-2016 dataset, the performance of MS-DASPNet is compared on the Dice score and sensitivity metrics with various DL architectures, including UNet (Ronneberger et al., 2015), DenseUNet (Cao et al., 2020), Res-UNet (Zhang et al., 2018), Attention-UNet (Oktay et al., 2018), learning-based approaches proposed by Beaumont and Greenspan (2016), Vera-Olmos et al. (2016), and Mahbod et al. (2016), and the patch-wise DNN of Kaur et al. (2024). Furthermore, the performance is compared with traditional methods: the Expectation-Maximization and graph-cut-based segmentation of Beaumont et al. (2016b), the rule-based technique of Beaumont et al. (2016a), and the edge-based method of Knight and Khademi (2016). The detailed comparison is presented in Table 8: MS-DASPNet has the highest Dice score of 0.8736, surpassing most benchmark DL models, including UNet (0.8635), DenseUNet (0.8657), Attention-UNet (0.8663), and Res-UNet (0.8692). The patch-wise CNN of Kaur et al. (2024) is a close competitor, achieving a Dice score of 0.8700. On the sensitivity metric, Res-UNet achieves a marginally better result (by 0.0009) than MS-DASPNet, but MS-DASPNet exhibits a superior Dice score, indicating a more balanced segmentation output with enhanced overlap accuracy. In contrast, the traditional methods of Beaumont et al. (2016a,b), Knight and Khademi (2016), and Vera-Olmos et al. (2016) have much lower Dice scores, ranging from 0.5300 to 0.6000, reflecting their limited ability to generalize across lesion variations in FLAIR MRI due to reliance on handcrafted features, simple voxel-wise classifiers, or rule-based refinements.
Table 8. Comparative evaluation of the proposed method with state-of-the-art methods on the MICCAI-2016 dataset using the Dice score and sensitivity metrics.
5.3 On MICCAI-2021 dataset
The comparative study of the proposed MS-DASPNet on the MICCAI-2021 dataset against several state-of-the-art segmentation approaches, in terms of Dice score, is presented in Table 9. MS-DASPNet obtains the highest Dice score of 0.8706, outperforming all counterparts. Res-UNet, the closest competitor with a Dice score of 0.8628, showcases strong segmentation capability owing to its residual connections, while DenseUNet, UNet, and Attention-UNet report lower Dice scores of 0.7840, 0.6559, and 0.3914, respectively. Earlier methods, such as those of Basaran et al. (2022) and McKinley et al. (2021), had Dice scores of 0.5100 and 0.6380, respectively, reflecting their limitations in handling the complex and heterogeneous nature of MS lesions in FLAIR MRI, which often exhibit high inter-patient variability and poor boundary definition. The strong performance of the proposed MS-DASPNet on the MICCAI-2021 dataset demonstrates its ability to segment small, irregularly shaped lesions while maintaining boundary integrity. Its integration of multi-scale feature aggregation via ASPP and attention-based contextual filtering enables superior overlap with expert annotations.
Table 9. Comparative evaluation of the proposed MS-DASPNet method with state-of-the-art methods on the MICCAI-2021 dataset using the Dice score.
5.4 On Mendeley dataset
The comparative evaluation of the proposed MS-DASPNet against several well-established DL models (UNet, DenseUNet, Attention-UNet, and Res-UNet) is presented in Table 10. The best-performing method on this dataset is Attention-UNet, with a Dice score of 0.6012, followed by DenseUNet and UNet. These results suggest that attention mechanisms and dense feature propagation provide tangible benefits for lesion detection and boundary delineation on the Mendeley dataset. The proposed MS-DASPNet, in contrast, has a Dice score of 0.5076, lower than most of the evaluated DL models. While this result may appear suboptimal compared with the model's performance on other datasets (e.g., MICCAI-2016 and MICCAI-2021), it reflects the unique challenges of the Mendeley dataset: greater variability in lesion appearance and increased lesion complexity. In particular, the dataset contains smaller and more fragmented lesion regions, which make accurate segmentation more challenging and increase sensitivity to minor boundary errors, thereby directly affecting the Dice metric.
Table 10. Comparative evaluation of the proposed MS-DASPNet with DL methods on the Mendeley dataset using the Dice coefficient.
The MS-DASPNet model integrates a VGG-16-based encoder, ASPP in the bottleneck for multi-level context learning, and dual-headed attention modules in the skip connections. This architectural synergy enables the model to effectively capture both coarse global structures and fine local details, leading to significantly improved segmentation performance. MS-DASPNet demonstrates superior performance on standardized datasets such as MICCAI-2016, MICCAI-2021, and ISBI-2015. Its moderate performance on the Mendeley dataset (Dice: 0.5076) highlights the importance of dataset diversity and domain adaptation in medical image segmentation tasks. The attention maps for selected representative images are illustrated in Figure 14 using the Grad-CAM technique. Specifically, Figures 14a1–a4 depict the original brain MRI slices, while Figures 14b1–b4 present the corresponding Grad-CAM heatmaps highlighting the regions that most strongly influence the model's predictions. Figures 14c1–c4 show the Grad-CAM overlays superimposed on the original MRI images, providing a clear visual interpretation of the model's focus. These visualizations demonstrate that the network effectively attends to clinically relevant lesion regions, thereby enhancing the interpretability and reliability of the proposed model. Overall, MS-DASPNet provides a well-balanced solution that not only enhances accuracy across various datasets but also maintains computational efficiency, making it a strong candidate for real-world clinical deployment in medical image segmentation tasks. The overall performance of the proposed method on different test datasets is presented as a confusion matrix in Figure 15.
Figure 14. Attention map using MS-DASPNet. (a1–a4) Brain MRI image. (b1–b4) Attention map using Grad-CAM. (c1–c4) Grad-CAM overlay.
Figure 15. Confusion matrix of experimental results. (a) ISBI-2015 dataset. (b) MICCAI-2016 dataset. (c) MICCAI-2021 dataset. (d) Mendeley dataset.
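The Grad-CAM maps of Figure 14 are, at their core, a weighted sum of the final convolutional feature maps, where each channel's weight is the spatially averaged gradient of the class score with respect to that channel. A framework-agnostic sketch of that combination step (assuming the activations and gradients have already been extracted from the network; not the authors' implementation) is:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from conv activations and score gradients, both (K, H, W)."""
    alphas = gradients.mean(axis=(1, 2))                  # one weight per channel
    cam = np.sum(alphas[:, None, None] * activations, 0)  # weighted channel sum
    cam = np.maximum(cam, 0)                              # ReLU: keep positive evidence
    return cam / cam.max() if cam.max() > 0 else cam      # normalize to [0, 1] for overlay
```

For display, the normalized map is typically upsampled to the input resolution and blended with the MRI slice, as in the overlays of Figures 14c1–c4.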
6 Ablation study
An ablation study was performed to assess the robustness and generalization capability of the proposed MS-DASPNet under different training–testing configurations. The model was evaluated using three test split ratios, namely 10%, 15%, and 20%, and the corresponding quantitative results are reported in Tables 6, 11. Across all datasets, MS-DASPNet demonstrates consistent segmentation performance with only marginal variations in Dice and Jaccard scores as the test split changes. This behavior highlights the stability of the proposed architectural components and indicates that the model's performance is not overly sensitive to a particular data partition.
Table 11. Experimental results of MS-DASPNet on different quantitative parameters for 10%, 15%, and 20% test data.
For benchmark datasets such as ISBI 2015 and MICCAI 2016, high Dice scores are maintained across all three test splits, accompanied by consistently high specificity and extremely low false positive rates, underscoring the model's ability to accurately suppress false lesion detections. Although slight performance fluctuations are observed when increasing the test split from 10% to 20%, these variations can be attributed to differences in lesion distribution and subject composition within the test sets, which is common in medical image segmentation tasks involving limited and imbalanced data. Overall, the ablation results confirm that integrating the proposed modules into MS-DASPNet yields robust and reliable lesion segmentation across varying data splits, thereby strengthening the validity of the reported performance claims.
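The split ratios used in this ablation can be reproduced in spirit with a simple deterministic partition (a generic sketch; the authors' exact splitting pipeline and random seed are not specified here, and the seed value below is an arbitrary choice):

```python
import random

def split_indices(n_slices, test_frac, seed=42):
    """Shuffle slice indices and hold out a test fraction (illustrative only)."""
    rng = random.Random(seed)          # fixed seed gives a reproducible partition
    idx = list(range(n_slices))
    rng.shuffle(idx)
    n_test = round(n_slices * test_frac)
    return idx[n_test:], idx[:n_test]  # (train, test)

for frac in (0.10, 0.15, 0.20):
    train, test = split_indices(1000, frac)
```

In practice, medical-image splits are usually made per subject rather than per slice, so that slices from one patient never appear in both the training and test sets.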
7 Conclusion
This study presents MS-DASPNet, a UNet-based architecture designed for accurate lesion segmentation from brain MRI scans. MS-DASPNet employs a VGG-16 encoder, enabling robust and transferable low-level feature extraction, while the ASPP bottleneck captures the multiscale contextual information crucial for lesions of diverse sizes and shapes. Most notably, the dual-headed attention mechanism in the skip connections enables the model to concentrate selectively on relevant spatial and channel-wise features, enhancing segmentation precision without incurring high computational costs. The approach begins with converting 3D brain MRI volumes into 2D slices, followed by skull removal using preprocessing techniques. Various image augmentation techniques were employed to mitigate overfitting caused by the limited dataset size and to enhance model generalization. Experiments were conducted on four publicly available datasets, and segmentation performance was evaluated using four metrics: Dice score, precision, sensitivity, and Jaccard index. The proposed MS-DASPNet achieved the highest Dice score of 0.8736, outperforming several existing UNet variants: compared with the baseline UNet (Dice = 0.8635), MS-DASPNet achieved a 1.17% improvement in Dice score, while surpassing Attention-UNet (Dice = 0.8663) by 0.84% and Res-UNet (Dice = 0.8692) by 0.51%. On the MICCAI-2021 dataset, MS-DASPNet achieved a Dice score of 0.8706, reflecting a 0.82% enhancement over the baseline and consistent improvement across all evaluation metrics.
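The percentage gains quoted above are relative Dice improvements over each baseline; the arithmetic for the MICCAI-2016 comparison can be checked directly:

```python
def rel_improvement(new_score, base_score):
    """Relative Dice improvement over a baseline, in percent."""
    return 100 * (new_score - base_score) / base_score

# Dice values from the MICCAI-2016 comparison reported above.
round(rel_improvement(0.8736, 0.8635), 2)  # 1.17 (vs. UNet)
round(rel_improvement(0.8736, 0.8663), 2)  # 0.84 (vs. Attention-UNet)
round(rel_improvement(0.8736, 0.8692), 2)  # 0.51 (vs. Res-UNet)
```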
Comparative analysis with state-of-the-art methods, including both traditional and DL-based techniques, revealed the consistent improvements MS-DASPNet offers across datasets. On the MICCAI-2016 dataset, MS-DASPNet outperformed other architectures, including UNet variants, with the highest Dice score of 0.8736, and on the MICCAI-2021 dataset it again achieved the highest Dice score of 0.8706. In summary, the MS-DASPNet architecture demonstrates robust generalization, flexibility, and accuracy across multiple datasets, positioning it as a highly promising approach for medical image applications. In future work, k-fold cross-validation will be employed to further assess and enhance the model's robustness and generalization across diverse data distributions.
Data availability statement
Publicly available datasets were analyzed in this study. This study uses four types of datasets. (a) The ISBI-2015 dataset is available at: https://smart-stats-tools.org/lesion-challenge-2015 (b) The Mendeley dataset is available at: https://data.mendeley.com/datasets/8bctsm8jz7/1 (c) The MICCAI-2016 dataset is available by request from https://shanoir.irisa.fr/shanoir-ng/study/details/209 (d) The MICCAI-2021 dataset is available by request from https://shanoir.irisa.fr/shanoir-ng/study/details/208.
Author contributions
SJ: Investigation, Writing – original draft, Methodology, Writing – review & editing, Conceptualization, Formal analysis. NR: Writing – original draft, Methodology, Conceptualization, Writing – review & editing, Supervision. PS: Writing – review & editing, Supervision, Writing – original draft, Methodology, Formal analysis, Conceptualization.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Acknowledgments
All authors thank Guru Gobind Singh Indraprastha University, Dwarka, Delhi, and Manipal University Jaipur, Rajasthan, India.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Ackaouy, A., Courty, N., Vallée, E., Commowick, O., Barillot, C., Galassi, F., et al. (2020). Unsupervised domain adaptation with optimal transport in multi-site segmentation of multiple sclerosis lesions from MRI data. Front. Comput. Neurosci. 14:19. doi: 10.3389/fncom.2020.00019
Akbarpour, T., Shamsi, M., Daneshvar, S., and Pooreisa, M. (2017). Unsupervised multimodal magnetic resonance images segmentation and multiple sclerosis lesions extraction based on edge and texture features. Appl. Med. Inform. 39, 30–40.
Anbeek, P., Vincken, K. L., Van Bochove, G. S., and Van Osch, M. J. (2005). Probabilistic segmentation of white matter lesions in MR imaging. NeuroImage 27, 610–620. doi: 10.1016/j.neuroimage.2005.05.046
Ansari, S. U., Javed, K., Qaisar, S. M., Jillani, R., and Haider, U. (2021). Multiple sclerosis lesion segmentation in brain MRI using inception modules embedded in a convolutional neural network. J. Healthc. Eng. 2021:4138137. doi: 10.1155/2021/4138137
Aslani, S., Dayan, M., Storelli, L., Filippi, M., Murino, V., Rocca, M. A., et al. (2019). Multi-branch convolutional neural network for multiple sclerosis lesion segmentation. NeuroImage 196, 1–15. doi: 10.1016/j.neuroimage.2019.03.068
Atlason, H. E., Love, A., Sigurdsson, S., Gudnason, V., and Ellingsen, L. M. (2019). Segae: Unsupervised white matter lesion segmentation from brain MRIs using a CNN autoencoder. NeuroImage: Clin. 24:102085. doi: 10.1016/j.nicl.2019.102085
Azad, R., Kazerouni, A., Heidari, M., Aghdam, E. K., Molaei, A., Jia, Y., et al. (2024). Advances in medical image analysis with vision transformers: a comprehensive review. Med. Image Anal. 91:103000. doi: 10.1016/j.media.2023.103000
Basaran, B. D., Matthews, P. M., and Bai, W. (2022). New lesion segmentation for multiple sclerosis brain images with imaging and lesion-aware augmentation. Front. Neurosci. 16:1007453. doi: 10.3389/fnins.2022.1007453
Beaumont, J., Commowick, O., and Barillot, C. (2016a). “Automatic multiple sclerosis lesion segmentation from intensity-normalized multi-channel MRI,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Cham: Springer).
Beaumont, J., Commowick, O., and Barillot, C. (2016b). “Multiple sclerosis lesion segmentation using an automated multimodal graph cut,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Athens: MICCAI Society), 1–8.
Beaumont, J., and Greenspan, H. (2016). “Automatic segmentation of multiple sclerosis lesions using deep learning,” in Proc. MICCAI 2016 Workshop on MS Lesion Segmentation, 123–130.
Birenbaum, M., and Greenspan, H. (2016). Longitudinal multiple sclerosis lesion segmentation using multi-view convolutional neural networks. Artif. Intell. Med. 87, 67–75. doi: 10.1007/978-3-319-46976-8_7
Cao, Y., Liu, S., Peng, Y., and Li, J. (2020). Denseunet: densely connected UNet for electron microscopy image segmentation. IET Image Process. 14, 2682–2689. doi: 10.1049/iet-ipr.2019.1527
Carass, A., Roy, S., Jog, A., Cuzzocreo, J. L., Magrath, E., Gherman, A., et al. (2017). Longitudinal multiple sclerosis lesion segmentation: resource and challenge. NeuroImage 148, 77–102. doi: 10.1016/j.neuroimage.2016.12.064
Cetin, O., Seymen, V., and Sakoglu, U. (2020). Multiple sclerosis lesion detection in multimodal MRI using simple clustering-based segmentation and classification. Inform. Med. Unlocked 20:100409. doi: 10.1016/j.imu.2020.100409
Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2017). Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848. doi: 10.1109/TPAMI.2017.2699184
Commowick, O., Hadj-Selem, F., Sailer, M., Kuestner, T., Marr, B., Richter, C., et al. (2021b). “MSSEG-2 challenge proceedings: multiple sclerosis new lesions segmentation challenge using a data management and processing infrastructure,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2021, Lecture Notes in Computer Science (Vol. 12981), eds. M. de Bruijne et al. (Cham: Springer). doi: 10.1007/978-3-030-97281-3_24
Commowick, O., Istace, A., Kain, M., Laurent, B., Leray, F., Simon, M., et al. (2021a). Multiple sclerosis lesions segmentation from multiple experts: the MICCAI 2016 challenge dataset. NeuroImage 244:118589. doi: 10.1016/j.neuroimage.2021.118589
Compston, A., and Coles, A. (2008). Multiple sclerosis. Lancet 372, 1502–1517. doi: 10.1016/S0140-6736(08)61620-7
Coupé, P., Yger, P., Prima, S., Hellier, P., Kervrann, C., Barillot, C., et al. (2008). An optimized blockwise nonlocal means denoising filter for 3-d magnetic resonance images. IEEE Trans. Med. Imaging 27, 425–441. doi: 10.1109/TMI.2007.906087
Davarani, M. N., Darestani, A. A., Cañas, V. G., Harirchian, M. H., Zarei, A., Havadaragh, S. H., et al. (2025). Enhanced segmentation of active and nonactive multiple sclerosis plaques in t1 and flair MRI images using transformer-based encoders. Int. J. Imaging Syst. Technol. 35:e70120. doi: 10.1002/ima.70120
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). “An image is worth 16x16 words: transformers for image recognition at scale,” in International Conference on Learning Representations (ICLR). OpenReview.net.
Filippi, M., Rocca, M. A., Ciccarelli, O., De Stefano, N., Evangelou, N., Kappos, L., et al. (2016). MRI criteria for the diagnosis of multiple sclerosis: magnims consensus guidelines. Lancet Neurol. 15, 292–303. doi: 10.1016/s1474-4422(15)00393-2
He, K., Gan, C., Li, Z., Rekik, I., Yin, Z., Ji, W., et al. (2023). Transformers in medical image analysis. Intell. Med. 3, 59–78. doi: 10.1016/j.imed.2022.07.002
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Zimmerer, D., et al. (2018). nnU-Net: a self-adapting framework for U-Net-based medical image segmentation. arXiv [preprint]. arXiv:1809.10486. doi: 10.48550/arXiv.1809.10486
Jain, S., Rajpal, N., and Yadav, J. (2020). Multiple Sclerosis Identification Based on Ensemble Machine Learning Technique. Cham: Springer. doi: 10.2139/ssrn.3734806
Jha, D., Riegler, M. A., Johansen, D., Johansen, D., De Lange, T., and Halvorsen, P. (2019). “Resunet++: an advanced architecture for medical image segmentation,” in IEEE International Symposium on Multimedia (ISM) (San Diego, CA: IEEE), 225–230. doi: 10.1109/ISM46123.2019.00049
Kaur, A., Kaur, L., and Singh, A. (2024). Deepconn: patch-wise deep convolutional neural networks for the segmentation of multiple sclerosis brain lesions. Multimed. Tools Appl. 83, 24401–24433. doi: 10.1007/s11042-023-16292-y
Knight, J., and Khademi, A. (2016). “MS lesion segmentation using flair MRI only,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Athens: MICCAI Society), 21–28.
Lao, Z., Shen, D., Liu, D., Jawad, A. F., Melhem, E. R., Launer, L. J., et al. (2008). Computer-assisted segmentation of white matter lesions in 3D MR images using support vector machine. Acad. Radiol. 15, 300–313. doi: 10.1016/j.acra.2007.10.012
Long, J., Shelhamer, E., and Darrell, T. (2015). “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Boston, MA: IEEE), 3431–3440. doi: 10.1109/CVPR.2015.7298965
Mahbod, A., Wang, C., and Smedby, O. (2016). “Automatic multiple sclerosis lesion segmentation using hybrid artificial neural networks,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAIMSSEG (Athens: MICCAI Society), 29–36.
Manjón, J. V., and Coupé, P. (2015). volBrain: an online MRI brain volumetry system. Organ. Hum. Brain Mapp. 15:30. doi: 10.3389/fninf.2016.00030
McKinley, R., Wepfer, R., Aschwanden, F., Grunder, L., Muri, R., Rummel, C., et al. (2021). Simultaneous lesion and brain segmentation in multiple sclerosis using deep neural networks. Sci. Rep. 11:1087. doi: 10.1038/s41598-020-79925-4
Milletari, F., Navab, N., and Ahmadi, S. A. (2016). “V-net: fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV) (Stanford, CA: IEEE), 565–571. doi: 10.1109/3DV.2016.79
Muslim, A. M., Mashohor, S., Gawwam, G. A., Mahmud, R., Hanafi, M. B., Alnuaimi, O., et al. (2022). Brain MRI dataset of multiple sclerosis with consensus manual lesion segmentation and patient meta information. Data Brief 42:108139. doi: 10.1016/j.dib.2022.108139
Nouman, M., Mabrok, M., and Rashed, E. A. (2024). Neuro-transunet: segmentation of stroke lesion in MRI using transformers. arXiv [preprint]. arXiv:2406.06017. doi: 10.48550/arXiv.2406.06017
Oktay, O., Schlemper, J., Le Folgoc, L., Lee, M., Heinrich, M., Misawa, K., et al. (2018). Attention U-Net: learning where to look for the pancreas. arXiv [preprint]. arXiv:1804.03999. doi: 10.48550/arXiv.1804.03999
Olsson, T., Barcellos, L. F., and Alfredsson, L. (2017). Interactions between genetic, lifestyle and environmental risk factors for multiple sclerosis. Nat. Rev. Neurol. 13, 25–36. doi: 10.1038/nrneurol.2016.187
Rachmadi, M. F. (2020). Limited one-time sampling irregularity map (LOTS-IM) for automatic unsupervised assessment of white matter hyperintensities and multiple sclerosis lesions in structural brain magnetic resonance images. Comput. Med. Imaging Graph 79:101685. doi: 10.1016/j.compmedimag.2019.101685
Ronneberger, O., Fischer, P., and Brox, T. (2015). “U-NET: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI) (Cham: Springer), 234–241. doi: 10.1007/978-3-319-24574-4_28
Sajja, B. R., Datta, S., He, R., Mehta, M., Gupta, R. K., Wolinsky, J. S., et al. (2006). Unified approach for multiple sclerosis lesion segmentation on brain MRI. Ann. Biomed. Eng. 34, 142–151. doi: 10.1007/s10439-005-9009-0
Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., et al. (2010). N4itk: Improved n3 bias correction. IEEE Trans. Med. Imaging 29, 1310–1320. doi: 10.1109/TMI.2010.2046908
Valverde, S., Cabezas, M., Roura, A., González-Villá, J., Pareto, D., Vilanova, L., et al. (2017a). Improving automated multiple sclerosis lesion segmentation with a cascaded 3d convolutional neural network approach. NeuroImage 155, 159–168. doi: 10.1016/j.neuroimage.2017.04.034
Valverde, S., Oliver, A., Roura, E., González-Villá, S., Pareto, D., Vilanova, J. C., et al. (2017b). Automated tissue segmentation of MR brain images in the presence of white matter lesions. Med. Image Anal. 35, 446–457. doi: 10.1016/j.media.2016.08.014
Vera-Olmos, F., Melero, H., and Malpica, N. (2016). “Random forest for multiple sclerosis lesion segmentation,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Athens), 81–86.
Zhang, Y., Brady, M., and Smith, S. (2001). Segmentation of brain MR images through a hidden markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 20, 45–57. doi: 10.1109/42.906424
Zhang, Z., Liu, Q., and Wang, Y. (2018). Resunet: a deep learning framework for medical image segmentation. arXiv [preprint]. arXiv:1811.07064. doi: 10.48550/arXiv.1811.07064
Keywords: atrous spatial pyramid pooling, CNN, deep learning, double-headed attention, multiple sclerosis
Citation: Jain S, Rajpal N and Soni PK (2026) MS-DASPNet: Multiple Sclerosis lesion segmentation from brain MRI using dual attention and spatial pyramid pooling with transfer learning. Front. Comput. Neurosci. 19:1713766. doi: 10.3389/fncom.2025.1713766
Received: 26 September 2025; Revised: 18 December 2025;
Accepted: 23 December 2025; Published: 29 January 2026.
Edited by:
Mohd Dilshad Ansari, SRM University (Delhi-NCR), India
Reviewed by:
Yuncong Ma, University of Minnesota, United States
Nada Haj Messaoud, University of Monastir, Tunisia
Copyright © 2026 Jain, Rajpal and Soni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Shikha Jain, shikha.15316490019@ipu.ac.in; Pramod Kumar Soni, pramod.soni@jaipur.manipal.edu