- 1University School of Information, Communication & Technology, Guru Gobind Singh Indraprastha University, New Delhi, India
- 2Department of Computer Applications, Manipal University Jaipur, Jaipur, India
Accurate detection and segmentation of multiple sclerosis (MS) lesions in brain Magnetic Resonance Imaging (MRI) is a challenging task due to their small size, irregular shape, and variability in different imaging modalities. Precise segmentation of MS lesions from brain MRI is vital for early diagnosis, disease progression monitoring, and treatment planning. We introduce MS-DASPNet, a Dual Attention Guided Deep Neural Network specifically designed to address the challenges of MS lesion detection, including small lesion sizes, low contrast, and heterogeneous appearance. MS-DASPNet employs a VGG-16-based encoder, an Atrous Spatial Pyramid Pooling (ASPP) bottleneck for multi-scale context learning, and dual attention modules in each skip connection to simultaneously refine spatial details and enhance channel-wise feature representation. Evaluations on four publicly available datasets, namely ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021, demonstrate that MS-DASPNet achieves superior Precision, Dice, Sensitivity, and Jaccard scores compared to state-of-the-art methods. MS-DASPNet attains a Dice score of 0.8736 on the MICCAI-2016 dataset and 0.8706 on the MICCAI-2021 dataset, both outperforming existing segmentation techniques, highlighting its robustness and effectiveness in accurate MS lesion segmentation.
1 Introduction
Multiple sclerosis (MS) is a long-term autoimmune disease that targets the central nervous system (CNS), characterized by damage to the myelin sheath, degeneration of nerve fibers, and inflammation within neural tissues (Compston and Coles, 2008). The disease disrupts communication between the brain and body, causing motor, sensory, and cognitive impairments. MS lesions develop in different brain regions, forming sclerosis that appears in multiple locations, thus giving the disease its name, MS (Filippi et al., 2016). While the exact origin of MS remains unclear, genetic predisposition, environmental factors, infections, and immune system dysfunction are believed to contribute to disease onset (Olsson et al., 2017). Magnetic Resonance Imaging (MRI) is the most effective imaging modality for detecting MS lesions, as it provides detailed visualization of white matter abnormalities across multiple sequences, including T1-weighted, T2-weighted, Fluid-Attenuated Inversion Recovery (FLAIR), and Proton Density (PD) scans (Zivadinov et al., 2008). However, accurately segmenting MS lesions remains a difficult task due to variability in lesion appearance, image artifacts, and the complexity of brain anatomy. Various approaches, including traditional techniques, machine learning, and deep learning (DL) methods, have been investigated for MS lesion segmentation. These approaches are generally classified into supervised, unsupervised, and DL-based categories.
Supervised machine learning (ML) techniques require labeled training data, where models learn to distinguish lesions from healthy tissue. Traditional approaches such as thresholding, region growth, and statistical models (e.g., Gaussian Mixture Models and Bayesian classifiers) were initially applied to segmentation (Sajja et al., 2006; Anbeek et al., 2005). Later, advanced classifiers such as Support Vector Machines (SVMs), the Hidden Markov Model, and the expectation-maximization algorithm demonstrated improved lesion detection (Zhang et al., 2001; Lao et al., 2008). More recently, supervised learning models have leveraged hand-crafted features, including intensity histograms, texture features, and spatial priors, to refine segmentation accuracy (Jain et al., 2020). However, supervised models often require extensive manual annotation, which is labor-intensive and prone to inter-rater variability.
Unsupervised algorithms identify hidden patterns in data to facilitate classification and segmentation tasks. Various unsupervised techniques have been explored for MS lesion segmentation. A feature vector-based approach has been utilized to segment MS lesions from skull-stripped MRI images (Akbarpour et al., 2017). Robust partial-volume tissue segmentation, which integrates intensity-based probabilistic and morphological prior maps, incorporating outlier rejection and filling, has also been proposed (Valverde et al., 2017b). Atlason et al. (2019) applied CNNs for tissue and white matter hyperintensity (WMH) segmentation in brain MRI scans. A Euclidean distance-based clustering method has also been employed to detect MS lesions in MRI images (Cetin et al., 2020). An unsupervised approach has been developed to quantitatively assess MS lesion progression to extract brain tissue distortions from MRI scans (Rachmadi et al., 2020). Furthermore, Seg-JDOT, a domain adaptation-based MS lesion segmentation framework, has demonstrated promising results (Ackaouy et al., 2020). However, unsupervised models generally exhibit lower accuracy and often require human intervention to align with domain-specific knowledge.
DL has significantly advanced the extraction of MS lesions by enabling automated feature learning directly from raw MRI scans. Convolutional neural networks (CNNs), especially the UNet architecture and its derivatives, have demonstrated exceptional performance in segmenting MS lesions (Ronneberger et al., 2015; Isensee et al., 2018). Attention-UNet enhances segmentation precision by focusing on lesion regions (Aslani et al., 2019), while DenseUNet and ResUNet improve feature propagation and gradient flow (Oktay et al., 2018; Jha et al., 2019; Long et al., 2015). Fully Convolutional Networks (FCNs) and encoder-decoder architectures have further refined lesion identification (Milletari et al., 2016; Chen et al., 2017). Furthermore, hybrid models incorporating Atrous Spatial Pyramid Pooling (ASPP) and Vision Transformers (ViTs) have demonstrated superior lesion detection capabilities by capturing long-range dependencies and multi-scale contextual information (He et al., 2023; Dosovitskiy et al., 2021; Azad et al., 2024). A hybrid architecture combining the Swin Transformer and UNet models has been used effectively for the identification of MS lesions (Nouman et al., 2024). Similarly, Davarani et al. (2025) have designed a hybrid architecture of transformers and autoencoders to segment MS lesions. A detailed comparative analysis of various techniques for identifying MS lesions from brain MRI images is presented in Table 1. The performance of all methods presented in Table 1 is evaluated using the Dice-score metric, which mitigates the effect of class imbalance in MS lesion datasets. Despite their success, DL models face challenges such as data scarcity, high computational costs, and overfitting on small training datasets.
Table 1. Overview of some of the well-known techniques of MS lesion segmentation from brain MRI images.
Given the limitations of traditional supervised and unsupervised methods, recent research has focused on designing advanced DL architectures to improve MS lesion segmentation. To address the limitations of the existing MS lesion segmentation methods discussed above, an efficient architecture, MS-DASPNet, is designed to extract lesions from brain MRI images by fusing low-level and high-level features through a dual-headed attention mechanism, which incrementally weights each channel to enhance feature representation. The encoder of the proposed model is initialized with VGG-16, which provides a strong hierarchical feature extraction capability due to its deep convolutional layers and pre-trained weights, enabling better generalization and improved feature reuse. Additionally, an ASPP section is integrated into the bottleneck to efficiently model contextual information across multiple spatial scales and improve lesion delineation across different resolutions. By freezing the pre-trained encoder weights during initial training, the network avoids redundant weight updates and further improves its feature extraction efficiency. MS-DASPNet is computationally efficient, requiring fewer parameters, while the combination of VGG-16's hierarchical feature learning and ASPP's multi-scale context aggregation leads to substantial improvements in MS lesion delineation performance. The major contributions of the proposed architecture are as follows:
1. In this study, a DL based model, MS-DASPNet, is designed to segment MS lesions from brain MRI images. The proposed method is computationally efficient and enhances multi-scale contextual understanding using dilated convolutions, improving localization and global perception while mitigating spatial information loss.
2. The introduction of a dual-headed attention block in the skip connections of MS-DASPNet enhances feature refinement by enabling the model to focus on both spatial and channel-wise dependencies, leading to improved accuracy and boundary precision.
3. To assess the generalizability and robustness of MS-DASPNet, experiments are conducted on four publicly available datasets (ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021), and performance is evaluated using quantitative metrics.
4. The performance of MS-DASPNet is compared with existing DL architectures, including UNet, Attention UNet, DenseUNet, and Res-UNet, for MS lesion extraction in brain MRI scans.
The remainder of the paper is structured as follows: Section 2 outlines the four datasets utilized in this study. Section 3 describes the architectural design and methodological framework of the proposed approach. Section 4 details the experimental setup and presents the quantitative and qualitative segmentation outcomes. Section 5 provides an in-depth evaluation of the proposed model against various state-of-the-art DL frameworks. Finally, Section 6 summarizes the key findings and outlines potential directions for future work.
2 Dataset description
This section outlines the datasets used for the segmentation of MS lesions from brain MRI scans as presented in Table 2.
2.1 ISBI-2015 dataset
In 2015, the International Symposium on Biomedical Imaging hosted the Longitudinal MS Lesion Segmentation Challenge, providing training and test data for brain MRI images acquired using a 3T Philips MRI scanner (Carass et al., 2017). The dataset includes 3D (NIfTI) brain MRI images from five patients, acquired at four time points across multiple modalities, including T2-weighted, FLAIR, MPRAGE, and proton-density-weighted scans. Although MS lesions are present in all modalities, the FLAIR modality has been chosen for MS lesion segmentation due to its clear visibility, as illustrated in Figure 1.
Figure 1. ISBI-2015 sample brain MRI images in different image modalities when slice number is 100 for the first patient. (a) T2-weighted, (b) MPRAGE, (c) PD, (d) FLAIR.
2.2 Mendeley dataset
The Brain MRI Dataset of MS with Consensus Manual Lesion Segmentation and Patient Meta Information (Muslim et al., 2022) is employed for extracting MS lesions from MRI scans. This dataset includes 3D brain MRI volumes in NIfTI format from 60 patients, acquired using multiple imaging modalities, including T1-weighted, T2-weighted, and FLAIR. Each 3D volume represents a unique patient and varies in spatial dimensions. Although MS lesions are observable across all three modalities, the FLAIR modality is selected for segmentation due to its superior lesion visibility, as demonstrated in Figure 2.
Figure 2. Mendeley dataset sample brain MRI in different image modalities when slice number is 15 for the second patient in the dataset. (a) T1-weighted, (b) T2-weighted. (c) FLAIR.
2.3 MICCAI-2016/MSSEG-2016
This dataset includes three-dimensional (NIfTI) MRI scans in various modalities, including:
1. 3D FLAIR Image
2. 3D T1 Image
3. T2 Image
4. DP Image (proton density)
5. 3D T1 Gd Image (Post-contrast agent image).
The training set includes 15 images, and the testing set includes 34 images, with their corresponding true labels. After slicing each 3D image, we generated a total of 2,432 images and their corresponding masks. The true lesion masks were provided by seven experts, and a consensus mask was used to evaluate the results. We used preprocessed 3D images for slicing in the FLAIR modality.
The images of the MSSEG-2016 dataset (Commowick et al., 2021a) are shown in Figure 3.
Figure 3. MICCAI-2016 dataset sample brain MRI in different image modalities when slice number is 282 for the third patient in the dataset. (a) T1-weighted. (b) T2-weighted. (c) FLAIR. (d) T1-Gd (gadolinium contrast). (e) DP-proton density.
2.4 MICCAI-2021/MSSEG-2
This dataset comprises MR neuroimaging data from 40 patients, each having undergone 3D FLAIR acquisitions at two distinct time points, with variable intervals between scans. MS lesions were segmented by four experts, and a majority vote was applied voxel-by-voxel to generate the final consensus masks, which serve as the basis for MS lesion segmentation. This study used only second-time-point images for lesion segmentation. The dataset, referred to as MSSEG-2, is publicly available at Commowick et al. (2021b). Representative images of the data set are shown in Figure 4.
Figure 4. MICCAI-2021 dataset sample brain MRI in FLAIR modalities when slice number is 241 for the thirty-fifth patient in the dataset. (a) Time point-1 image. (b) Time point-2 image.
3 Proposed method
This work proposes and evaluates MS-DASPNet, a novel deep-learning framework, on four distinct datasets: ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021. For comparative analysis, pixel-wise segmentation is performed using DL techniques such as UNet, DenseUNet, Attention UNet, ResUNet, and the proposed MS-DASPNet. First, the three-dimensional brain MRI scans are sliced to produce two-dimensional MRI images. Preprocessing then applies a sequence of steps for skull removal: (i) contrast stretching, (ii) histogram equalization, (iii) Otsu thresholding, and (iv) morphological operations. The flowchart depicting the sequence of steps involved in evaluating the proposed architecture is shown in Figure 5. The architecture of the MS-DASPNet model is presented in Figure 6 and described in Algorithm 1, and the notations used are presented in Table 3.
3.1 Preprocessing
In this step, the three-dimensional brain MRI volumes were sliced to obtain a two-dimensional representation suitable for model training. Each MRI volume and its corresponding ground-truth mask consist of approximately 300–400 slices in the FLAIR sequence. Since the proposed MS-DASPNet operates on 2D inputs, the 3D brain MRI volumes were first decomposed into axial slices, as the axial plane provides superior visualization of MS lesions. To ensure that the selected slices contained meaningful information, only slices exhibiting MS lesions with a minimum lesion size of five pixels in the corresponding ground-truth masks were retained. Furthermore, instead of using all slices from each volume, approximately 65%–70% of the axial slices were utilized for the experiments, as the initial and terminal slices predominantly correspond to peripheral brain regions and generally do not contain visible lesions. This strategy enabled the selection of slices containing the maximum lesion information. The resulting 2D slices are affected by noise and require skull stripping and brain tissue extraction during preprocessing: each image is denoised using the NL-means algorithm (Coupé et al., 2008), the brain is extracted using the volBrain platform (Manjón and Coupé, 2015), and bias correction is performed using the N4 algorithm (Tustison et al., 2010). Finally, the 2D images of all datasets, along with their masks, are resized to 256 × 256.
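The slice-selection strategy above can be sketched in a few lines of NumPy. The function name, the array layout (axial slices along the last axis), and the central-fraction parameter are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def select_axial_slices(volume, masks, min_lesion_px=5, keep_frac=0.70):
    """Keep axial slices whose ground-truth mask contains at least
    `min_lesion_px` lesion pixels, restricted to the central `keep_frac`
    of the volume (peripheral slices rarely contain lesions)."""
    n = volume.shape[2]                    # axial slices along the last axis
    margin = int(n * (1 - keep_frac) / 2)  # initial/terminal slices to drop
    kept = []
    for z in range(margin, n - margin):
        if (masks[:, :, z] > 0).sum() >= min_lesion_px:
            kept.append((volume[:, :, z], masks[:, :, z]))
    return kept
```

The retained slice/mask pairs would then be resized to 256 × 256 before training.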
3.2 Data augmentation
Deep learning models generally suffer from overfitting when the dataset is small. Data augmentation mitigates this issue, improving the model's generalization ability and increasing its robustness to complex features. In this work, the following operations are performed to expand the dataset size.
1. Rotation—Rotation has been performed at the angles of 45, 90, and 125 degrees on both the brain MRI image and the mask image.
2. Scaling—Scaling has been performed with scale factors of 1.5 and 2 on both the brain MRI image and the mask image.
3. Translation—Horizontal translation has been performed on both the brain MRI image and the mask image.
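The three operations above can be sketched with NumPy; exact 45- and 125-degree rotations require interpolation (e.g. scipy.ndimage.rotate), so this illustrative sketch uses a 90-degree rotation, nearest-neighbour scaling, and horizontal translation, applied identically to the image and its mask. All function names and parameter defaults are assumptions for illustration:

```python
import numpy as np

def rotate90(img):
    """Exact 90-degree rotation (arbitrary angles need interpolation)."""
    return np.rot90(img)

def scale_nn(img, factor):
    """Nearest-neighbour scaling, cropped back to the input size."""
    h, w = img.shape
    rows = (np.arange(h) / factor).astype(int).clip(0, h - 1)
    cols = (np.arange(w) / factor).astype(int).clip(0, w - 1)
    return img[np.ix_(rows, cols)]

def translate_h(img, dx):
    """Horizontal translation with zero fill at the vacated border."""
    out = np.zeros_like(img)
    if dx >= 0:
        out[:, dx:] = img[:, : img.shape[1] - dx]
    else:
        out[:, :dx] = img[:, -dx:]
    return out

def augment_pair(image, mask, scale=1.5, dx=10):
    """Apply each transform identically to the MRI slice and its mask."""
    pairs = [(image, mask)]
    for op in (rotate90,
               lambda x: scale_nn(x, scale),
               lambda x: translate_h(x, dx)):
        pairs.append((op(image), op(mask)))
    return pairs
```

Applying the same transform to image and mask keeps the lesion annotation aligned with the augmented scan.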
3.3 Overview of the MS-DASPNet architecture
This study proposes MS-DASPNet, a transfer learning-based UNet model incorporating ASPP and dual-headed attention, as an effective DL design for MS lesion identification in brain MRI. The proposed model builds upon the standard UNet by integrating dual-headed attention mechanisms in skip connections and utilizing a pre-trained VGG-16 encoder for feature extraction. Additionally, ASPP is embedded in the bottleneck layer to capture multi-scale contextual information from the deepest encoder feature map. These enhancements address key limitations of conventional UNet models, including feature loss and poor boundary depiction, which are critical in medical image segmentation tasks. The architectural modifications introduced in our model aim to improve feature representation, spatial attention, and multi-scale feature learning, thereby enhancing segmentation accuracy. Figure 7 illustrates the overall network architecture, which consists of:
1. An encoder initialized with VGG-16, employing pre-trained feature representations for robust hierarchical feature extraction.
2. Dual-headed attention modules integrated within skip connections to enhance feature fusion and suppress irrelevant activations.
3. An Atrous Spatial Pyramid Pooling (ASPP) component embedded in the bottleneck to extract multi-scale contextual features from deep encoder representations.
By incorporating these improvements, the proposed model outperforms the UNet model variants, especially in detecting complex lesion boundaries.
In the preprocessing stage, each 3D MRI volume (I3D) is converted into 2D slices, denoised using NL-means filtering, corrected for intensity inhomogeneity via N4 bias field correction, and resized to 256 × 256. Encoder features (E1–E4) are extracted from a pre-trained VGG-16 network. The ASPP module aggregates multi-scale context using dilation rates 1, 6, 12, 18, followed by 1 × 1 convolution for feature refinement. In the decoder, both channel and spatial attention mechanisms are applied to enhance lesion-relevant regions before skip connections are fused. The final segmentation mask (Ŝ) is produced using a 1 × 1 convolution followed by a sigmoid activation.
3.4 Network architecture
The proposed model follows a U-shaped network architecture, where the encoder is initialized with VGG-16, the ASPP module is integrated into the bottleneck, and double-headed attention is applied to each skip connection to enhance feature extraction and preserve critical details.
3.4.1 Encoder with VGG-16
The encoder in our proposed Transfer Learning-based UNet is initialized with VGG-16, a deep CNN originally designed for image classification, proposed by the Visual Geometry Group (VGG). VGG-16 is defined by its depth, comprising 16 layers: 13 convolutional layers and three fully connected layers. Instead of training the encoder from scratch, the proposed model uses pre-trained weights from the ImageNet dataset, enabling it to efficiently capture hierarchical feature representations. VGG-16 is chosen as the feature extraction network due to its strong ability to capture rich hierarchical features through its deep, pretrained architecture. It provides robust and transferable representations that generalize well across diverse medical imaging modalities, making it highly effective for image segmentation tasks. The VGG-based encoder extracts multi-scale features at different depths, which are later utilized in the UNet decoder through skip connections.
The encoder follows the architecture of VGG-16, which consists of five convolutional blocks, each containing multiple convolutional layers followed by ReLU activation and max pooling for downsampling. Let I∈ℝH×W×C be the input, where H is the height, W is the width, and C is the number of channels of the input MRI image. The feature extraction process through a convolutional layer is described in Equation 1:

Fl = σ(Wl * Fl−1 + bl)     (1)
where:
• Fl represents the feature map at layer l,
• Wl and bl are the pretrained weights and biases,
• * denotes the convolution operation,
• σ is the ReLU activation function, defined as σ(x) = max(0, x).
The max pooling operation reduces the spatial dimensions by a factor of 2 at each block, as shown in Equation 2:

Flpool(i, j) = max m,n∈{0,1} Fl(2i + m, 2j + n)     (2)
Since VGG-16 was pre-trained on ImageNet, we freeze its layers during the initial training to retain the learnt feature representations while preventing unnecessary weight updates. Using VGG-16 as the encoder effectively captures both low-level and high-level features, as the model benefits from being pre-trained on the large and diverse ImageNet dataset. The complete feature extraction process through VGG-16 on a sample brain MRI image has been demonstrated in Figure 8.
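Equations 1 and 2 can be illustrated with a minimal single-channel NumPy sketch; the real encoder operates on multi-channel tensors with pre-trained ImageNet weights, so the kernel and bias below are placeholders:

```python
import numpy as np

def conv2d_relu(F, W, b):
    """One encoder step (Equation 1): F_l = ReLU(W_l * F_{l-1} + b_l).
    `F` is (H, W), `W` a (k, k) kernel, with 'same' zero padding."""
    k = W.shape[0]
    p = k // 2
    Fp = np.pad(F, p)
    H, Wd = F.shape
    out = np.zeros((H, Wd))
    for i in range(H):
        for j in range(Wd):
            out[i, j] = (Fp[i:i + k, j:j + k] * W).sum() + b
    return np.maximum(out, 0.0)          # sigma(x) = max(0, x)

def max_pool2(F):
    """2x2 max pooling (Equation 2): halves each spatial dimension."""
    H, W = F.shape
    return F[: H - H % 2, : W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```

Five such conv+pool blocks reduce a 256 × 256 input to the bottleneck resolution while deepening the feature representation.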
3.4.2 Atrous Spatial Pyramid Pooling (ASPP)
ASPP is an advanced feature extraction method that captures multi-scale contextual information by employing multiple dilated convolutions with varying dilation rates. Integrating ASPP into the bottleneck of the architecture enhances the model's capacity to aggregate features across multiple receptive field scales while maintaining computational efficiency. The bottleneck layer represents the deepest feature-extraction stage, where spatial resolution is substantially reduced, resulting in the loss of fine-grained details and contextual information. Incorporating ASPP into the bottleneck addresses these challenges by (i) expanding the receptive field without increasing the parameter count, (ii) capturing both local and global contextual features, and (iii) retaining fine details essential for accurate lesion extraction.
ASPP applies parallel atrous convolutions with different dilation rates to the same feature map, enabling multi-scale feature extraction. Let FASPP denote the output feature map after applying ASPP, Fbottleneck be the input feature map from the VGG-16 encoder, and Wi, bi be the weights and biases of the atrous convolution filters at different scales. The convolution with an atrous rate ri is denoted as *ri. The number of atrous convolutions with different dilation rates is represented by n. The feature map computed by ASPP is represented in Equation 3:

Fri = σ(Wi *ri Fbottleneck + bi), i = 1, …, n     (3)
ASPP comprises:
1. A 1 × 1 convolution (r = 1) for local feature extraction.
2. Three atrous convolutions with dilation rates of r = 6, 12, and 18 for multi-scale context.
3. Global average pooling to incorporate global image context.
4. Concatenation of all resulting feature maps, followed by a 1 × 1 convolution to fuse the information.
The final output of the ASPP module is computed as in Equation 4:

FASPP = σ(Wc * [F1, F6, F12, F18, FGAP] + bc)     (4)
Where F1, F6, F12, and F18 denote the outputs of convolutions with dilation rates of 1, 6, 12, and 18, respectively, FGAP represents the global average pooled feature map, [·] indicates channel-wise concatenation, Wc and bc are the learnable weights and biases for feature fusion, and σ denotes the ReLU activation function.
ASPP enhances multi-scale feature learning across different spatial resolutions, improves boundary distinction, and maintains computational efficiency because it does not significantly increase the number of model parameters.
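A single-channel NumPy sketch of one atrous branch and the ASPP concatenation is given below; the 1 × 1 fusion convolution of Equation 4 is omitted for brevity, and all weights are placeholders rather than trained parameters:

```python
import numpy as np

def atrous_conv(F, W, b, rate):
    """Dilated convolution: kernel taps are spaced `rate` pixels apart,
    enlarging the receptive field without adding parameters."""
    k = W.shape[0]
    eff = rate * (k - 1) + 1            # effective receptive field size
    p = eff // 2
    Fp = np.pad(F, p)
    H, Wd = F.shape
    out = np.zeros((H, Wd))
    for i in range(H):
        for j in range(Wd):
            patch = Fp[i:i + eff:rate, j:j + eff:rate]  # dilated sampling
            out[i, j] = (patch * W).sum() + b
    return np.maximum(out, 0.0)

def aspp(F, W, b, rates=(1, 6, 12, 18)):
    """Parallel atrous branches plus a global-average-pooling branch,
    stacked along the channel axis (the bracket operator of Equation 4)."""
    branches = [atrous_conv(F, W, b, r) for r in rates]
    branches.append(np.full_like(F, F.mean()))   # global context branch
    return np.stack(branches, axis=-1)
```

A 3 × 3 kernel at rate 18 covers a 37 × 37 region, which is how the bottleneck gathers near-global context without extra parameters.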
3.4.3 Double-headed attention (DHA)
In the proposed architecture, the DHA block refines skip connections, effectively integrating channel and spatial attention for enhanced feature representation. Spatial attention highlights important regions in the feature map, whereas channel attention prioritizes the most relevant feature channels. This dual attention mechanism improves feature selection and preservation, thereby enhancing lesion identification accuracy. The architecture of DHA is described in Figure 9.
The spatial attention mechanism highlights regions of interest by applying average pooling and max pooling along the channel axis, producing two spatial maps. The resulting attention map is then used to refine the input feature map, emphasizing spatially significant areas.
Given an input feature map X∈ℝH×W×C, spatial attention is computed as in Equation 5:

Favg = GAP(X), Fmax = GMP(X)     (5)

where Favg and Fmax are the global average pooled and global max pooled feature maps, respectively. These maps are concatenated and passed through a 7 × 7 convolution followed by a sigmoid activation to obtain the spatial attention mask As, as defined in Equation 6:

As = σ(f7×7([Favg, Fmax]))     (6)

where σ denotes the sigmoid activation function and f7×7 is the 7 × 7 convolution. The final attention-weighted feature map Xs is obtained as in Equation 7:

Xs = As ⊙ X     (7)
where ⊙ represents element-wise multiplication. This mechanism enhances important spatial regions in the feature map, improving the model's focus on relevant areas during extraction.
The channel attention mechanism provides importance to each feature channel using Global Average Pooling (GAP) and Global Max Pooling (GMP) across spatial dimensions.
Given an input Xs (from the spatial attention module), channel descriptors are computed as in Equations 8, 9:

Gavg = GAP(Xs)     (8)

Gmax = GMP(Xs)     (9)

where Gavg and Gmax represent the globally average pooled and max pooled channel features, respectively. These descriptors are concatenated and passed through a two-layer dense network, yielding the channel attention vector Ac as in Equation 10:

Ac = σ(W2 δ(W1 [Gavg, Gmax]) + b2)     (10)

where W1 and W2 are learnable weight matrices, b2 is the bias term, δ is the ReLU activation applied after the first dense layer, and σ denotes the sigmoid activation function. The final channel-wise attention-weighted feature map Xc is obtained as in Equation 11:

Xc = Ac ⊙ Xs     (11)
where ⊙ denotes element-wise multiplication, applying the attention weights to refine the features along the channel dimension.
After Dual-Headed Attention (DHA) is applied to each skip connection, the enhanced feature maps are concatenated with the upsampled decoder features. This process is represented in Equation 12, where Si denotes the DHA-enhanced skip connection feature map, and Ui is the upsampled decoder feature map at level i:

Di = [Si, Ui]     (12)
This fusion of encoder and decoder features enables richer contextual representation and more precise localization, which is crucial for accurate detection.
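The DHA pipeline (spatial gating followed by channel gating) can be sketched in NumPy as follows. This is a structural sketch under stated assumptions: the 7 × 7 convolution is replaced by a per-map scalar weighting, and all weights are placeholders rather than trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(X, ws_avg=1.0, ws_max=1.0):
    """Pool across the channel axis, combine the two maps (scalar weights
    stand in for the 7x7 convolution), and gate each spatial location."""
    favg = X.mean(axis=-1, keepdims=True)            # (H, W, 1)
    fmax = X.max(axis=-1, keepdims=True)             # (H, W, 1)
    A = sigmoid(ws_avg * favg + ws_max * fmax)       # spatial attention mask
    return X * A

def channel_attention(X, W1, W2, b2):
    """Pool across spatial dims, run a two-layer dense net, gate channels."""
    gavg = X.mean(axis=(0, 1))                       # (C,)
    gmax = X.max(axis=(0, 1))                        # (C,)
    z = np.concatenate([gavg, gmax])                 # (2C,)
    A = sigmoid(W2 @ np.maximum(W1 @ z, 0.0) + b2)   # (C,) channel weights
    return X * A

def dha(X, W1, W2, b2):
    """Dual-headed attention: spatial gating followed by channel gating."""
    return channel_attention(spatial_attention(X), W1, W2, b2)
```

In the full network, the output of `dha` would be concatenated with the upsampled decoder features at the corresponding level.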
3.5 Loss function
Binary Cross-Entropy (BCE) serves as the primary training objective, quantifying the divergence between the predicted probability map and the ground-truth labels. Given a ground-truth mask yi and its corresponding predicted output oi for each of the N pixels, the BCE loss is mathematically expressed in Equation 13:

LBCE = −(1/N) ∑i=1..N [yi log(oi) + (1 − yi) log(1 − oi)]     (13)
BCE loss is particularly advantageous for tasks that require classifying each pixel as belonging to either the object of interest (foreground) or the background. Since binary segmentation is inherently a pixel-wise classification problem, BCE is well-suited for optimizing the model's predictive capability. It treats each pixel independently, ensuring the network learns to differentiate effectively between the two classes. By minimizing the BCE loss during training, the model enhances its pixel-wise accuracy, making it a widely used loss function for binary segmentation applications.
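A minimal NumPy version of the pixel-wise BCE loss; the clipping constant is a standard numerical safeguard against log(0), an implementation detail not discussed in the paper:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Pixel-wise binary cross-entropy:
    L = -(1/N) * sum( y_i*log(o_i) + (1 - y_i)*log(1 - o_i) )."""
    o = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0) at saturation
    y = y_true.astype(float)
    return float(-np.mean(y * np.log(o) + (1 - y) * np.log(1 - o)))
```

Framework implementations (e.g. `keras.losses.BinaryCrossentropy`) apply the same clipping internally.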
3.6 Evaluation metrics
To evaluate performance, several quantitative metrics are utilized, as defined in Equations 14–17, based on the following pixel-level outcomes:
• True positive (TP): MS pixel identified as MS
• True negative (TN): non-MS pixel identified as non-MS
• False negative (FN): MS pixel identified as non-MS
• False positive (FP): non-MS pixel identified as MS
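The metrics can be computed directly from these four pixel-level counts. This NumPy sketch assumes non-degenerate binary masks (empty masks would zero some denominators):

```python
import numpy as np

def segmentation_metrics(gt, pred):
    """Overlap metrics from pixel-wise TP/TN/FP/FN counts.
    `gt` and `pred` are binary masks of equal shape."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    tp = np.sum(gt & pred)      # MS pixel identified as MS
    tn = np.sum(~gt & ~pred)    # non-MS pixel identified as non-MS
    fp = np.sum(~gt & pred)     # non-MS pixel identified as MS
    fn = np.sum(gt & ~pred)     # MS pixel identified as non-MS
    return {
        "dice":        2 * tp / (2 * tp + fp + fn),
        "jaccard":     tp / (tp + fp + fn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr":         fp / (fp + tn),
    }
```

Dice weights true positives twice, which is why it is less distorted by the overwhelming non-lesion background than plain accuracy.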
4 Experimental results and discussion
This section outlines the experimental settings employed to simulate MS-DASPNet architectures, followed by a comprehensive analysis and discussion of the resulting experimental outcomes.
4.1 Experimental settings
The experiments are conducted on a GPU-equipped system featuring an Intel Core i5 processor at a 2.5 GHz clock speed, 16 GB RAM, and an NVIDIA RTX graphics card. The details about the hyperparameters used for training the proposed model are presented in Table 4. All datasets are divided into training and testing subsets in a 90:10 ratio, ensuring that no patient data overlaps between these sets to avoid data leakage. To prevent overfitting, a dropout rate of 0.2 is incorporated as a regularization technique, while a learning rate of 0.001 is used to ensure stable and efficient convergence. The model uses a progressively increasing number of convolutional filters (16, 32, 64, 128, and 256) to capture multi-scale features, along with a kernel size of 7 for enhanced feature extraction. A batch size of 4 was used, and the BCE loss was selected for the lesion extraction task. The Adam optimizer, with an adaptive learning rate, is used over 100 epochs to fine-tune the model parameters, resulting in robust training and precise outcomes. The details regarding the number of subjects used for conducting experiments on each dataset are provided in Table 5.
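The hyperparameters above can be collected into a single configuration object; the values restate what is reported in this section, while the dictionary layout and key names are illustrative:

```python
# Training configuration as reported in Section 4.1; the dict layout
# and key names are illustrative, not taken from the authors' code.
config = {
    "input_size": (256, 256),            # resized 2D FLAIR slices
    "filters": [16, 32, 64, 128, 256],   # filters per encoder depth
    "kernel_size": 7,
    "dropout_rate": 0.2,
    "learning_rate": 1e-3,
    "batch_size": 4,
    "epochs": 100,
    "optimizer": "Adam",
    "loss": "binary_crossentropy",
    "train_test_split": (0.9, 0.1),      # patient-wise, no leakage
}
```

Keeping the split patient-wise (rather than slice-wise) is what prevents slices from the same subject leaking across the train/test boundary.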
4.2 Experimental results and analysis
The proposed experiments are conducted on four datasets, namely ISBI-2015, Mendeley, MICCAI-2016, and MICCAI-2021.
4.2.1 On ISBI-2015 dataset
The segmentation results produced by MS-DASPNet on the ISBI-2015 dataset are illustrated in Figure 10. Specifically, Figures 10a1–a4 display four sample brain MRI images from the dataset, while Figures 10b1–b4 present the corresponding ground truth masks used for performance evaluation. Figures 10c1–c4 show the segmentation outputs generated by the MS-DASPNet architecture. Based on visual inspection, MS-DASPNet effectively identifies the MS lesions. The performance is further evaluated using various quantitative parameters, and the results are shown in Table 6. The experimental results reveal that the MS-DASPNet model demonstrates robust performance in MS lesion segmentation on the ISBI-2015 dataset, obtaining a Dice score of 0.8329 and a Jaccard index of 0.733, which reflect strong spatial overlap between the predicted and reference lesion masks. The precision of 0.9252 indicates high confidence in positive lesion predictions, with minimal over-segmentation. A sensitivity of 0.7804 suggests effective lesion detection, capturing a majority of true lesion voxels, although with some under-segmentation in more diffuse or low-contrast regions. The specificity of 0.9987 and a false positive rate (FPR) of just 0.0013 confirm that the model maintains strong background suppression, crucial for minimizing false detections. These results highlight MS-DASPNet's ability to balance lesion sensitivity and anatomical specificity, making it well-suited for automated quantification of MS lesion load in clinical MRI scans.
Figure 10. Results obtained on ISBI-2015 dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
4.2.2 On MICCAI-2016 dataset
The MS lesions segmented by MS-DASPNet on images from the MICCAI-2016 dataset (Figures 11a1–a4) are visualized in Figures 11c1–c4, with the reference MS lesions shown in Figures 11b1–b4. The experimental results on the test data of the MICCAI-2016 dataset for different quantitative parameters are presented in Table 6. On the MICCAI-2016 dataset, MS-DASPNet achieves a Dice score of 0.8736 and a Jaccard index of 0.7982, indicating efficient segmentation of the MS lesions. The precision of 0.8746 and sensitivity of 0.8834 indicate that the model detects MS lesions accurately while minimizing the FPR. The high sensitivity (0.8834) shows that MS-DASPNet successfully identifies even subtle lesion regions, which is critical in medical image analysis, where overlooking affected regions can lead to serious diagnostic implications.
Figure 11. Results obtained on the MICCAI-2016 dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
4.2.3 On MICCAI-2021 dataset
The effectiveness of the proposed MS-DASPNet architecture on the MICCAI-2021 dataset is illustrated in Figure 12 and summarized in Table 6. The segmented MS lesions shown in Figures 12c1–c4 correspond to the input images in Figures 12a1–a4, revealing that MS-DASPNet segments MS lesions from MRI scans efficiently. On the MICCAI 2021 dataset, MS-DASPNet maintains consistently high segmentation accuracy, showcasing its robustness in detecting MS lesions under varying imaging conditions. The model achieves a Dice coefficient of 0.8706 and a Jaccard index of 0.7719, indicating strong agreement with expert-labeled lesion masks. The high specificity (0.9996) and low FPR (0.0004) values highlight its excellent background discrimination capability, which is critical for accurate MS lesion segmentation.
Figure 12. Segmentation results obtained on the MICCAI-2021 dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
4.2.4 On Mendeley dataset
The experimental results of the proposed MS-DASPNet model are presented in Figure 13 and evaluated on the metrics reported in Table 6. On the Mendeley dataset, MS-DASPNet performs comparatively poorly relative to the other datasets, indicating challenges in accurately identifying MS lesions. The model achieves a Dice coefficient of 0.5076 and a Jaccard index of 0.3519, reflecting its difficulty in handling the artifacts present in these MRI scans. Despite these limitations, the model maintains a high specificity (0.9985) and a low false positive rate (0.0015), implying that it effectively distinguishes non-lesion regions and avoids false detections.
Figure 13. Segmentation results obtained on the Mendeley dataset. (a1–a4) Brain MRI image. (b1–b4) Corresponding ground truth image. (c1–c4) Segmentation obtained using MS-DASPNet.
MS-DASPNet was evaluated on four publicly available FLAIR MRI datasets; FLAIR sequences emphasize the hyperintense white matter lesions that are characteristic of MS. The model demonstrated superior performance on MICCAI-2016 and MICCAI-2021, with Dice scores of 0.8736 and 0.8706, respectively, indicating strong spatial overlap with clinically annotated lesion masks. These results indicate that the model precisely demarcates periventricular and juxtacortical MS lesions, even in the presence of complex morphology. Sensitivities above 0.85 and FPRs below 0.0005 on these datasets indicate that MS-DASPNet can identify fine lesion patterns without generating false alarms in normal-appearing white matter.
On the ISBI-2015 dataset, the model achieved high precision (0.9252) and balanced Dice and recall scores, reflecting a consistent trade-off between detection and false-positive control. Performance was comparatively lower on the Mendeley dataset, with a Dice score of 0.5076 and a sensitivity of 0.4357, likely due to differences in lesion presentation or poorer image quality. Notwithstanding this, the model consistently achieved a specificity above 0.998 on all datasets, affirming its reliability in suppressing non-lesion brain areas. Overall, MS-DASPNet holds great promise for clinical utility in automated quantification of MS lesion load, assessment of disease progression, and tracking of treatment response.
5 Comparative analysis
In this section, the performance of the proposed MS-DASPNet is compared with well-known DL-based architectures designed for the segmentation of MS lesions, as presented in Tables 7–10. The Dice score is selected as the standard parameter for assessing MS lesion segmentation performance because it addresses the issue of class imbalance. MS lesion regions are typically small compared to the background, so the accuracy metric can yield misleading results; the Dice score, by contrast, emphasizes the correct classification of lesion pixels, is not dominated by background pixels, and directly penalizes false positives.
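The class-imbalance point can be made concrete with a toy example (synthetic masks, not drawn from any of the datasets used here): on a slice where lesions cover only 25 of 10,000 pixels, a trivial all-background prediction scores near-perfect pixel accuracy yet zero Dice.

```python
import numpy as np

# Synthetic 100x100 slice with a 5x5 lesion: 25 lesion pixels out of 10,000.
gt = np.zeros((100, 100), dtype=bool)
gt[40:45, 40:45] = True

pred = np.zeros_like(gt)                 # degenerate "all background" prediction
accuracy = (pred == gt).mean()           # 0.9975: misleadingly high
tp = np.sum(pred & gt)
dice = 2 * tp / (pred.sum() + gt.sum())  # 0.0: correctly flags the total failure
```

Because every lesion pixel is missed, sensitivity and Dice are both zero, even though 99.75% of pixels are classified correctly.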
Table 7. Comparative evaluation of the proposed method with state-of-the-art methods on the ISBI-2015 dataset using the Dice coefficient.
5.1 On ISBI-2015 dataset
On the ISBI-2015 dataset, the performance of the proposed technique is compared with DL architectures based on UNet and its variants, such as UNet (Ronneberger et al., 2015), DenseUNet (Cao et al., 2020), Res-UNet (Zhang et al., 2018), and Attention-UNet (Oktay et al., 2018), along with other DL architectures including transformer-based models (Azad et al., 2024), the patch-based multi-view CNN of Birenbaum and Greenspan (2016), a CNN with three Inception modules (Ansari et al., 2021), and a cascaded 3D CNN (Valverde et al., 2017a). A detailed comparison on the ISBI-2015 dataset is presented in Table 7. MS-DASPNet achieves a Dice score of 0.8329, outperforming well-established models such as UNet, Res-UNet, and Attention-UNet. It also significantly surpasses CNN-based techniques such as those proposed by Birenbaum and Greenspan (2016), Ansari et al. (2021), and Valverde et al. (2017a), whose performance ranges from 0.62 to 0.63, partly due to their limited contextual understanding. Although DenseUNet obtained the highest Dice score on ISBI-2015 (0.8808, about 5% higher than the proposed method), this gain comes at the cost of the increased computational complexity of its densely connected architecture. Among all the techniques in Table 7, MS-DASPNet obtains the second-best and promising results in the segmentation of small, scattered lesions, even in areas affected by low tissue contrast, intensity inhomogeneity, or partial volume effects, which are common challenges in brain MRI segmentation.
5.2 On MICCAI-2016 dataset
On the MICCAI-2016 dataset, the performance of MS-DASPNet is compared on the Dice score and sensitivity metrics with various DL architectures, including UNet (Ronneberger et al., 2015), DenseUNet (Cao et al., 2020), Res-UNet (Zhang et al., 2018), Attention-UNet (Oktay et al., 2018), learning-based approaches proposed by Beaumont and Greenspan (2016), Vera-Olmos et al. (2016), and Mahbod et al. (2016), and the patch-wise DNN of Kaur et al. (2024). Furthermore, the performance is compared with traditional methods: the Expectation-Maximization and graph-cut-based segmentation of Beaumont et al. (2016b), the rule-based technique of Beaumont et al. (2016a), and the edge-based method of Knight and Khademi (2016). The detailed comparison is presented in Table 8: MS-DASPNet has the highest Dice score of 0.8736, surpassing most benchmark DL models, including UNet (0.8635), DenseUNet (0.8657), Attention-UNet (0.8663), and Res-UNet (0.8692). The patch-wise CNN of Kaur et al. (2024) is a close competitor, achieving a Dice score of 0.8700. On the sensitivity metric, Res-UNet achieves a marginally better result (by 0.0009) than MS-DASPNet, but MS-DASPNet exhibits a superior Dice score, indicating a more balanced segmentation output with enhanced overlap accuracy. In contrast, the traditional methods of Beaumont et al. (2016a,b), Knight and Khademi (2016), and Vera-Olmos et al. (2016) have much lower Dice scores, ranging from 0.5300 to 0.6000, reflecting their limited ability to generalize across lesion variations in FLAIR MRI due to reliance on handcrafted features, simple voxel-wise classifiers, or rule-based refinements.
Table 8. Comparative evaluation of the proposed method with state-of-the-art methods on the MICCAI-2016 dataset using the Dice score and sensitivity metrics.
5.3 On MICCAI-2021 dataset
The comparative study of the proposed MS-DASPNet on the MICCAI-2021 dataset against several state-of-the-art segmentation approaches, in terms of Dice score, is presented in Table 9. MS-DASPNet obtains the highest Dice score of 0.8706, outperforming all counterparts. Res-UNet, the closest competitor with a Dice score of 0.8628, showcases strong segmentation capability owing to its residual connections, while DenseUNet, UNet, and Attention-UNet report lower Dice scores of 0.7840, 0.6559, and 0.3914, respectively. Earlier methods, such as those of Basaran et al. (2022) and McKinley et al. (2021), had Dice scores of 0.5100 and 0.6380, respectively, reflecting their limitations in handling the complex and heterogeneous nature of MS lesions in FLAIR MRI, which often exhibit high inter-patient variability and poor boundary definition. The strong performance of the proposed MS-DASPNet on the MICCAI-2021 dataset demonstrates its ability to segment small, irregularly shaped lesions while maintaining boundary integrity. Its integration of multi-scale feature aggregation via ASPP and attention-based contextual filtering enables superior overlap with expert annotations.
Table 9. Comparative evaluation of the proposed MS-DASPNet method with state-of-the-art methods on the MICCAI-2021 dataset using the Dice score.
5.4 On Mendeley dataset
The comparative evaluation of the proposed MS-DASPNet against several well-established DL models (UNet, DenseUNet, Attention-UNet, and Res-UNet) is presented in Table 10. The best-performing method on this dataset is Attention-UNet, with a Dice score of 0.6012, followed by DenseUNet and UNet. These results suggest that attention mechanisms and dense feature propagation provide tangible benefits for lesion detection and boundary delineation on the Mendeley dataset. The proposed MS-DASPNet, in contrast, has a Dice score of 0.5076, lower than most of the evaluated DL models. While this result may appear suboptimal compared with the model's performance on other datasets (e.g., MICCAI-2016 and MICCAI-2021), it reflects the unique challenges of the Mendeley dataset: greater variability in lesion appearance and increased lesion complexity. In particular, the dataset contains smaller and more fragmented lesion regions, which make accurate segmentation more challenging and increase sensitivity to minor boundary errors, thereby directly affecting the Dice metric.
Table 10. Comparative evaluation of the proposed MS-DASPNet with DL methods on the Mendeley dataset using the Dice coefficient.
The MS-DASPNet model integrates a VGG-16-based encoder, ASPP in the bottleneck for multi-level context learning, and dual-headed attention modules in the skip connections. This architectural synergy enables the model to effectively capture both coarse global structures and fine local details, leading to significantly improved segmentation performance. MS-DASPNet demonstrates superior performance on standardized datasets such as MICCAI-2016, MICCAI-2021, and ISBI-2015. Its moderate performance on the Mendeley dataset (Dice: 0.5076) highlights the importance of dataset diversity and domain adaptation in medical image segmentation tasks. The attention maps for selected representative images are illustrated in Figure 14 using the Grad-CAM technique. Specifically, Figures 14a1–a4 depict the original brain MRI slices, while Figures 14b1–b4 present the corresponding Grad-CAM heatmaps highlighting the regions that most strongly influence the model's predictions. Figures 14c1–c4 show the Grad-CAM overlays superimposed on the original MRI images, providing a clear visual interpretation of the model's focus. These visualizations demonstrate that the network effectively attends to clinically relevant lesion regions, thereby enhancing the interpretability and reliability of the proposed model. Overall, MS-DASPNet provides a well-balanced solution that not only enhances accuracy across various datasets but also maintains computational efficiency, making it a strong candidate for real-world clinical deployment in medical image segmentation tasks. The overall performance of the proposed method on different test datasets is presented as a confusion matrix in Figure 15.
Figure 14. Attention map using MS-DASPNet. (a1–a4) Brain MRI image. (b1–b4) Attention map using Grad-CAM. (c1–c4) Grad-CAM overlay.
Figure 15. Confusion matrix of experimental results. (a) ISBI-2015 dataset. (b) MICCAI-2016 dataset. (c) MICCAI-2021 dataset. (d) Mendeley dataset.
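The Grad-CAM maps of Figure 14 are, at their core, a weighted sum of the final convolutional feature maps, where each channel's weight is the spatially averaged gradient of the class score with respect to that channel. A framework-agnostic sketch of that combination step (assuming the activations and gradients have already been extracted from the network; not the authors' implementation) is:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from conv activations and score gradients, both (K, H, W)."""
    alphas = gradients.mean(axis=(1, 2))                  # one weight per channel
    cam = np.sum(alphas[:, None, None] * activations, 0)  # weighted channel sum
    cam = np.maximum(cam, 0)                              # ReLU: keep positive evidence
    return cam / cam.max() if cam.max() > 0 else cam      # normalize to [0, 1] for overlay
```

For display, the normalized map is typically upsampled to the input resolution and blended with the MRI slice, as in the overlays of Figures 14c1–c4.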
6 Ablation study
An ablation study was performed to assess the robustness and generalization capability of the proposed MS-DASPNet under different training–testing configurations. The model was evaluated using three test split ratios, namely 10%, 15%, and 20%, and the corresponding quantitative results are reported in Tables 6, 11. Across all datasets, MS-DASPNet demonstrates consistent segmentation performance with only marginal variations in Dice and Jaccard scores as the test split changes. This behavior highlights the stability of the proposed architectural components and indicates that the model's performance is not overly sensitive to a particular data partition.
Table 11. Experimental results of MS-DASPNet on different quantitative parameters for 10%, 15%, and 20% test data.
For benchmark datasets such as ISBI 2015 and MICCAI 2016, high Dice scores are maintained across all three test splits, accompanied by consistently high specificity and extremely low false positive rates, underscoring the model's ability to accurately suppress false lesion detections. Although slight performance fluctuations are observed when increasing the test split from 10% to 20%, these variations can be attributed to differences in lesion distribution and subject composition within the test sets, which is common in medical image segmentation tasks involving limited and imbalanced data. Overall, the ablation results confirm that integrating the proposed modules into MS-DASPNet yields robust and reliable lesion segmentation across varying data splits, thereby strengthening the validity of the reported performance claims.
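The split ratios used in this ablation can be reproduced in spirit with a simple deterministic partition (a generic sketch; the authors' exact splitting pipeline and random seed are not specified here, and the seed value below is an arbitrary choice):

```python
import random

def split_indices(n_slices, test_frac, seed=42):
    """Shuffle slice indices and hold out a test fraction (illustrative only)."""
    rng = random.Random(seed)          # fixed seed gives a reproducible partition
    idx = list(range(n_slices))
    rng.shuffle(idx)
    n_test = round(n_slices * test_frac)
    return idx[n_test:], idx[:n_test]  # (train, test)

for frac in (0.10, 0.15, 0.20):
    train, test = split_indices(1000, frac)
```

In practice, medical-image splits are usually made per subject rather than per slice, so that slices from one patient never appear in both the training and test sets.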
7 Conclusion
This study presents MS-DASPNet, a UNet-based architecture designed for accurate lesion segmentation from brain MRI scans. MS-DASPNet employs a VGG-16 encoder, enabling robust and transferable low-level feature extraction, while the ASPP bottleneck captures the multiscale contextual information crucial for lesions of diverse sizes and shapes. Most notably, the dual-headed attention mechanism in the skip connections enables the model to concentrate selectively on relevant spatial and channel-wise features, enhancing segmentation precision without incurring high computational costs. The approach begins with converting 3D brain MRI volumes into 2D slices, followed by skull removal using preprocessing techniques. Various image augmentation techniques were employed to mitigate overfitting caused by the limited dataset size and to enhance model generalization. Experiments were conducted on four publicly available datasets, and segmentation performance was evaluated using four metrics: Dice score, precision, sensitivity, and Jaccard index. The proposed MS-DASPNet achieved the highest Dice score of 0.8736, outperforming several existing UNet variants: compared with the baseline UNet (Dice = 0.8635), MS-DASPNet achieved a 1.17% improvement in Dice score, while surpassing Attention-UNet (Dice = 0.8663) by 0.84% and Res-UNet (Dice = 0.8692) by 0.51%. On the MICCAI-2021 dataset, MS-DASPNet achieved a Dice score of 0.8706, reflecting a 0.82% enhancement over the baseline and consistent improvement across all evaluation metrics.
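The percentage gains quoted above are relative Dice improvements over each baseline; the arithmetic for the MICCAI-2016 comparison can be checked directly:

```python
def rel_improvement(new_score, base_score):
    """Relative Dice improvement over a baseline, in percent."""
    return 100 * (new_score - base_score) / base_score

# Dice values from the MICCAI-2016 comparison reported above.
round(rel_improvement(0.8736, 0.8635), 2)  # 1.17 (vs. UNet)
round(rel_improvement(0.8736, 0.8663), 2)  # 0.84 (vs. Attention-UNet)
round(rel_improvement(0.8736, 0.8692), 2)  # 0.51 (vs. Res-UNet)
```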
Comparative analysis with state-of-the-art methods, including both traditional and DL-based techniques, revealed the consistent improvements MS-DASPNet offers across datasets. On the MICCAI-2016 dataset, MS-DASPNet outperformed other architectures, including UNet variants, with the highest Dice score of 0.8736, and on the MICCAI-2021 dataset it again achieved the highest Dice score of 0.8706. In summary, the MS-DASPNet architecture demonstrates robust generalization, flexibility, and accuracy across multiple datasets, positioning it as a highly promising approach for medical image applications. In future work, k-fold cross-validation will be employed to further assess and enhance the model's robustness and generalization across diverse data distributions.
Data availability statement
Publicly available datasets were analyzed in this study. This study uses four types of datasets. (a) The ISBI-2015 dataset is available at: https://smart-stats-tools.org/lesion-challenge-2015 (b) The Mendeley dataset is available at: https://data.mendeley.com/datasets/8bctsm8jz7/1 (c) The MICCAI-2016 dataset is available by request from https://shanoir.irisa.fr/shanoir-ng/study/details/209 (d) The MICCAI-2021 dataset is available by request from https://shanoir.irisa.fr/shanoir-ng/study/details/208.
Author contributions
SJ: Investigation, Writing – original draft, Methodology, Writing – review & editing, Conceptualization, Formal analysis. NR: Writing – original draft, Methodology, Conceptualization, Writing – review & editing, Supervision. PS: Writing – review & editing, Supervision, Writing – original draft, Methodology, Formal analysis, Conceptualization.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Acknowledgments
All authors thank Guru Gobind Singh Indraprastha University, Dwarka, Delhi, and Manipal University Jaipur, Rajasthan, India.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Ackaouy, A., Courty, N., Vallée, E., Commowick, O., Barillot, C., Galassi, F., et al. (2020). Unsupervised domain adaptation with optimal transport in multi-site segmentation of multiple sclerosis lesions from MRI data. Front. Comput. Neurosci. 14:19. doi: 10.3389/fncom.2020.00019
Akbarpour, T., Shamsi, M., Daneshvar, S., and Pooreisa, M. (2017). Unsupervised multimodal magnetic resonance images segmentation and multiple sclerosis lesions extraction based on edge and texture features. Appl. Med. Inform. 39, 30–40.
Anbeek, P., Vincken, K. L., Van Bochove, G. S., and Van Osch, M. J. (2005). Probabilistic segmentation of white matter lesions in MR imaging. NeuroImage 27, 610–620. doi: 10.1016/j.neuroimage.2005.05.046
Ansari, S. U., Javed, K., Qaisar, S. M., Jillani, R., and Haider, U. (2021). Multiple sclerosis lesion segmentation in brain MRI using inception modules embedded in a convolutional neural network. J. Healthc. Eng. 2021:4138137. doi: 10.1155/2021/4138137
Aslani, S., Dayan, M., Storelli, L., Filippi, M., Murino, V., Rocca, M. A., et al. (2019). Multi-branch convolutional neural network for multiple sclerosis lesion segmentation. NeuroImage 196, 1–15. doi: 10.1016/j.neuroimage.2019.03.068
Atlason, H. E., Love, A., Sigurdsson, S., Gudnason, V., and Ellingsen, L. M. (2019). Segae: Unsupervised white matter lesion segmentation from brain MRIs using a CNN autoencoder. NeuroImage: Clin. 24:102085. doi: 10.1016/j.nicl.2019.102085
Azad, R., Kazerouni, A., Heidari, M., Aghdam, E. K., Molaei, A., Jia, Y., et al. (2024). Advances in medical image analysis with vision transformers: a comprehensive review. Med. Image Anal. 91:103000. doi: 10.1016/j.media.2023.103000
Basaran, B. D., Matthews, P. M., and Bai, W. (2022). New lesion segmentation for multiple sclerosis brain images with imaging and lesion-aware augmentation. Front. Neurosci. 16:1007453. doi: 10.3389/fnins.2022.1007453
Beaumont, J., Commowick, O., and Barillot, C. (2016a). “Automatic multiple sclerosis lesion segmentation from intensity-normalized multi-channel MRI,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Cham: Springer).
Beaumont, J., Commowick, O., and Barillot, C. (2016b). “Multiple sclerosis lesion segmentation using an automated multimodal graph cut,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Athens: MICCAI Society), 1–8.
Beaumont, J., and Greenspan, H. (2016). “Automatic segmentation of multiple sclerosis lesions using deep learning,” in Proc. MICCAI 2016 Workshop on MS Lesion Segmentation, 123–130.
Birenbaum, M., and Greenspan, H. (2016). Longitudinal multiple sclerosis lesion segmentation using multi-view convolutional neural networks. Artif. Intell. Med. 87, 67–75. doi: 10.1007/978-3-319-46976-8_7
Cao, Y., Liu, S., Peng, Y., and Li, J. (2020). Denseunet: densely connected UNet for electron microscopy image segmentation. IET Image Process. 14, 2682–2689. doi: 10.1049/iet-ipr.2019.1527
Carass, A., Roy, S., Jog, A., Cuzzocreo, J. L., Magrath, E., Gherman, A., et al. (2017). Longitudinal multiple sclerosis lesion segmentation: resource and challenge. NeuroImage 148, 77–102. doi: 10.1016/j.neuroimage.2016.12.064
Cetin, O., Seymen, V., and Sakoglu, U. (2020). Multiple sclerosis lesion detection in multimodal MRI using simple clustering-based segmentation and classification. Inform. Med. Unlocked 20:100409. doi: 10.1016/j.imu.2020.100409
Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2017). Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848. doi: 10.1109/TPAMI.2017.2699184
Commowick, O., Hadj-Selem, F., Sailer, M., Kuestner, T., Marr, B., Richter, C., et al. (2021b). “MSSEG-2 challenge proceedings: multiple sclerosis new lesions segmentation challenge using a data management and processing infrastructure,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2021, Lecture Notes in Computer Science (Vol. 12981), eds. M. de Bruijne et al. (Cham: Springer). doi: 10.1007/978-3-030-97281-3_24
Commowick, O., Istace, A., Kain, M., Laurent, B., Leray, F., Simon, M., et al. (2021a). Multiple sclerosis lesions segmentation from multiple experts: the MICCAI 2016 challenge dataset. NeuroImage 244:118589. doi: 10.1016/j.neuroimage.2021.118589
Compston, A., and Coles, A. (2008). Multiple sclerosis. Lancet 372, 1502–1517. doi: 10.1016/S0140-6736(08)61620-7
Coupé, P., Yger, P., Prima, S., Hellier, P., Kervrann, C., Barillot, C., et al. (2008). An optimized blockwise nonlocal means denoising filter for 3-d magnetic resonance images. IEEE Trans. Med. Imaging 27, 425–441. doi: 10.1109/TMI.2007.906087
Davarani, M. N., Darestani, A. A., Cañas, V. G., Harirchian, M. H., Zarei, A., Havadaragh, S. H., et al. (2025). Enhanced segmentation of active and nonactive multiple sclerosis plaques in t1 and flair MRI images using transformer-based encoders. Int. J. Imaging Syst. Technol. 35:e70120. doi: 10.1002/ima.70120
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). “An image is worth 16x16 words: transformers for image recognition at scale,” in International Conference on Learning Representations (ICLR). OpenReview.net.
Filippi, M., Rocca, M. A., Ciccarelli, O., De Stefano, N., Evangelou, N., Kappos, L., et al. (2016). MRI criteria for the diagnosis of multiple sclerosis: magnims consensus guidelines. Lancet Neurol. 15, 292–303. doi: 10.1016/s1474-4422(15)00393-2
He, K., Gan, C., Li, Z., Rekik, I., Yin, Z., Ji, W., et al. (2023). Transformers in medical image analysis. Intell. Med. 3, 59–78. doi: 10.1016/j.imed.2022.07.002
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Zimmerer, D., et al. (2018). nnU-Net: a self-adapting framework for U-Net-based medical image segmentation. arXiv [preprint]. arXiv:1809.10486. doi: 10.48550/arXiv.1809.10486
Jain, S., Rajpal, N., and Yadav, J. (2020). Multiple Sclerosis Identification Based on Ensemble Machine Learning Technique. Cham: Springer. doi: 10.2139/ssrn.3734806
Jha, D., Riegler, M. A., Johansen, D., Johansen, D., De Lange, T., and Halvorsen, P. (2019). “Resunet++: an advanced architecture for medical image segmentation,” in IEEE International Symposium on Multimedia (ISM) (San Diego, CA: IEEE), 225–230. doi: 10.1109/ISM46123.2019.00049
Kaur, A., Kaur, L., and Singh, A. (2024). Deepconn: patch-wise deep convolutional neural networks for the segmentation of multiple sclerosis brain lesions. Multimed. Tools Appl. 83, 24401–24433. doi: 10.1007/s11042-023-16292-y
Knight, J., and Khademi, A. (2016). “MS lesion segmentation using flair MRI only,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Athens: MICCAI Society), 21–28.
Lao, Z., Shen, D., Liu, D., Jawad, A. F., Melhem, E. R., Launer, L. J., et al. (2008). Computer-assisted segmentation of white matter lesions in 3D MR images using support vector machine. Acad. Radiol. 15, 300–313. doi: 10.1016/j.acra.2007.10.012
Long, J., Shelhamer, E., and Darrell, T. (2015). “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Boston, MA: IEEE), 3431–3440. doi: 10.1109/CVPR.2015.7298965
Mahbod, A., Wang, C., and Smedby, O. (2016). “Automatic multiple sclerosis lesion segmentation using hybrid artificial neural networks,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAIMSSEG (Athens: MICCAI Society), 29–36.
Manjón, J. V., and Coupé, P. (2015). volBrain: an online MRI brain volumetry system. Organ. Hum. Brain Mapp. 15:30. doi: 10.3389/fninf.2016.00030
McKinley, R., Wepfer, R., Aschwanden, F., Grunder, L., Muri, R., Rummel, C., et al. (2021). Simultaneous lesion and brain segmentation in multiple sclerosis using deep neural networks. Sci. Rep. 11:1087. doi: 10.1038/s41598-020-79925-4
Milletari, F., Navab, N., and Ahmadi, S. A. (2016). “V-net: fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV) (Stanford, CA: IEEE), 565–571. doi: 10.1109/3DV.2016.79
Muslim, A. M., Mashohor, S., Gawwam, G. A., Mahmud, R., Hanafi, M. B., Alnuaimi, O., et al. (2022). Brain MRI dataset of multiple sclerosis with consensus manual lesion segmentation and patient meta information. Data Brief 42:108139. doi: 10.1016/j.dib.2022.108139
Nouman, M., Mabrok, M., and Rashed, E. A. (2024). Neuro-transunet: segmentation of stroke lesion in MRI using transformers. arXiv [preprint]. arXiv:2406.06017. doi: 10.48550/arXiv.2406.06017
Oktay, O., Schlemper, J., Le Folgoc, L., Lee, M., Heinrich, M., Misawa, K., et al. (2018). Attention U-Net: learning where to look for the pancreas. arXiv [preprint]. arXiv:1804.03999. doi: 10.48550/arXiv.1804.03999
Olsson, T., Barcellos, L. F., and Alfredsson, L. (2017). Interactions between genetic, lifestyle and environmental risk factors for multiple sclerosis. Nat. Rev. Neurol. 13, 25–36. doi: 10.1038/nrneurol.2016.187
Rachmadi, M. F. (2020). Limited one-time sampling irregularity map (LOTS-IM) for automatic unsupervised assessment of white matter hyperintensities and multiple sclerosis lesions in structural brain magnetic resonance images. Comput. Med. Imaging Graph 79:101685. doi: 10.1016/j.compmedimag.2019.101685
Ronneberger, O., Fischer, P., and Brox, T. (2015). “U-NET: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI) (Cham: Springer), 234–241. doi: 10.1007/978-3-319-24574-4_28
Sajja, B. R., Datta, S., He, R., Mehta, M., Gupta, R. K., Wolinsky, J. S., et al. (2006). Unified approach for multiple sclerosis lesion segmentation on brain MRI. Ann. Biomed. Eng. 34, 142–151. doi: 10.1007/s10439-005-9009-0
Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., et al. (2010). N4itk: Improved n3 bias correction. IEEE Trans. Med. Imaging 29, 1310–1320. doi: 10.1109/TMI.2010.2046908
Valverde, S., Cabezas, M., Roura, A., González-Villá, J., Pareto, D., Vilanova, L., et al. (2017a). Improving automated multiple sclerosis lesion segmentation with a cascaded 3d convolutional neural network approach. NeuroImage 155, 159–168. doi: 10.1016/j.neuroimage.2017.04.034
Valverde, S., Oliver, A., Roura, E., González-Villá, S., Pareto, D., Vilanova, J. C., et al. (2017b). Automated tissue segmentation of MR brain images in the presence of white matter lesions. Med. Image Anal. 35, 446–457. doi: 10.1016/j.media.2016.08.014
Vera-Olmos, F., Melero, H., and Malpica, N. (2016). “Random forest for multiple sclerosis lesion segmentation,” in Proceedings of the 1st MICCAI Challenge on Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure-MICCAI-MSSEG (Athens), 81–86.
Zhang, Y., Brady, M., and Smith, S. (2001). Segmentation of brain MR images through a hidden markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 20, 45–57. doi: 10.1109/42.906424
Zhang, Z., Liu, Q., and Wang, Y. (2018). Resunet: a deep learning framework for medical image segmentation. arXiv [preprint]. arXiv:1811.07064. doi: 10.48550/arXiv.1811.07064
Keywords: atrous spatial pyramid pooling, CNN, deep learning, double-headed attention, multiple sclerosis
Citation: Jain S, Rajpal N and Soni PK (2026) MS-DASPNet: Multiple Sclerosis lesion segmentation from brain MRI using dual attention and spatial pyramid pooling with transfer learning. Front. Comput. Neurosci. 19:1713766. doi: 10.3389/fncom.2025.1713766
Received: 26 September 2025; Revised: 18 December 2025;
Accepted: 23 December 2025; Published: 29 January 2026.
Edited by:
Mohd Dilshad Ansari, SRM University (Delhi-NCR), India
Reviewed by:
Yuncong Ma, University of Minnesota, United States
Nada Haj Messaoud, University of Monastir, Tunisia
Copyright © 2026 Jain, Rajpal and Soni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Shikha Jain, shikha.15316490019@ipu.ac.in; Pramod Kumar Soni, pramod.soni@jaipur.manipal.edu