- 1Department of Information Technology, College of Computer and Information Sciences, Majmaah University, Al Majmaah, Saudi Arabia
- 2Department of Computer Science, College of Engineering and Information Technology, Onaizah Colleges, Qassim, Saudi Arabia
- 3Department of Economics, Engineering, Society and Business Organization (DEIM), University of Tuscia, Viterbo, Italy
- 4Department of Mathematics and Informatics, Azerbaijan University, Baku, Azerbaijan
- 5Department of Computer Science, College of Computer and Information Sciences, Majmaah University, Al Majmaah, Saudi Arabia
The accurate detection of Alzheimer's disease (AD), a progressive and irreversible neurodegenerative disorder, remains a critical challenge in clinical neuroscience. The research aims to develop an advanced multimodal image fusion model for the accurate detection of AD using positron emission tomography (PET) and magnetic resonance imaging (MRI) techniques. The proposed method leverages structural MRI and functional 18-fluorodeoxyglucose PET (FDG-PET) information derived from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). After preprocessing, including Gaussian filtering, skull stripping, and intensity normalization, voxel-based morphometry (VBM) is applied to extract gray matter (GM) features relevant to AD progression. A GM mask generated from MRI is used to isolate corresponding metabolic activity in the PET scans. These features are then integrated using a mask-coding strategy to construct a unified representation that captures both anatomical and functional characteristics. For classification, the model introduces a Glowworm Swarm-Optimized Spatial Multimodal Attention-Enriched Convolutional Neural Network (GWS-SMAtt-ECNN), where the optimization enhances both feature selection and network parameter tuning. The model was implemented in Python, and the results demonstrate that the proposed multimodal image fusion strategy outperforms traditional unimodal and basic fusion approaches in terms of F1-score (94.22%), recall (96.73%), and accuracy (98.70%). These results highlight the clinical usefulness of the proposed fusion architecture in facilitating timely and accurate AD detection from MRI and PET.
1 Introduction
Alzheimer’s disease (AD), a degenerative neurological illness, is the most prevalent cause of dementia, accounting for 60%–80% of cases globally. AD is characterized by memory loss, cognitive dysfunction, and behavioral changes; it increases the strain on caregivers and healthcare systems and has a detrimental impact on patients' quality of life (QoL) (1). Early and accurate diagnosis of AD is vital for applying effective interventions that can improve patient prognosis. Conventional diagnostic approaches to AD rely heavily on clinical interviews, memory tests, and behavioral observations (2). These assessments can be highly subjective and may fail to identify the disease at its earliest stages, when intervention is most effective. By offering targeted, quantifiable, and non-invasive evaluations of brain structure and function, neuroimaging techniques such as MRI and positron emission tomography (PET) have transformed the diagnosis of AD. MRI can visualize anatomical changes, such as hippocampal atrophy and cortical thinning, as well as some functional changes (3). PET can show metabolic activity and markers of protein deposition (e.g., amyloid-beta and tau). These imaging techniques have the potential to significantly increase diagnostic accuracy, enhance the assessment of disease progression, and help track how well treatments are working.

MRI is a crucial imaging technique for appropriately diagnosing and tracking symptoms in individuals (4). AD is studied using imaging techniques because of their capacity to generate fine-grained images of the brain's structure. The anatomical alterations correlated with AD, including ventricular enlargement, cortical thinning, and hippocampal atrophy, which impair memory and cognitive function, can be detected more effectively by MRI. Additional MRI modalities further improve diagnostic utility, such as Diffusion Tensor Imaging (DTI), which assesses white matter (WM) integrity to reveal disruptions in neural connectivity, and functional MRI (fMRI) (5, 6). Task-based techniques that measure brain activity have revealed functional deficits even in the absence of explicit structural deficits, supporting early and precise diagnosis. PET imaging, especially with radiotracers such as FDG and amyloid-binding tracers, provides insight into the metabolism and molecular processes of the brain (7). FDG-PET shows areas of reduced glucose metabolism in brain regions exhibiting pathology, while amyloid and tau PET imaging visualize the accumulation of pathological proteins in the brain (8). These changes can be visualized early in the disease course, even before clinical symptoms appear.

The fusion of MRI and PET images produces a complete picture of structural and functional alterations in the brain. Multimodal imaging also enhances diagnostic accuracy for distinguishing AD from other dementias and supports the development of therapies tailored to individual patients. With methodological advances, the integration of MRI and PET data enables increasingly successful automated and early detection of degenerative dementias such as AD (9, 10). Nonetheless, accurate early detection of AD remains challenging due to the complexity of AD pathology, data heterogeneity, and limited treatment options.
Although conventional techniques lack sufficient accuracy and innovative multimodal techniques utilizing MRI and PET imaging, DL, and multimodal feature fusion perform better, challenges remain in building robust, scalable, and interpretable architectures that improve the precision of disease-progression prediction and diagnosis. The goal of this research is to create a sophisticated multimodal image fusion model that integrates MRI and PET imaging to improve the specificity of AD detection. By capturing both structural and functional brain abnormalities, the model aims to enable more informative diagnosis and ultimately improve clinical decision-making through deep learning (DL)-powered analysis.
1.1 Key contribution
• An advanced image fusion model that integrates structural MRI and functional FDG-PET imaging for comprehensive AD analysis
• Applied voxel-based morphometry (VBM) to extract GM features from MRI, which are used to guide the fusion process and isolate corresponding metabolic activity in PET images
• Considered a GWS-SMAtt-ECNN to enhance feature selection and optimize neural network parameters
• Highlighted the potential of the fusion approach to support initial and exact analysis of AD, aiding clinical decision-making
The structure of this investigation is systematically organized for clarity and coherence. Section 1 covers the overview of research in the introduction section. A comprehensive examination of the literature is given in Section 2, with an emphasis on prior investigations and research gaps. In Section 3, the suggested methodology is described in depth, including the methods and processes used. Results and implications are discussed in Section 4. A description of the main conclusions and contributions is provided in Section 5.
2 Related works
AD is a neurological condition that impairs cognition and causes memory deterioration; amyloid plaques and tau proteins are essential biomarkers for early diagnosis (11). Using depth-wise separable convolution blocks, mixed skip connections, and weight-sharing convolution blocks, a middle-fusion multimodal approach was developed for early AD diagnosis. A unique region-of-interest extraction technique for affected regions and the complete ADNI series was used to assess the system. There are no effective treatments for AD, and patients frequently forget information they have learned (12). Neuroimaging and DL algorithms trained on multimodal images were combined in computer-assisted diagnosis (CAD). By concentrating on the gray matter of the brain, an image-based multimodal fusion technique was created that increases diagnostic accuracy. The method outperforms state-of-the-art techniques for AD detection. AD is a neurological disorder that results in cognitive impairment, memory loss, and brain atrophy (13). MRI neuroimaging and a 3D CNN were used in a multimodal image fusion technique to forecast the course of AD. Combining high-dimensional MRI features from accessible sources allows the approach to predict AD progression with an accuracy range of 88.7%–99%. When applied to voxel characteristics derived from MRI, the method performs better than baseline techniques. Proper identification of AD can be difficult, even though it represents a serious health risk (14). Early AD can be detected with multimodal neuroimaging input, such as MRI and PET; however, given their heterogeneity, these data require appropriate feature extraction. A multimodal fusion-based method analyzes data using transfer learning and the inverse discrete wavelet transform (IDWT). The simulation achieved an accuracy of 81.25% for MRI data and 93.75% for PET data.
A machine learning (ML) framework was suggested for classifying AD patients by combining MRI and PET modalities (15). The approach extracts complementary data and insights by utilizing Same-Subject-Modalities-Interactions (SSMI). The SSMI relation is derived from MRI and PET, and the most discriminative features were employed for classification. The outcomes demonstrate novel methods and strong performance metrics, approaching those of single-modality classification tasks. The regions analyzed showed a strong correlation with established AD biomarkers. A lightweight network was proposed to increase the precision of diagnosing AD and subjective memory complaints (SMC) (16). The multiscale long-range receptive field is adaptively recalibrated to locate discriminative brain regions. To identify AD and SMC, it separates both the position and the magnitude of correlations between 18-FDG-PET and MRI. Making use of FDG-PET and MRI, the approach applies DL to diagnose SMC, achieving 97.67% accuracy in AD diagnosis assessments. A thorough method for detecting AD, a neurodegenerative brain disease, covers feature extraction procedures, fusion techniques, scalability, and ML approaches for creating an ML-based AD diagnosis system (17). Additionally, ML workflows were compared using thematic analysis to find possible diagnostic solutions. The investigation advances the development of early diagnosis technologies. Multimodal brain imaging is used to diagnose AD utilizing DL algorithms. The findings demonstrate the ability of CNN and transformer-based models to evaluate multimodal neuroimaging data and identify patterns and characteristics linked to AD (18). Combining several imaging modalities increases diagnostic precision and helps forecast the course of the disease. Despite obstacles such as data heterogeneity, small sample sizes, and low generalizability, neurodiagnostics for AD remains a promising research area. One study uses a DL architecture with a combination of MRI and PET scans to propose a diagnostic approach for AD (19). For categorization, the framework employs an Ensemble Deep Random Vector Functional Link (RVFL) and a 10-layer CNN. Its metrics outperform single-layer feedforward networks and conventional classification systems.
The Dual-3DM3-AD model was developed to enhance early Alzheimer's detection through MRI and PET data fusion (20). It utilized advanced preprocessing, segmentation, and feature aggregation modules for effective analysis. The model achieved 98% accuracy, 97.8% sensitivity, 97.5% specificity, and 98.2% f-measure, outperforming conventional methods. However, it was constrained by limited dataset diversity and high computational intensity.
The proposed model automated glioblastoma segmentation from MRI scans using a multi-view 2D U-Net with conditional random field (CRF) refinement (21). Axial, sagittal, and coronal slice predictions were combined to capture inter-slice information effectively. High Dice scores of 0.77, 0.90, and 0.84 were obtained for ET, WT, and TC, respectively. Some limitations were observed in the generalization capability and computational efficiency.
A DL framework was developed for glioma segmentation and survival prediction using MRI data. A 2D CNN with a majority-rule approach was applied to enhance segmentation reliability (22). Radiomic features were analyzed using a 3D replicator neural network to predict patient outcomes. The model achieved high accuracy on the BRATS2020 dataset but was limited by dataset diversity and the omission of clinical data.
The performance of different CNN models on brain MRI classification for brain tumor and Alzheimer's disease was evaluated based on data complexity (23). Four CNN models (S-CNN, ResNet50, InceptionV3, and Xception) were implemented using stratified five-fold cross-validation with and without principal component analysis (PCA) after preprocessing the data for class balancing and complexity estimation. The results showed that ResNet50 achieved the highest accuracy of 94.3% and F1-score of 0.93, while S-CNN performed the lowest with 87.5% accuracy, and the evaluation was limited by dataset size, heterogeneity, and the exclusion of other CNN variants.
A mobile phone rating classification system was developed to assist consumers in making informed purchasing decisions (24). A novel dataset of over 13,000 reviews was created from Flipkart, and a federated deep neural network (FDNN) using TF-IDF features was employed, involving two clients and one server across three experimental rounds. The approach achieved an accuracy of 96.68% on the aggregated server while preserving customer data privacy. However, the study was limited to Flipkart reviews, included only two clients, and requires further validation on other e-commerce platforms. An Improved Glowworm Swarm Optimization (IGWSO) was proposed to enhance Parkinson's disease prediction using radial basis function networks (25). The approach optimizes network parameters for improved classification, achieving a high accuracy of 95.3%. While effective, the method is limited to Parkinson's disease, and its applicability to other neurodegenerative disorders remains unexplored. Research investigated early Alzheimer's disease detection using the MssNet model with spatial attention to highlight disease-relevant brain regions (26). The model attained 95.6% accuracy and a 94.8% F1-score, outperforming standard CNNs. The findings confirmed the benefit of attention mechanisms in AD diagnosis. However, the research was limited to the MRI modality and lacked multimodal validation.
2.1 Research gap
While multimodal fusion techniques using MRI and PET have seen substantial improvement in detecting Alzheimer's disease (15–20), the majority of current models continue to struggle with data heterogeneity, limited sample diversity, and generalization across populations. Additional limitations of current deep learning systems include the lack of adaptive feature selection, the absence of cross-modal interaction modeling, and limited precision in the diagnosis and prediction of neurodegenerative disease (17, 18). Moreover, while high accuracies have been reported on an individual basis (20), the robustness of these systems under real-life clinical settings has not been addressed. Therefore, a need has emerged for a generalizable and computationally efficient multimodal framework capable of diagnosing AD early and predicting progression consistently (14, 19). The proposed GWS-SMAtt-ECNN approach addresses these problems by embedding GWS, SMAtt, and an ECNN to improve feature extraction, address the heterogeneity of multimodal data, and enhance interpretability, ultimately making it possible to combine MRI and PET imaging to diagnose AD accurately and early.
3 Methodology
The methodology integrates preprocessing (Gaussian filtering, skull stripping, and intensity normalization), VBM-based feature extraction to drive multimodal image fusion, and a GWS-SMAtt-ECNN model that fuses MRI and PET data using SMAtt and GWS for robust, accurate AD classification. The process of the suggested multimodal AD identification system is shown in Figure 1.
3.1 Data gathering
The MRI and PET multimodal dataset was gathered from the open-source Kaggle website (https://www.kaggle.com). The dataset comprises structural magnetic resonance imaging (MRI) and functional 18-fluorodeoxyglucose positron emission tomography (FDG-PET) scans obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI), encompassing images of healthy controls, patients with mild cognitive impairment (MCI), and individuals diagnosed with AD. Each category contained a balanced number of samples (AD, 300; MCI, 300; SMC, 300; NC, 300), ensuring that the training and validation subsets were not biased toward any single class. This subset contains 1,200 multimodal image pairs (MRI and FDG-PET) collected from 300 unique participants distributed across three diagnostic groups: cognitively normal (CN) = 100, MCI = 100, and AD = 100. Each participant contributed multiple scan sessions to represent longitudinal variation, allowing both intra-subject and inter-class learning during model training. Figure 2 shows the sample data images in the dataset.
Figure 2. Example of (a) MRI and (b) PET data images from Kaggle.
3.2 Data preprocessing
Preprocessing enhances MRI–PET image quality using Gaussian filtering for noise reduction, skull stripping to isolate brain regions, and Z-score normalization to standardize intensities, enabling robust feature extraction and accurate AD detection.
3.2.1 Gaussian filtering
Image preprocessing improves image quality and feature generation and is a crucial step in the accurate diagnosis of AD from MRI scans. The Gaussian filter is an effective preprocessing technique for reducing noise and smoothing MRI images. It is superior at reducing image intensity fluctuations, although it is less effective at removing salt-and-pepper noise. The filter is based on a Gaussian distribution, leading to a weighted average of the pixel values that favors the center pixel. The probability density function of the Gaussian distribution, used to compute the filter kernel, is represented in Equation 1:

$$ G(w) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(w-\mu)^2}{2\sigma^2} \right) \quad (1) $$

where w represents the grayscale intensity of the MRI image, μ is the mean intensity value, and σ is the standard deviation, which determines the degree of smoothing.
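To make the smoothing step concrete, the following is a minimal sketch (not the authors' exact pipeline) of Gaussian filtering on a 2D MRI slice using SciPy; the function name and the `sigma` value are illustrative, with `sigma` playing the role of σ in Equation 1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def denoise_mri_slice(mri_slice: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Apply Gaussian smoothing to reduce intensity fluctuations in one slice."""
    return gaussian_filter(mri_slice.astype(np.float32), sigma=sigma)

# Example usage on a synthetic noisy slice.
noisy = np.random.rand(128, 128).astype(np.float32)
smoothed = denoise_mri_slice(noisy, sigma=1.5)
```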
3.2.2 Skull stripping
Skull stripping is a preprocessing technique used in MRI analysis that involves eliminating non-brain tissues, such as the eyes, scalp, and skull, to obtain brain-only images. By retaining only the brain, skull stripping improves the accuracy of AD detection, making it a crucial preprocessing step that enhances feature extraction, reduces image noise, and improves downstream image fusion and classification algorithms.
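In practice, skull stripping is usually performed with dedicated neuroimaging tools, which the paper does not name; the following is a simplified, hypothetical sketch of the idea using intensity thresholding and morphological operations, where the threshold and iteration counts are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def strip_skull(volume: np.ndarray, threshold: float) -> np.ndarray:
    """Rough skull stripping: threshold, keep the largest component, dilate back."""
    mask = volume > threshold                           # foreground (head) voxels
    mask = ndimage.binary_erosion(mask, iterations=2)   # detach scalp/skull bridges
    labels, _ = ndimage.label(mask)                     # assumes nonempty foreground
    largest = np.argmax(np.bincount(labels.ravel())[1:]) + 1
    brain = labels == largest                           # largest component ~ brain
    brain = ndimage.binary_dilation(brain, iterations=2)
    return volume * brain                               # zero out non-brain voxels
```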
3.2.3 Z-score normalization
The multimodal imaging data (MRI and PET) used for detecting AD are first processed with intensity normalization, specifically Z-score normalization. This method removes the mean and scales the features to unit variance, allowing MRI and PET outputs, often on different scales, to inform the diagnosis equally. Outliers in the imaging data can significantly affect the mean and standard deviation estimates, potentially hiding true pathological variation. As a result, Z-score normalization reduces the impact of such distortion and produces a more robust and sensitive model, particularly regarding subtle anatomical and functional changes, using Equation 2:

$$ Z = \frac{A - \mu_A}{\sigma_A} \quad (2) $$

where A is the voxel intensity feature, and μ_A and σ_A are its mean and standard deviation, respectively.
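A minimal sketch of Equation 2 follows, assuming the normalization statistics are computed over brain voxels only (an implementation choice the paper does not specify).

```python
import numpy as np

def zscore_normalize(volume: np.ndarray, brain_mask: np.ndarray) -> np.ndarray:
    """Standardize voxel intensities: Z = (A - mu_A) / sigma_A (Equation 2)."""
    voxels = volume[brain_mask]                # restrict statistics to brain voxels
    mu, sigma = voxels.mean(), voxels.std()
    out = np.zeros_like(volume, dtype=np.float32)
    out[brain_mask] = (voxels - mu) / (sigma + 1e-8)   # avoid division by zero
    return out
```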
In detecting AD using PET images, preprocessing procedures such as Gaussian filtering, skull stripping, and intensity normalization improve image quality by enhancing image clarity, removing non-brain tissues, and standardizing voxel intensities, leading to more precise feature extraction and better disease classification. Figure 3 shows the preprocessed PET images.
These preprocessing steps improve image quality, remove non-brain tissue, and normalize intensity values, which increases reproducibility in feature extraction and ultimately improves diagnostic accuracy. Figure 4 shows the preprocessed images of MRI.
3.3 Feature extraction using VBM
VBM allows focal differences in brain structure to be assessed and is essential for evaluating neurodegenerative alterations relevant to AD. In this investigation, VBM is applied to MRI data to identify regional atrophy patterns and, in particular, the distribution of GM density changes relevant to the progression of AD. The integration of VBM-based gray matter (GM) extraction and PET metabolic mapping was guided by established neurobiological evidence that GM atrophy and glucose hypometabolism are core indicators of Alzheimer's progression. Structural MRI provides high-resolution visualization of atrophic changes in regions such as the hippocampus and temporal cortex, while FDG-PET captures complementary functional deficits in glucose metabolism. By focusing on these biologically relevant features, the model ensures that the fused representation captures both anatomical and functional markers of AD, providing a strong clinical justification for the proposed feature selection and fusion approach.
The feature extraction images of PET and MRI are shown in Figure 5. The VBM process involves several key steps:
• Spatial normalization: Spatial normalization is an important feature extraction step in VBM for AD detection using MRI and PET scans. It ensures anatomical consistency by transforming brain images to the same stereotactic space. PET scans are mapped to a brain template, co-registered with MRI scans, and normalized for statistical power through group-level comparisons.
• Segmentation: PET scans are employed in the identification of AD, characterized by progressive GM atrophy in the hippocampus and cortex. MRI images are segmented into three categories: GM, cerebrospinal fluid (CSF), and WM. Following segmentation, the images are examined to measure variations in tissue thickness. These images are combined with functional information from the PET scan for a better diagnosis.
• Smoothing: PET and MRI have a critical role in AD detection. The differences in anatomy and smoothing of the segmented GM images were carried out. This maximally reduces the anatomical variability and provides the data with a normal distribution, which is a requirement of parametric statistical testing. PET-based images were used to identify and classify regional atrophy in early stages of AD, providing multimodal information for diagnosis and classification.
• Statistical analysis: Voxel-wise statistical estimation based on the general linear model (GLM) is employed to detect structural brain alterations associated with AD. A parametric map is produced that highlights differences in GM density between AD patients and healthy controls. Combining this structural mapping with PET-derived metabolic features enhances the precision and robustness of AD identification and prognosis; a minimal voxel-wise sketch follows this list. Figures 5a,b illustrate the feature-extracted images of PET and MRI, respectively.
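As a minimal illustration of the voxel-wise statistical step, the sketch below runs a two-sample t-test, the simplest GLM contrast, over smoothed GM density maps; the array shapes and group sizes are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

def voxelwise_tmap(gm_ad: np.ndarray, gm_cn: np.ndarray):
    """gm_ad: (n_ad, X, Y, Z) GM maps of patients; gm_cn: (n_cn, X, Y, Z) controls.
    Returns voxel-wise t and p parametric maps highlighting group differences."""
    t_map, p_map = stats.ttest_ind(gm_ad, gm_cn, axis=0)
    return t_map, p_map

# Example usage with synthetic maps (10 subjects per group, 32^3 voxels).
t_map, p_map = voxelwise_tmap(np.random.rand(10, 32, 32, 32),
                              np.random.rand(10, 32, 32, 32))
```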
GM regions extracted using voxel-based morphometry (VBM) from MRI are known to undergo atrophy in AD, particularly in the hippocampus, temporal lobes, and cortical regions. FDG-PET provides complementary metabolic information, highlighting hypometabolism in these same regions. By combining structural and functional data, the model captures both anatomical degeneration and metabolic deficits, producing a biologically grounded feature set that aligns with well-established AD pathology. Preprocessed MRI scans are analyzed using VBM to extract the GM regions most affected by AD. The GM mask highlights areas such as the hippocampus, entorhinal cortex, and temporal lobes, ensuring only clinically relevant structural regions are considered. The GM mask is applied to spatially co-registered PET scans, isolating voxel-wise metabolic activity within the same clinically significant regions. This ensures that only functionally meaningful PET signals contribute to the final representation.
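The GM-guided masking described above can be sketched as follows, assuming the VBM segmentation provides a GM probability map co-registered with the PET volume; the 0.5 cutoff is an illustrative choice, not a value from the paper.

```python
import numpy as np

def apply_gm_mask(gm_prob: np.ndarray, pet: np.ndarray, cutoff: float = 0.5):
    """Binarize a GM probability map and restrict PET metabolism to GM voxels."""
    gm_mask = (gm_prob > cutoff).astype(np.float32)  # binary GM mask from MRI
    masked_pet = pet * gm_mask                       # keep metabolic signal in GM only
    return gm_mask, masked_pet
```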
3.4 Multimodal image fusion
The proposed mask-coding fusion strategy integrates structural MRI and functional FDG-PET to generate a clinically meaningful and technically robust representation for AD detection. Mask-coding fusion: The MRI and masked PET features are combined using a mask-coding strategy. MRI intensities provide anatomical structure, while PET voxel values are modulated by the GM mask. The fused feature map is created via element-wise multiplication and concatenation to maintain spatial correspondence and integrate complementary information from both modalities.
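A minimal sketch of the mask-coding fusion step, under the assumption that the MRI, PET, and GM mask volumes are co-registered and of identical shape:

```python
import numpy as np

def mask_coding_fusion(mri: np.ndarray, pet: np.ndarray,
                       gm_mask: np.ndarray) -> np.ndarray:
    """Element-wise PET modulation by the GM mask, then channel concatenation."""
    masked_pet = pet * gm_mask                     # element-wise multiplication
    fused = np.stack([mri, masked_pet], axis=-1)   # (X, Y, Z, 2) fused feature map
    return fused                                   # spatial correspondence preserved
```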
3.5 Proposed technique
The fused multimodal features are input to the GWS-SMAtt-ECNN model. The SMAtt module emphasizes brain regions most relevant to AD, enhancing discriminative feature representation. GWS selects the most informative features, improving classification accuracy, generalization, and robustness against noise or irrelevant information. The GWS-SMAtt-ECNN model combines GWS for optimal selection of features with SMAtt, which gives attention to brain regions in fused MRI and PET for early AD classification. As an ECNN architecture, the model enhances early AD classification by using robust spatial features and adaptively selecting the most informative subspace while improving overall classification accuracy and model generalizability in the classification of multi-stage AD.
3.5.1 SMAtt
The SMAtt module in AD detection intensifies the model's attention toward the brain regions in MRI and PET images that are most relevant for identifying neurodegeneration. This focused attention helps identify discriminative spatial patterns related to AD biomarkers such as hippocampal atrophy and hypometabolism. SMAtt is imposed without additional trainable parameters through two efficient, parameter-free operations: channel-wise max pooling (CMP) and channel-wise average pooling (CAP). The concatenated multimodal feature map $F \in \mathbb{R}^{H \times W \times C}$, derived from the fused MRI and PET modalities, is processed through CAP and CMP to generate the SMAtt map, and the spatially enhanced feature representation is obtained by element-wise multiplication, as shown in Equations 3–5:

$$ F_{\mathrm{avg}} = \mathrm{CAP}(F) = \frac{1}{C} \sum_{c=1}^{C} F_{:,:,c} \quad (3) $$

$$ F_{\mathrm{max}} = \mathrm{CMP}(F) = \max_{1 \le c \le C} F_{:,:,c} \quad (4) $$

$$ M_s = \sigma\big( F_{\mathrm{avg}} + F_{\mathrm{max}} \big) \quad (5) $$

where $F_{\mathrm{avg}}, F_{\mathrm{max}} \in \mathbb{R}^{H \times W \times 1}$ are the pooled maps, $\sigma(\cdot)$ is the sigmoid function, and $M_s$ is the spatial attention map. Finally, the following formulation describes the process that produces the final fusion outcome, as shown in Equation 6:

$$ F' = M_s \odot F \quad (6) $$

where element-wise multiplication is indicated by $\odot$. As Equation 7 states, the input feature map and the computed feature map are identical in size in this particular instance:

$$ F' \in \mathbb{R}^{H \times W \times C} \quad (7) $$
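The following PyTorch sketch implements the parameter-free spatial attention of Equations 3–7, assuming a channels-first (N, C, H, W) layout; it is one plausible realization, not the authors' released code.

```python
import torch
import torch.nn as nn

class SMAtt(nn.Module):
    """Parameter-free spatial multimodal attention over fused MRI-PET features."""
    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f_avg = f.mean(dim=1, keepdim=True)          # CAP, Eq. 3 -> (N, 1, H, W)
        f_max = f.max(dim=1, keepdim=True).values    # CMP, Eq. 4 -> (N, 1, H, W)
        m_s = torch.sigmoid(f_avg + f_max)           # attention map, Eq. 5
        return m_s * f                               # element-wise product, Eq. 6

# Example: output shape equals input shape, as stated in Equation 7.
fused = torch.randn(8, 64, 32, 32)
refined = SMAtt()(fused)   # -> (8, 64, 32, 32)
```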
3.5.2 ECNN
Standard CNNs are effective at extracting features from a single imaging modality, but they have several limitations in the context of Alzheimer's disease detection. In the proposed model, the fused feature set is processed through enhanced fully connected layers with dropout and batch normalization, improving the network's ability to detect subtle anatomical and metabolic changes and thereby enabling more accurate early diagnosis of Alzheimer's disease and its stages.
As shown in Equation 8, the standard CNN processes an input image X (e.g., MRI or PET) by extracting features through convolutional and pooling layers, denoted as $\phi(X)$, which are then passed through fully connected layers with weights W and biases b, and finally transformed by a softmax function to produce the predicted class probabilities $\hat{y}$:

$$ \hat{y} = \mathrm{softmax}\big( W \phi(X) + b \big) \quad (8) $$
CNNs often struggle to capture subtle changes in brain structure and metabolism, cannot naturally integrate complementary information from multiple modalities, and treat all spatial regions equally, which may reduce sensitivity to disease-relevant areas. The enriched CNN (ECNN) overcomes these limitations by incorporating structural MRI and functional PET scans as multimodal inputs, extracting modality-specific features through parallel convolutional layers. Spatial attention mechanisms are applied to highlight the most relevant regions in each modality, and the resulting features are fused to form a unified representation. The ECNN can identify small changes in the brain and improve its classification ability by integrating structural with functional imaging. Compared with standard CNNs, this approach enables more accurate early diagnosis of AD and its stages. Figure 6 depicts the typical ECNN's construction.

The ECNN architecture employed three convolutional layers with filter sizes of 32, 64, and 128, and a kernel dimension of 3 × 3. Each convolutional block was followed by ReLU activation and 2 × 2 max pooling for spatial downsampling. The feature map dimensions evolved as follows: (128 × 128 × 32) → (64 × 64 × 64) → (32 × 32 × 128), where each stage captured progressively abstracted structural and functional information from the fused MRI–PET data. Fully connected layers with 256 and 128 neurons were employed for high-level feature integration, followed by a softmax classifier for four-class AD stage prediction. Dropout values between 0.3 and 0.5 were applied to reduce overfitting. Tunable parameters included convolutional filter weights, bias terms, learning rate, and attention weights from the SMAtt module. Additionally, GWS optimization dynamically adjusted the learning rate (0.0001–0.001), the luciferin decay rate (0.4–0.6), and the neighborhood range (from r₀ = 1.0 to rₛ = 5.0), ensuring convergence stability.
The ECNN's convolutional layers, often referred to as feature extraction layers, apply a variety of convolution filters to extract different types of information from the input image. Each layer's convolution filters are applied to the input maps, producing new feature mappings through the activation function. When using MRI and PET to diagnose AD early, these features are essential. Equation 9 can be utilized for determining the feature map:

$$ x_j^{l} = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \right) \quad (9) $$

where $x_j^{l}$ denotes the $j$-th feature map inside network layer $l$, $k_{ij}^{l}$ denotes the convolution filter, $M_j$ comprises the feature maps of layer $l-1$ that connect to $x_j^{l}$, $b_j^{l}$ is the bias of that feature map in layer $l$, $*$ symbolizes the 2D convolution procedure, and $f(\cdot)$ is the activation function. The ECNN can effectively recover intricate spatial structures from MRI and PET data because of these aspects, which are necessary for the early identification of AD.
The collected feature maps are downsampled into smaller planes by the pooling layers. This improves the network's capacity for generalization by simplifying its structure and making it robust to scale, translation, and other forms of visual distortion. Equation 10 shows how the pooling layer is expressed:

$$ x_j^{l} = f\left( \beta_j^{l} \, \mathrm{down}\big( x_j^{l-1} \big) + b_j^{l} \right) \quad (10) $$

where $\beta_j^{l}$ is the weight and $\mathrm{down}(\cdot)$ represents the pooling procedure; common pooling techniques include max pooling and mean pooling. The pooled features are then passed to fully connected layers for the diagnosis of AD, which are necessary to capture high-level representations and neuronal connections, following Equation 11:

$$ y_j = \mathrm{sigmoid}\left( \sum_{i=1}^{r} w_{ij} h_i + a \right) \quad (11) $$

where the sigmoid procedure restricts the result to the range $(0, 1)$ and $y_j$ is the actual output, which usually indicates how probable it is that the sample belongs to the AD-positive group. Here, r is the number of neurons in the preceding layer, $h_i$ represents the output of the $i$-th neuron, $w_{ij}$ is the weight between neuron $i$ and output neuron $j$, and a is the bias. The network output for situations involving multiple categories is provided by Equation 12:

$$ Y = \mathrm{softmax}(X H), \qquad Y_i = \frac{e^{(XH)_i}}{\sum_{k=1}^{I} e^{(XH)_k}} \quad (12) $$

The softmax is a normalizing function that maps every vector component to the interval $(0, 1)$, ensuring that the components sum to 1. Y is the network's response vector for the sample, where the component $Y_i$ indicates the probability that the sample belongs to class $i$. Here, I is the total number of categories, X is the weight matrix that merges the output with the previous layer, and H represents the preceding layer's output vector. The class with the greatest likelihood decides the categorization outcome: if the largest value is $Y_i$, the sample is classified as class $i$. This procedure makes robust multi-class prediction possible for differentiating between various stages or types of AD during initial detection.
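Putting the stated hyperparameters together, the sketch below outlines one plausible PyTorch realization of the ECNN backbone (filters 32/64/128, 3 × 3 kernels, ReLU, 2 × 2 max pooling, fully connected layers of 256 and 128 units, dropout, four-class output); the two-channel 128 × 128 input assumes the fused MRI–PET map from Section 3.4, and the softmax of Equation 12 is folded into the training loss.

```python
import torch
import torch.nn as nn

class ECNN(nn.Module):
    """Illustrative ECNN backbone matching the hyperparameters stated above."""
    def __init__(self, in_channels: int = 2, n_classes: int = 4):
        super().__init__()
        def block(cin, cout):   # conv -> ReLU -> 2x2 max pool (Eqs. 9-10)
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(
            block(in_channels, 32),   # 128x128x32 conv output, pooled to 64x64
            block(32, 64),            # 64x64x64 conv output, pooled to 32x32
            block(64, 128),           # 32x32x128 conv output, pooled to 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 16 * 128, 256), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(256, 128), nn.ReLU(inplace=True), nn.Dropout(0.3),
            nn.Linear(128, n_classes),   # softmax applied inside the loss (Eq. 12)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of four fused MRI-PET maps -> four-class logits.
logits = ECNN()(torch.randn(4, 2, 128, 128))   # -> (4, 4)
```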
3.5.3 GWS
GWS is a bio-inspired optimization algorithm used here for feature selection in medical imaging. GWS supports accurate classification of AD from MRI and PET and aims to improve classification performance by selecting the most pertinent features. By modeling glowworm behavior, GWS enhances exploration of the solution space and improves diagnostic models through optimal selection of feature subsets across three phases: luciferin update, movement, and neighborhood range update. Figure 7 illustrates the GWS flow process's three primary phases.
The luciferin-update phase is driven by the objective function value at the glowworm's location. To adjust its luciferin level, each glowworm adds to its previous luciferin value a quantity proportional to the fitness of its present position, while a portion of the luciferin value is deducted to replicate luciferin decay over time. Equation 13 is used to update a glowworm's luciferin level:

$$ \ell_j(s+1) = (1 - \rho)\,\ell_j(s) + z\, J\big( x_j(s+1) \big) \quad (13) $$

where $\ell_j(s)$ denotes glowworm $j$'s luciferin concentration at time s, $\rho \in (0, 1)$ is the luciferin decay constant, z is the luciferin enhancement constant, and $J(x_j(s+1))$ is the objective function value at agent $j$'s position. By using MRI and PET to diagnose AD early, GWS helps identify more useful feature subsets: glowworms with higher luciferin concentrations, indicating fitter feature subsets, attract their neighbors. Each glowworm probabilistically moves toward a brighter neighbor, pausing only upon completing the transition. The neighborhood set is represented in Equation 14:

$$ N_j(s) = \left\{ i : d_{ji}(s) < r_d^{j}(s); \; \ell_j(s) < \ell_i(s) \right\} \quad (14) $$

Equation 14 identifies the neighborhood of glowworm $j$ at time s, where $d_{ji}(s)$ represents the Euclidean distance between glowworms $j$ and $i$, while $r_d^{j}(s)$ is the variable neighborhood range of glowworm $j$, constrained by the sensor range such that $0 < r_d^{j}(s) \le r_s$. For each glowworm $j$, the probability of moving toward a neighbor $i$ is given by Equation 15:

$$ p_{ji}(s) = \frac{\ell_i(s) - \ell_j(s)}{\sum_{k \in N_j(s)} \big( \ell_k(s) - \ell_j(s) \big)} \quad (15) $$

The movement itself follows Equation 16:

$$ x_j(s+1) = x_j(s) + \delta \, \frac{x_i(s) - x_j(s)}{\left\| x_i(s) - x_j(s) \right\|} \quad (16) $$

where $\delta$ is the step size, $\|\cdot\|$ denotes the Euclidean norm, and $x_j(s)$ denotes the position of glowworm $j$ in an m-dimensional real space at time s. This flexibility allows the swarm to converge toward areas of higher luciferin intensity. For the initial identification of AD with MRI and PET, this controlled movement enables efficient exploration of the feature space, so the swarm can discover the best subsets to increase diagnostic accuracy.

Equation 17 controls the adaptive neighborhood range modification, which is necessary for detecting multiple peaks in a multimodal landscape. Let $r_0$ be the initial range, with $r_d^{j}(0) = r_0$. The update rule is:

$$ r_d^{j}(s+1) = \min\left\{ r_s, \; \max\left\{ 0, \; r_d^{j}(s) + \beta \big( n_t - |N_j(s)| \big) \right\} \right\} \quad (17) $$

where $\beta$ is a constant and $n_t$ is a threshold controlling the desired number of neighbors.
This method assists in establishing a balance between exploitation and exploration when choosing features for early AD prediction. The GWS-SMAtt-ECNN model enhances early AD classification by optimizing feature selection, focusing on disease-relevant brain regions in fused MRI and PET, and improving accuracy and model generalizability.
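A compact NumPy sketch of one GWS iteration (Equations 13–17) is given below; the constants mirror commonly used GSO defaults rather than values fixed by the paper, and `fitness` stands in for whatever objective scores a candidate feature subset (e.g., validation accuracy).

```python
import numpy as np

rng = np.random.default_rng(0)

def gws_step(pos, luc, rd, fitness, rho=0.4, z=0.6,
             delta=0.03, beta=0.08, n_t=5, r_s=5.0):
    """One GWS iteration over positions pos (n, m), luciferin luc (n,), ranges rd (n,)."""
    luc = (1 - rho) * luc + z * np.array([fitness(p) for p in pos])    # Eq. 13
    new_pos = pos.copy()
    for j in range(len(pos)):
        dist = np.linalg.norm(pos - pos[j], axis=1)
        nbrs = np.where((dist < rd[j]) & (luc > luc[j]))[0]            # Eq. 14
        if len(nbrs) > 0:
            gains = luc[nbrs] - luc[j]
            i = rng.choice(nbrs, p=gains / gains.sum())                # Eq. 15
            step = pos[i] - pos[j]
            new_pos[j] = pos[j] + delta * step / np.linalg.norm(step)  # Eq. 16
        rd[j] = min(r_s, max(0.0, rd[j] + beta * (n_t - len(nbrs))))   # Eq. 17
    return new_pos, luc, rd
```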
3.5.4 Hybrid improvement strategy
The proposed GWS-SMAtt-ECNN model combines an enriched CNN for extracting multimodal MRI–PET features, spatial multimodal attention to highlight disease-relevant regions, and glowworm swarm optimization to select optimal features. If attention fails, the raw features are retained, and a local search is applied when a glowworm has no neighbors in GWS. The fused representation is given by Equation 18:

$$ F = \big[ \phi(X_{\mathrm{MRI}}) \,;\, \phi(X_{\mathrm{PET}}) \big] \quad (18) $$

where $X_{\mathrm{MRI}}$ and $X_{\mathrm{PET}}$ are the input MRI and PET image data, $\phi(\cdot)$ denotes the feature extraction function of the enhanced convolutional network (ECNN), and F is the fused multimodal feature map obtained by concatenating the ECNN-derived features from both modalities. The attention refinement is given by Equation 19:

$$ F' = \sigma\big( \mathrm{CAP}(F) + \mathrm{CMP}(F) \big) \odot F \quad (19) $$

where $\odot$ is the element-wise multiplication operator, $\mathrm{CAP}(F)$ is the channel-wise average pooling applied to F to capture global average attention weights, $\mathrm{CMP}(F)$ is the channel-wise max pooling applied to F to capture prominent feature activations, and $F'$ is the attention-refined feature map highlighting the most discriminative features of the representation. Figure 8 depicts how the integrated approach enhances both accuracy and efficiency in Alzheimer's disease classification, enabling precise early detection.
Table 1 summarizes the key hyperparameters used during model training and optimization, and Algorithm 1 shows the hybrid pseudocode for GWS-SMAtt-ECNN.
The GWS-SMAtt-ECNN algorithm fuses MRI and PET features using an enhanced CNN and applies spatial-channel attention to highlight important regions. The fused feature map is flattened and passed to a classifier (sigmoid or softmax based on the task). GWS selects the optimal feature subset by simulating swarm behavior. The final classifier is trained on these optimized features to enhance medical image diagnosis accuracy.
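To indicate how the pieces of Algorithm 1 connect, the following hedged sketch decodes a glowworm position into a binary feature subset and scores it with a lightweight stand-in classifier; `subset_fitness` and the cross-validation setup are illustrative assumptions, since the paper does not specify the inner evaluation loop.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def subset_fitness(position: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """Decode a glowworm position in [0, 1]^d into a feature subset and score it."""
    keep = position > 0.5                      # binary mask over feature columns
    if keep.sum() == 0:
        return 0.0                             # empty subsets get zero fitness
    clf = LogisticRegression(max_iter=200)     # stand-in for the final classifier
    return cross_val_score(clf, X[:, keep], y, cv=3).mean()

# Illustrative wiring with the gws_step sketch above: X holds flattened,
# attention-refined features; y holds the four AD stage labels.
X = np.random.rand(60, 40)
y = np.random.randint(0, 4, size=60)
score = subset_fitness(np.random.rand(40), X, y)
```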
The proposed GWS-SMAtt-ECNN model demonstrates notable improvements in diagnostic accuracy and robustness compared with existing multimodal deep learning approaches. By effectively leveraging complementary structural and functional information from fused MRI and FDG-PET scans, the model provides reliable detection across multiple stages of Alzheimer's disease. The integration of the bioinspired GWS optimization algorithm enhances both feature selection and network parameter tuning. High-dimensional and heterogeneous multimodal imaging data often contain redundant or noisy information that can hinder model performance. GWS simulates the behavior of glowworms to explore the feature space based on luciferin intensity (fitness), enabling the selection of the most informative features while discarding irrelevant ones. Additionally, GWS dynamically adjusts network parameters, improving convergence, generalization, and stability during training. By balancing exploration and exploitation in the feature space, GWS mitigates the challenges of data heterogeneity and high dimensionality, contributing to the superior accuracy, robustness, and reliability of the GWS-SMAtt-ECNN model for early Alzheimer's disease detection.
4 Evaluation performance
The results of the model's application are examined in this section, along with comparison analysis and effectiveness assessment (Table 2).
4.1 Confusion matrix
The suggested GWS-SMAtt-ECNN model's accuracy in classification utilizing combined MRI and PET information is displayed in the confusion matrix, clearly demonstrating the model's ability to classify four AD classes with minimal error. The diagonal dominance indicates reasonable predictive ability in terms of accuracy in all classes. Figure 9 shows the performance of the confusion matrix.
The confusion matrix evaluates the model's classification accuracy by analyzing the predicted and true labels across four classifications. It illustrates errors where the model mistakenly assigns certain labels. The results indicate the model has a good classification rate for the majority of classes, but tends to miss the mark when classifying Class 3. It is concluded that further improvement of the model is warranted to make more accurate predictions, in particular for Class 3.
4.2 Accuracy loss
The training and validation curves of the GWS-SMAtt-ECNN model are shown in Figure 10. The training accuracy increased steadily, reaching nearly 99% by the end of 50 epochs, while the validation accuracy was sustained well above 94%, indicating that the technique performed well. The loss curves show steep declines during the early epochs of training, and convergence was achieved with limited overfitting. These results demonstrate that the model can reliably classify multimodal MRI–PET data for accurate and early detection of AD.
4.3 Precision-recall curve analysis of AD
The precision-recall curve illustrates how well the GWS-SMAtt-ECNN model performed when classifying AD using fused MRI and PET features, as shown in Figure 11. Its capacity to retain a high level of precision at different recall levels is demonstrated by its average precision (AP) score of 0.978. A high precision score indicates limited false positives and demonstrates that the model provides a trustworthy indicator of performance for early diagnosis, which is crucial for timely medical attention as AD progresses.
4.4 Receiver operating characteristic (ROC) curve
The multi-class ROC curve of the suggested GWS-SMAtt-ECNN model shows high discrimination ability for all diagnostic classes. The AUC values ranged from 0.87 to 0.93, with micro- and macro-average AUC scores of 0.90, indicating good overall classification performance (Figure 12).
4.5 Per-class performance evaluation
The per-class sensitivity and specificity values are summarized in Table 3. For the fused MRI–PET model, the average sensitivity and specificity across diagnostic categories were 96.1% and 95.9%, respectively, demonstrating balanced and reliable classification performance.
4.6 Dataset comparison
Table 4 compares the performance of the proposed multimodal model on two datasets for stroke and brain disorder detection: the real-time stroke detection MRI, CT, and PET image dataset (proposed) and the MRI and PET multimodal dataset (existing).
Table 4. Performance comparison of proposed and existing datasets for real-time stroke and brain disorder detection.
The proposed model achieved superior results with an accuracy of 98.70%, a recall of 96.73%, and an F1-score of 94.22%, demonstrating its effectiveness in identifying true positives and overall classification reliability. In contrast, on the MRI and PET multimodal dataset (existing dataset), the model achieved a lower performance accuracy of 92.69%, a recall of 90.65%, and an F1-score of 91.20%, highlighting that the proposed dataset with multimodal fusion and optimized feature extraction enhances model performance, likely due to richer structural and functional information and improved preprocessing.
4.7 Paired t-test
Table 5 presents the paired t-test results comparing the performance of three models, QS-LGBM, SA-BFRNN, and QuartzNet across key classification metrics.
Table 5. Paired t-test comparison of QS-LGBM with baseline models on classification performance metrics.
Both SA-BFRNN and QuartzNet demonstrated comparatively lower accuracy (90.13% and 91.45%, respectively). The p-values (0.002 for SA-BFRNN and 0.004 for QuartzNet) were obtained from a paired t-test conducted against QS-LGBM, showing statistically significant performance differences (p < 0.05). Thus, the improvements achieved by QS-LGBM over the other two models are statistically significant, confirming that its superior results are not due to random variation.
4.8 Ablation study
Table 6 shows the performance of different model variants. Incorporating ECNN with SMAtt improves accuracy.
The findings from the ablation studies confirm that each element of VBM-based gray matter masking, SMAtt, and GWS feature optimization offers cumulative contributions toward increased levels of classification accuracy. The full GWS-SMAtt-ECNN model offers an accuracy rate of 98.70%, which is an advantage of 0.87% above the next-best accuracy (which did not include GWS), indicating a combined benefit from utilizing optimization, attention, and anatomically driven fusion.
4.9 Comparative evaluation
The comparative assessment shows that GWS-SMAtt-ECNN outperformed the existing models, Gradient Boosting Machine Deep Neural Networks (GBM-DNN) (27) and FusionNet (28), and the models from (29) (NIN, CA, SA, and CA-SA). It achieved the highest performance in comparison with the concerned metrics. Forecasting appeared to be more accurate using GWS-SMAtt-ECNN's superior efficacy in generating fewer false negatives (FN) and false positives (FP). Table 7 presents a comparison between the proposed and existing approaches.
Accuracy: Accuracy measures the overall performance of the model in correctly classifying both AD and non-AD cases. It represents the proportion of total correct predictions out of all cases evaluated. Higher accuracy indicates better general reliability of the model.
Sensitivity/recall: Recall, or sensitivity, quantifies the model's ability to correctly identify actual AD patients. It calculates the percentage of true AD cases that are correctly detected. A higher recall ensures fewer positive cases are missed in diagnosis.
F1-score: The F1-score balances precision and recall to provide a single performance metric. It is the harmonic mean of precision and recall, reflecting both correctness and completeness of AD detection. A higher F1-score indicates better overall detection performance.
Figure 13 presents a comparative performance analysis of various models for Alzheimer's disease detection using multimodal imaging. GBM-DNN (27) achieves an accuracy of 92.6%, a sensitivity (recall) of 91.5%, and an F1-score of 91.1%, while FusionNet (28) improves slightly with 94% accuracy, 93% recall, and 92.5% F1-score. Models from (29) show lower performance: NIN attains 62.50% accuracy, 61.53% recall, and 62.90% F1-score; CA achieves 68.70%, 63.34%, and 68.10%; SA records 59.17%, 59.10%, and 59.31%; and the combined CA-SA achieves 68.33%, 58.89%, and 66.81%, respectively. The proposed GWS-SMAtt-ECNN model significantly outperforms all these methods, reaching 98.70% accuracy, 96.73% recall, and 94.22% F1-score. This demonstrates the effectiveness of the proposed multimodal fusion approach in capturing both anatomical and functional features for accurate AD detection.
4.10 Discussion
This research developed a multimodal MRI–PET fusion model using GWS-SMAtt-ECNN to improve Alzheimer's disease detection and classification. In prior work, CNN model evaluation was constrained by small, heterogeneous brain MRI datasets and a limited number of architectures (23); mobile review classification relied solely on Flipkart data with only two clients, reducing generalizability (24); the IGWSO method was applied only to Parkinson's disease, leaving its effectiveness for other neurodegenerative disorders unknown (25); attention studies involved small sample sizes, limiting applicability to real-world AD patients (26); and some CNN evaluations excluded PCA or multimodal features, restricting performance on complex clinical datasets (23). Previous methods, including NIN, CA, SA, and CA-SA (29), were evaluated for comparison. The NIN model, while effective as a baseline CNN, showed limited performance due to its inability to focus on salient features within the brain images. The CA and SA models, which incorporated channel and spatial attention mechanisms, respectively, improved feature emphasis but remained restricted in handling multimodal data, leading to moderate accuracy and suboptimal generalization. The CA-SA combination provided some improvement by integrating both attention mechanisms, yet it still suffered from modality misalignment and computational inefficiency. Similarly, GBM-DNN (27) demonstrated high computational complexity and overfitting due to its large deep architecture, while FusionNet (28) employed complex fusion layers that increased processing cost and was sensitive to variations in image resolution and quality across modalities. Collectively, these methods faced challenges in generalization, handling limited or imbalanced datasets, and achieving robust multimodal integration. The proposed GWS-SMAtt-ECNN addressed these limitations effectively. GWS enabled efficient selection of the most informative features, reducing computational complexity and mitigating overfitting. The SMAtt mechanism ensured proper alignment between MRI and PET features, minimizing modality misalignment and enhancing feature fusion. Consequently, the model demonstrated higher accuracy, precision, recall, and F1-score compared with all baseline methods. The advantages of the proposed approach included robust generalization to limited and heterogeneous datasets, improved computational efficiency relative to deep architectures such as GBM-DNN, and enhanced discriminative capability for Alzheimer's detection by integrating both structural and functional imaging information. The proposed model remains limited by dataset size, modality scope (MRI–PET only), computational demands, attention/optimization variability, and a lack of validation on external or real-world clinical datasets.
From a practical perspective, the proposed GWS-SMAtt-ECNN demonstrated the potential to facilitate timely and accurate AD diagnosis in clinical settings. While the model achieved high accuracy, further validation using confusion matrices, AUC-ROC analysis, and statistical testing is recommended for reliability. Its enhanced performance suggests utility in early detection, patient monitoring, and treatment planning. Furthermore, the model's ability to handle imbalanced multimodal datasets indicated its adaptability to diverse clinical populations, supporting its eventual deployment in real-world healthcare environments.
5 Conclusion
A multimodal image integration model that combines MRI and PET scans through sophisticated fusion approaches improves the precision and accuracy of AD diagnosis by providing both structural and functional data on the brain, more efficient feature representation, and greater prospects of identifying the disease in its early stages using DL. The proposed GWS-SMAtt-ECNN technique achieved the highest performance, with an accuracy of 98.70%, a recall of 96.73%, and an F1-score of 94.22%. An important drawback of the advanced multimodal image fusion model is that it requires high-quality, co-registered MRI and PET data, which may not always be available. Complex models also demand a greater computational budget and can be vulnerable to overfitting on small datasets. Future directions involve improving diagnostic accuracy through clinical deployment in real-world settings, adapting to heterogeneous populations, integrating genetic and cognitive biomarkers, and using lightweight models for mobile devices. Unsupervised learning could also be explored for early identification of AD in asymptomatic individuals.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.
Author contributions
AA: Investigation, Software, Writing – original draft, Conceptualization. MM: Visualization, Data curation, Writing – review & editing, Methodology. CC: Visualization, Investigation, Writing – review & editing, Resources, Project administration, Supervision. AT: Writing – review & editing, Validation, Formal analysis, Project administration.
Funding
The author(s) declare financial support was received for the research and/or publication of this article. The author extends appreciation to the Deanship of Postgraduate Studies and Scientific Research at Majmaah University for funding this research work through the project number (R-2025-2086).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence, and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Song J, Zheng J, Li P, Lu X, Zhu G, Shen P. An effective multimodal image fusion method using MRI and PET for Alzheimer’s disease diagnosis. Front Digit Health. (2021) 3:637386. doi: 10.3389/fdgth.2021.637386
2. Dwivedi S, Goel T, Tanveer M, Murugan R, Sharma R. Multimodal fusion-based deep learning network for effective diagnosis of Alzheimer’s disease. IEEE MultiMedia. (2022) 29(2):45–55. doi: 10.1109/MMUL.2022.3156471
3. Odusami M, Maskeliūnas R, Damaševičius R, Misra S. Explainable deep-learning-based diagnosis of Alzheimer’s disease using multimodal input fusion of PET and MRI images. J Med Biol Eng. (2023) 43(3):291–302. doi: 10.1007/s40846-023-00801-3
4. Singamsetty S. Neurofusion: advancing Alzheimer’s diagnosis with deep learning and multimodal feature integration. Int J Educ Appl Sci Res. (2021) 8(1):23–32. doi: 10.5281/zenodo.14889013
5. Castellano G, Esposito A, Lella E, Montanaro G, Vessio G. Automated detection of Alzheimer’s disease: a multi-modal approach with 3D MRI and amyloid PET. Sci Rep. (2024) 14(1):5210. doi: 10.1038/s41598-024-56001-9
6. Odusami M, Maskeliūnas R, Damaševičius R. Optimized convolutional fusion for multimodal neuroimaging in Alzheimer’s disease diagnosis: enhancing data integration and feature extraction. J Pers Med. (2023) 13(10):1496. doi: 10.3390/jpm13101496
7. Shukla A, Tiwari R, Tiwari S. Alzheimer’s disease detection from fused PET and MRI modalities using an ensemble classifier. Mach Learn Knowl Extract. (2023) 5(2):512–38. doi: 10.3390/make5020031
8. Hassan A, Imran A, Yasin AU, Waqas MA, Fazal R. A multimodal approach for Alzheimer’s disease detection and classification using deep learning. J Biomed Inform. (2024) 6(02):441–50. 10.56979
9. Matsushita S, Tatekawa H, Ueda D, Takita H, Horiuchi D, Tsukamoto T, et al. The association of metabolic brain MRI, amyloid PET, and clinical factors: a study of Alzheimer’s disease and normal controls from the open access series of imaging studies dataset. J Magn Reson Imaging. (2024) 59(4):1341–8. doi: 10.1002/jmri.28892
10. Shojaie M, Tabarestani S, Cabrerizo M, DeKosky ST, Vaillancourt DE, Loewenstein D, et al. PET Imaging of tau pathology and amyloid-β, and MRI for Alzheimer’s disease feature fusion and multimodal classification. J Alzheimer’s Dis. (2021) 84(4):1497–514. doi: 10.3233/JAD-210064
11. Kim SK, Duong QA, Gahm JK. Multimodal 3D deep learning for early diagnosis of Alzheimer’s disease. IEEE Access. (2024) 12:46278–89. doi: 10.1109/ACCESS.2024.3381862
12. Díaz-Cadena A, Naranjo Peña I, Lara Gaviláñez H, Sánchez Pazmiño D, Botto-Tobar M. Alzheimer’s disease diagnosis assistance through the use of deep learning and multimodal feature fusion. In: Koundal D, Jain DK, Guo Y, Ashour AS, Zaguia A, editors. Data Analysis for Neurodegenerative Disorders. Cognitive Technologies. Singapore: Springer Nature Singapore (2023). p. 143–64. doi: 10.1007/978-981-99-2154-6_8
13. Ismail WN, Rajeena PPF, Ali MA. Multforad: multimodal mri neuroimaging for Alzheimer’s disease detection based on a 3d convolution model. Electronics (Basel). (2022) 11(23):3893. doi: 10.3390/electronics11233893
14. Odusami M, Maskeliūnas R, Damaševičius R. Pixel-level fusion approach with vision transformer for early detection of Alzheimer’s disease. Electronics (Basel). (2023) 12(5):1218. doi: 10.3390/electronics12051218
15. Guelib B, Zarour K, Hermessi H, Rayene B, Nawres K. Same-subject-modalities-interactions: a novel framework for MRI and PET multi-modality fusion for Alzheimer’s disease classification. IEEE Access. (2023) 11:48715–38. doi: 10.1109/ACCESS.2023.3276722
16. Leng Y, Cui W, Peng Y, Yan C, Cao Y, Yan Z, et al. Multimodal cross-enhanced fusion network for diagnosis of Alzheimer’s disease and subjective memory complaints. Comput Biol Med. (2023) 157:106788. doi: 10.1016/j.compbiomed.2023.106788
17. Sharma S, Mandal PK. A comprehensive report on machine learning-based early detection of Alzheimer’s disease using multi-modal neuroimaging data. ACM Comput Surveys (CSUR). (2022) 55(2):1–44. doi: 10.1145/3492865
18. Raza ML, Hassan ST, Jamil S, Hyder N, Batool K, Walji S, et al. Advancements in deep learning for early diagnosis of Alzheimer’s disease using multimodal neuroimaging: challenges and future directions. Front Neuroinform. (2025) 19:1557177. doi: 10.3389/fninf.2025.1557177
19. Sharma R, Al-Dhaifallah M, Shakoor A. Fusenet: attention-learning based MRI–PET slice fusion for Alzheimer’s diagnosis. Comput Electr Engin. (2025) 127:110556. doi: 10.1016/j.compeleceng.2025.110556
20. Khan AA, Mahendran RK, Perumal K, Faheem M. Dual-3DM3-AD: mixed transformer-based semantic segmentation and triplet pre-processing for early multi-class Alzheimer’s diagnosis. IEEE Transact Neural Syst Rehabil Engin. (2024) 32:696–707. doi: 10.1109/TNSRE.2024.3357723
21. Mushtaq N, Khan AA, Khan FA, Ali MJ, Shahid MMA. Brain tumor segmentation using a multi-view attention-based ensemble network. Comput Mater Continua. (2022) 72(3):5793–806. doi: 10.32604/cmc.2022.024316
22. Rastogi D, Johri P, Donelli M, Kadry S, Khan AA, Espa G, et al. Deep learning-integrated MRI brain tumor analysis: feature extraction, segmentation, and survival prediction using replicator and volumetric networks. Sci Rep. (2025) 15(1):1437. doi: 10.1038/s41598-024-84386-0
23. Kujur A, Raza Z, Khan AA, Wechtaisong C. Data complexity based evaluation of the model dependence of brain MRI images for classification of brain tumor and Alzheimer’s disease. IEEE Access. (2022) 10:112117–33. doi: 10.1109/ACCESS.2022.3216393
24. Khan AA, Driss M, Boulila W, Sampedro GA, Abbas S, Wechtaisong C. Privacy preserved and decentralized smartphone recommendation system. IEEE Transact Consum Electr. (2023) 70(1):4617–24. doi: 10.1109/TCE.2023.3323406
25. Sivakumar M, Devaki K. Improved glowworm swarm optimization for Parkinson’s disease prediction based on radial basis functions networks. Inform Technol Cont. (2024) 53(2):342–54. doi: 10.5755/j01.itc.53.2.33368
26. Ye J, Pan D, Zeng A, Zhang Y, Chen Q, Liu Y. MssNet: an efficient spatial attention model for early recognition of Alzheimer’s disease. IEEE Trans Emerg Top Comput Intell. (2025) 9(2):1454–68. doi: 10.1109/TETCI.2025.3537942
27. Mohammed S, Malhotra N, Singh A, Awadelkarim AM, Ahmed S, Potharaju S. Novel hybrid intelligence model for early Alzheimer’s diagnosis utilizing multimodal biomarker fusion. Inform Med Unlocked. (2025) 57:101668. doi: 10.1016/j.imu.2025.101668
28. Muksimova S, Umirzakova S, Baltayev J, Cho YI. Multi-modal fusion and longitudinal analysis for Alzheimer’s disease classification using deep learning. Diagnostics. (2025) 15(6):717. doi: 10.3390/diagnostics15060717
Keywords: Alzheimer’s disease (AD), multimodal image fusion, magnetic resonance imaging (MRI), Glowworm Swarm-Optimized Spatial Multimodal Attention-Enriched Convolutional Neural Network (GWS-SMAtt-ECNN), positron emission tomography (PET)
Citation: Ansari AS, Mohammadi MS, Cattani C and Tassaddiq A (2025) An advanced multimodal image fusion model for accurate detection of Alzheimer's disease using MRI and PET. Front. Med. Technol. 7:1699821. doi: 10.3389/fmedt.2025.1699821
Received: 5 September 2025; Revised: 12 October 2025;
Accepted: 5 November 2025;
Published: 2 December 2025.
Edited by:
Chen Xue, Zhejiang University, China
Reviewed by:
Arfat Ahmad Khan, Khon Kaen University, Thailand
K. V. Suma, Ramaiah Institute of Technology, India
Copyright: © 2025 Ansari, Mohammadi, Cattani and Tassaddiq. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Carlo Cattani, cattani@unitus.it; Asifa Tassaddiq, a.tassaddiq@mu.edu.sa