- 1Department of IT, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
- 2Senior Member, Institute of Electrical and Electronics Engineers (SMIEEE), Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
- 3School of Computer Science, Faculty of Engineering and Technology, Villa College, Malé, Maldives
The growing dependence on artificial intelligence (AI) in healthcare has significantly advanced the detection and diagnosis of viral diseases. However, existing AI models encounter key obstacles such as data privacy concerns, limited interpretability, poor generalization, and overfitting, which restrict their practical application and broader adoption. This research tackles these issues by introducing an integrated framework that combines Generative AI, Vision Transformers, Explainable AI (XAI), and Federated Learning (FL) to improve diagnostic accuracy and safeguard data privacy. By utilizing Generative AI, the framework produces synthetic datasets that supplement limited medical data and bolster model resilience. Vision Transformers enhance the precision and efficiency of image-based disease detection. Explainable AI fosters transparency, ensuring that deep learning models’ decisions are clear and reliable for healthcare practitioners. Federated Learning facilitates decentralized model training, maintaining patient privacy while enabling collaborative learning across institutions. Experimental findings show that this framework enhances diagnostic accuracy in viral diseases, including COVID-19, while addressing privacy concerns and improving the interpretability of AI systems. This integrated approach offers a secure, transparent, and scalable solution to the critical challenges in AI-driven healthcare, providing real-time, effective disease detection and analysis.
1 Introduction
1.1 Background and motivation
Artificial intelligence (AI) and machine learning (ML) have had a profound impact on healthcare, especially in disease detection, diagnosis, and treatment (1, 2). Deep learning-based AI systems have been particularly successful in medical imaging and disease prediction (3). Despite these advances, several critical challenges hinder the widespread adoption of AI in real-world healthcare applications, such as issues related to diagnostic accuracy, data privacy, and model interpretability (4). These limitations need to be addressed for AI to reach its full potential in healthcare. The recent surge in viral outbreaks, particularly COVID-19, has emphasized the need for scalable and accurate AI-driven diagnostic systems. These systems must operate efficiently in real-time, without compromising patient privacy (5). However, current AI models face hurdles such as overfitting, difficulty in generalizing to diverse datasets, and the black-box nature of their decision-making processes, which limits transparency (6). These shortcomings highlight the importance of developing AI models that not only offer precision but also meet ethical and privacy standards (7). In response to these challenges, this research introduces an integrated framework that leverages Generative AI, Vision Transformers, Explainable AI (XAI), and Federated Learning (FL). The proposed approach enhances diagnostic accuracy, improves data privacy, and ensures model interpretability. By harnessing the unique strengths of these technologies, this framework aims to overcome the limitations of existing AI models in healthcare and establish a reliable, secure, and transparent AI system for diagnosing viral diseases (8).
1.2 Research gap
While artificial intelligence has made meaningful advances in healthcare, especially in disease diagnosis, several challenges remain—particularly when it comes to accurately detecting and tracking viral diseases in real-world clinical environments. One major problem is overfitting, which occurs because many viral disease datasets—like those for COVID-19, Monkeypox, or newer strains of influenza—are often small, imbalanced, and collected from limited populations. This means that models trained on such data may perform well during development but fail to deliver consistent results when tested on data from different hospitals, regions, or patient groups. For example, a CNN trained solely on COVID-19 X-rays from one hospital may not correctly identify pneumonia cases from another location with different imaging protocols. In addition, generalization becomes difficult because of the natural variability in viral infections—patients may show different symptoms, imaging results, and biomarker readings depending on the stage and severity of the disease. Traditional deep learning models often struggle to capture this variability, reducing their reliability, especially in early outbreak scenarios or among patients with unusual presentations.
Another concern is that most AI systems still rely on centralized data storage and training, which not only raises ethical and legal concerns about patient privacy but also limits collaboration between institutions. Moreover, many of these AI models work like a black box, offering little to no explanation for their predictions—something that makes doctors and healthcare providers hesitant to fully trust or adopt them. To overcome these gaps, our research introduces an integrated AI framework that:
1. Reduces overfitting by using Generative Adversarial Networks (GANs) to generate realistic data for underrepresented disease categories;
2. Improves generalization using Vision Transformers, which can capture complex patterns in medical images more effectively than conventional CNNs;
3. Protects patient data through Federated Learning, allowing different hospitals to train models collaboratively without sharing sensitive records;
4. Ensures transparency by incorporating Explainable AI tools like LIME and SHAP to make AI predictions interpretable and clinically actionable.
This comprehensive solution is designed to directly tackle the shortcomings of existing systems in viral disease diagnosis—providing a more accurate, secure, and scalable AI-driven platform that healthcare professionals can trust and deploy confidently.
2 Related work
2.1 Viral disease detection using CNNs and RNNs
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been effectively utilized in medical imaging for viral disease detection, particularly with X-rays and CT scans. These deep learning models excel in identifying complex patterns in images, making them valuable for disease classification tasks. Research has shown the success of CNNs and RNNs in early diagnosis of diseases such as COVID-19, where timely intervention is critical. However, a significant challenge of these models is overfitting, especially when trained on small datasets, which limits their ability to generalize across different clinical environments. This issue becomes more pronounced in cases of rare diseases or during the early stages of disease outbreaks (9, 10). Additionally, CNNs and RNNs face difficulties when dealing with long-sequence dependencies, which are crucial for understanding disease progression over time. The ability to predict patient outcomes based on historical medical data is restricted by the limitations of traditional RNNs in capturing long-term temporal information. Addressing these limitations requires integrating more advanced models or employing data augmentation techniques that enhance the generalization capability of CNNs and RNNs while preserving their diagnostic accuracy (11, 12).
2.2 Generative AI in healthcare
Generative Adversarial Networks (GANs) have become a key tool in addressing data scarcity within healthcare, particularly in the context of disease detection. GANs generate high-quality synthetic datasets that can augment limited real-world data, making it possible to train models more robustly and avoid overfitting. This is especially useful for rare diseases where gathering a large amount of clinical data is difficult. Studies have demonstrated that GAN-augmented datasets can improve the robustness of models like CNNs and RNNs, allowing them to generalize more effectively across different patient groups (4, 6). However, the challenge lies in ensuring that synthetic data generated by GANs strikes the right balance between realism and generalizability. While GANs can create highly realistic data, there is a risk that the generated data may not fully reflect the variability seen in real-world datasets, leading to poor model performance when applied to actual clinical data. Current research focuses on refining the training processes of GANs to produce synthetic data that is both representative and diverse, ensuring the models trained on this data are reliable in clinical practice (5, 13).
2.3 Explainable AI for medical decision making
Explainable AI (XAI) has gained prominence in healthcare due to the need for interpretability in AI-driven decision-making processes. In clinical settings, transparency is crucial, as healthcare professionals must trust the decisions made by AI models. XAI techniques are designed to make AI decisions understandable, enabling clinicians to follow the reasoning behind a model’s predictions. This is particularly important for gaining regulatory approval and fostering broader clinical adoption of AI tools, especially in medical imaging (7, 14). Despite these advancements, XAI still faces challenges. Many existing XAI techniques add complexity to the model, making real-time clinical applications difficult. Moreover, although XAI offers greater transparency, it often does not provide a complete explanation, leaving some aspects of the decision-making process obscure (15). Future research aims to develop more streamlined and interpretable XAI methods that can deliver full transparency without imposing heavy computational demands, making them more feasible for real-time healthcare applications (2, 16).
2.4 Privacy-preserving AI with federated learning
Federated Learning (FL) has emerged as a promising solution to the challenge of maintaining data privacy in healthcare while enabling collaboration in model training across institutions. In FL, data remains decentralized, and only model updates are shared, ensuring that sensitive patient information is not exchanged between entities (17). This framework is particularly advantageous in environments governed by strict data privacy regulations, such as GDPR, and allows for the development of robust AI models without compromising patient privacy (5, 18). However, the integration of FL with other AI techniques, such as GANs and XAI, is still in its infancy (19). While FL addresses privacy concerns, combining it with synthetic data generation and model interpretability techniques could further enhance its effectiveness. For example, using GANs within an FL framework could improve the robustness of models trained on diverse synthetic data, while integrating XAI could help ensure that the models remain transparent and trustworthy. This combination could lead to more secure and interpretable decentralized AI systems in healthcare (6, 20).
3 Methodology
3.1 COVID-19 and pneumonia detection using CNN and RNN
This study employs a hybrid model combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to detect and classify the severity of viral diseases such as COVID-19 and pneumonia using medical images like CT scans and X-rays (21). CNNs are particularly proficient at extracting intricate features from these images, while RNNs are used to analyze the temporal patterns over time, which aids in monitoring disease progression (22). The integration of these two models allows for the simultaneous analysis of both spatial and temporal data, improving diagnostic accuracy and providing insights into disease progression based on the imaging data (23). However, one of the main challenges faced by deep learning models, such as CNNs and RNNs, in medical imaging is the risk of overfitting, particularly when training on small datasets. To address this, a comprehensive image augmentation pipeline is utilized, which artificially expands the dataset by applying transformations such as rotation, scaling, and flipping (24). These augmentation techniques enhance the model’s ability to generalize to new, unseen data, preventing overfitting and improving its performance in real-world clinical environments.
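As a concrete illustration of such a pipeline, the sketch below composes rotation, scaling, and flipping augmentations with torchvision. The specific transforms and parameter ranges are assumptions for illustration, not the exact configuration used in this study.

```python
# Illustrative augmentation pipeline (rotation, scaling, flipping) for chest
# scans; transform choices and parameter ranges are assumptions, not the
# study's exact settings.
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),          # X-rays are single-channel
    transforms.RandomRotation(degrees=10),                # small random rotations
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # mild scale jitter
    transforms.RandomHorizontalFlip(p=0.5),               # random flipping
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])
```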
3.1.1 Motivation for using CNN and RNN
We chose a hybrid CNN–RNN architecture based on the specific characteristics of the medical data involved in viral disease diagnosis, especially CT and X-ray images. These types of data require not just spatial analysis, but also an understanding of how the disease progresses over time.
1. CNNs are excellent at analyzing medical images because they automatically detect important visual features like lung lesions, opacities, and structural abnormalities—key indicators in conditions such as COVID-19 and pneumonia. Unlike traditional ML models like SVMs or decision trees, CNNs don’t rely on handcrafted features and are better equipped to capture the complex structure of medical scans.
2. On the other hand, RNNs are ideal for modeling sequences. In clinical practice, it’s common to monitor how a patient’s condition changes over time through repeated imaging. RNNs help capture these temporal patterns, enabling the system to recognize disease progression that static models would likely miss.
By combining CNNs and RNNs, our model can process both the visual complexity of each scan and the evolution of the disease across time, resulting in a more complete diagnostic picture. This hybrid design has also been validated in existing studies as more effective than standalone models, particularly for early detection and longitudinal monitoring. For our framework, it provided a strong foundation for integrating more advanced techniques like GANs, Vision Transformers, and Federated Learning.
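A minimal sketch of one plausible hybrid of this kind is shown below, assuming a small CNN encoder applied to each scan and an LSTM over a patient's imaging sequence; the layer sizes and the three-class head are illustrative placeholders rather than the study's architecture.

```python
# Hedged sketch of a CNN-RNN hybrid: a CNN encodes each scan in a patient's
# imaging sequence, and an LSTM models progression across visits.
import torch
import torch.nn as nn

class CNNRNNHybrid(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64, num_classes=3):
        super().__init__()
        self.cnn = nn.Sequential(                      # spatial feature extractor
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)  # temporal model
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                              # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # per-scan features
        out, _ = self.rnn(feats)                       # sequence over visits
        return self.head(out[:, -1])                   # classify from last visit

logits = CNNRNNHybrid()(torch.randn(2, 4, 1, 224, 224))  # 2 patients, 4 scans each
```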
3.2 Generative AI for viral disease prediction
Generative Adversarial Networks (GANs) are utilized in this research to tackle the problem of data scarcity, which is prevalent in healthcare, especially in cases of rare diseases or newly emerging viral outbreaks like COVID-19 (25). GANs are capable of generating synthetic medical images, which are used to supplement the existing dataset, allowing for the training of models on a more diverse and larger set of data. This approach enhances the robustness and generalizability of disease detection models, as they are exposed to a wider range of scenarios, including various stages of disease progression (26). Additionally, GANs are effective in addressing the issue of data imbalance, which is a common challenge in healthcare datasets where certain conditions are underrepresented. By generating synthetic data for the underrepresented classes, GANs help balance the dataset and ensure that the model can learn effectively from all categories of data (27). This reduces bias toward over-represented conditions, ultimately improving the model’s predictive performance across diverse patient populations.
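To make the augmentation step concrete, the sketch below shows one standard adversarial training step with a small DCGAN-style generator and discriminator at 64×64 resolution. The architecture sizes, resolution, and hyperparameters are illustrative assumptions, not the configuration used in the study.

```python
# Hedged sketch of GAN-based synthetic image generation (DCGAN-style, 64x64).
import torch
import torch.nn as nn

z_dim = 100
G = nn.Sequential(                                    # noise -> 64x64 image
    nn.ConvTranspose2d(z_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 4, 2, 1), nn.Tanh(),
)
D = nn.Sequential(                                    # image -> real/fake logit
    nn.Conv2d(1, 16, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 8), nn.Flatten(),
)
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def gan_step(real):                                   # real: (B, 1, 64, 64)
    b = real.size(0)
    fake = G(torch.randn(b, z_dim, 1, 1))
    # Discriminator update: push real scans toward 1, synthetic ones toward 0
    d_loss = (bce(D(real), torch.ones(b, 1))
              + bce(D(fake.detach()), torch.zeros(b, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: make synthetic scans indistinguishable from real ones
    g_loss = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Class imbalance can then be reduced by sampling from a generator trained on the underrepresented class and adding the synthetic images to the training pool.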
3.3 Vision transformer for data augmentation and classification
The Vision Transformer model brings a novel approach to processing medical images by leveraging self-attention mechanisms to capture global relationships within the data. Unlike CNNs, which focus on extracting local features from an image, Vision Transformers excel in modeling the global structure of an image, which is crucial in understanding complex medical data like CT scans and MRIs. This global attention makes the Vision Transformer particularly effective at handling high-dimensional medical data, providing deeper insights into disease progression. Furthermore, Vision Transformers outperform CNNs in data augmentation tasks due to their ability to handle a wider variety of image transformations and non-standard inputs. This allows the Vision Transformer to train on a more diverse dataset, improving its generalizability to new and unseen data. By learning from both standard and non-standard medical images, Vision Transformers enhance the robustness of the model, reducing the dependency on conventional medical scans and making the model more adaptable to real-world clinical applications.
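As a hedged sketch, a Vision Transformer classifier for this task could be set up as below. The choice of torchvision's ViT-B/16 backbone and the three-class head are assumptions, since the text does not name a specific variant.

```python
# Hedged sketch: adapting a Vision Transformer to a three-class task
# (e.g., COVID-19 / pneumonia / normal). ViT-B/16 is an assumed variant.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

model = vit_b_16(weights=None)                        # self-attention backbone
model.heads.head = nn.Linear(model.heads.head.in_features, 3)  # 3-class head

x = torch.randn(2, 1, 224, 224).repeat(1, 3, 1, 1)    # grayscale scans -> 3 channels
logits = model(x)                                     # (2, 3) class scores
```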
3.4 Explainable AI with self-attention transformer
This study incorporates Explainable AI (XAI) through self-attention mechanisms, which improve transparency in decision-making by focusing the model’s attention on the most relevant parts of the input data. Self-attention mechanisms enhance the interpretability of the model’s predictions, which is especially important in clinical settings where healthcare professionals need to trust the decisions made by AI systems. This is particularly crucial in high-stakes diagnoses, such as those involving viral diseases like COVID-19. In addition to self-attention, XAI frameworks such as LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (Shapley Additive Explanations) are employed to further increase the model’s interpretability. These tools provide insights into which features of the input data contributed most to the model’s predictions, allowing clinicians to better understand the reasoning behind the AI’s decisions. By highlighting key disease markers, XAI bridges the gap between the model’s internal operations and the need for transparency in clinical decision-making.
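To make the LIME usage concrete, the sketch below produces an image-level explanation with the `lime` package. Here `model` (a trained classifier returning logits for the disease classes) and `scan` (an H×W×3 float array) are assumed placeholders, and the parameter choices are illustrative.

```python
# Hedged sketch: LIME explanation for one scan. `model` (a trained PyTorch
# classifier) and `scan` (an H x W x 3 float array in [0, 1]) are assumed.
import numpy as np
import torch
from lime import lime_image

def classifier_fn(images):
    """LIME passes perturbed images as (N, H, W, 3); return class probabilities."""
    x = torch.from_numpy(images).permute(0, 3, 1, 2).float()
    with torch.no_grad():
        return torch.softmax(model(x), dim=1).numpy()

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(scan, classifier_fn,
                                         top_labels=1, num_samples=500)
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True, num_features=5)
# `mask` marks the superpixels that most supported the predicted class.
```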
3.5 Federated learning for privacy-preserving model training
This research implements Federated Learning (FL) to decentralize the AI training process, ensuring that patient data remains local while model improvements are shared globally. In this approach, hospitals and clinics train AI models locally on their own data and share only the updated model weights with a central server. This guarantees that sensitive patient information does not leave the local institution, addressing critical data privacy concerns, especially in regions governed by stringent privacy regulations such as GDPR. By preventing the transfer of raw data and instead focusing on model updates, FL enables institutions to collaborate on improving AI models while preserving data privacy. This decentralized approach ensures that models are trained on diverse datasets from multiple institutions, enhancing the generalizability and robustness of the model across various patient populations. The implementation of FL in healthcare not only addresses privacy concerns but also improves model performance by allowing for the integration of knowledge from a broader range of clinical data.
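A minimal sketch of the weight-sharing step is shown below, assuming standard federated averaging over PyTorch state_dicts weighted by each site's sample count; `train_locally` is a hypothetical helper, not part of the study's codebase.

```python
# Hedged sketch of federated averaging: sites train locally, and only
# state_dict weights reach the aggregator. Floating-point parameters assumed.
import copy

def federated_average(local_states, sample_counts):
    """Weighted average of client model state_dicts (no raw data is shared)."""
    total = sum(sample_counts)
    avg = copy.deepcopy(local_states[0])
    for key in avg:
        avg[key] = sum(state[key] * (n / total)
                       for state, n in zip(local_states, sample_counts))
    return avg

# One global round (train_locally is an assumed helper returning a state_dict):
# local_states = [train_locally(global_model, site) for site in institutions]
# global_model.load_state_dict(federated_average(local_states, sample_counts))
```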
Figure 1 illustrates the PHAF-VDP framework, which combines several cutting-edge AI technologies to improve the detection, diagnosis, and progression prediction of viral diseases like COVID-19 and pneumonia, all while maintaining patient privacy. At the foundation of this framework, Convolutional Neural Networks (CNNs) are employed to extract vital spatial features from medical images, such as CT scans and X-rays, aiding in identifying specific disease markers. Complementing this, Recurrent Neural Networks (RNNs) analyze temporal patterns, tracking the progression of a patient’s condition over time. This integration of CNNs and RNNs ensures comprehensive feature extraction and temporal analysis. To address the issues of data scarcity and imbalance, Generative Adversarial Networks (GANs) are used to generate synthetic medical data, expanding the training set and improving model generalization. Additionally, the Vision Transformer enhances classification by using global attention mechanisms to identify patterns in complex, high-dimensional medical images. To ensure the model’s decisions are transparent, Explainable AI (XAI), combined with self-attention mechanisms and tools like LIME and SHAP, allows healthcare professionals to interpret the model’s predictions by highlighting the most relevant features. Lastly, Federated Learning (FL) facilitates decentralized model training across different healthcare institutions, enabling collaborative learning without sharing sensitive patient data, thus ensuring data privacy and compliance with regulations such as GDPR. Together, these advanced AI components create a robust, privacy-preserving, and transparent framework for accurate viral disease detection and progression prediction.
1. GANs are used at the beginning to generate synthetic medical images and balance the dataset, addressing the issue of class imbalance. This enriched dataset is then passed on to the next stages.
2. CNN-RNN models process this data to extract both spatial features (like image textures and patterns) and temporal information (such as disease progression over time).
3. In parallel, Vision Transformers are trained on the same dataset to capture more global relationships and context that traditional CNNs might miss. The outputs from both the CNN-RNN and Transformer models are then combined using a decision-level ensemble method (soft voting; see the sketch after this list) to enhance prediction accuracy and reliability.
4. The fused results are fed into Explainable AI tools (LIME and SHAP) to make the predictions more transparent and clinically interpretable.
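The sketch below illustrates the decision-level fusion referenced in step 3, assuming equal-weight soft voting over the two branches' class probabilities; in practice the weight could also be tuned on validation data.

```python
# Hedged sketch of decision-level soft voting over the two model branches.
import torch

def soft_vote(cnn_rnn_logits, vit_logits, w=0.5):
    """Fuse the two branches by averaging their class probabilities."""
    p1 = torch.softmax(cnn_rnn_logits, dim=1)   # CNN-RNN branch probabilities
    p2 = torch.softmax(vit_logits, dim=1)       # Vision Transformer branch
    probs = w * p1 + (1 - w) * p2
    return probs, probs.argmax(dim=1)           # fused probabilities and labels
```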

Figure 1. Privacy-preserving hybrid AI framework for viral disease detection and progression prediction (PHAF-VDP).
3.6 Dataset description
This research dataset encompasses a wide variety of patient information, covering both demographic and clinical attributes. Each row corresponds to a unique patient ID, with associated details including age, gender, and the presence or absence of clinical symptoms such as fever, cough, fatigue, shortness of breath, and skin lesions (28, 29). Along with these clinical indicators, the dataset provides critical medical measurements like blood C-reactive protein (CRP) levels, heart rate (bpm), and oxygen saturation (%). Additionally, advanced diagnostic parameters, such as imaging abnormalities, are included, aiding in tracking disease progression. This dataset (28, 29) is highly relevant for identifying patterns in conditions like COVID-19 and pneumonia, while also supporting broader healthcare and cybersecurity analysis. The dataset also features specialized parameters like network traffic anomalies and unauthorized access attempts, highlighting its potential use in cybersecurity alongside disease detection. The target classes for each patient fall into categories like Disease Detection, Healthcare, and Cybersecurity, based on their clinical conditions and any observed system anomalies. This combination of medical diagnostics and system security metrics makes the dataset valuable for research at the intersection of healthcare and cybersecurity.
3.6.1 Dataset size and data splitting strategy
The custom dataset used in this study consists of approximately 5,000 labeled samples, combining clinical metadata (e.g., age, symptoms, CRP levels) with image-based data (such as CT scans and X-rays) and relevant cybersecurity features. These samples represent a diverse patient population affected by viral diseases such as COVID-19 and pneumonia. To ensure reliable training and evaluation of the proposed models, the dataset was randomly split into three subsets:
● 70% for training (3,500 samples)
● 15% for validation (750 samples)
● 15% for testing (750 samples)
This split helps prevent overfitting and allows for robust model generalization, especially when evaluating performance on unseen cases. Additionally, the training set was further enhanced using GAN-based data augmentation, increasing data diversity and mitigating class imbalance for rare disease categories.
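For concreteness, the 70/15/15 split could be produced as below with scikit-learn. Stratifying by label is an added assumption (the text says "randomly split"), shown here because it preserves class ratios; `X` and `y` are assumed to be the prepared feature and label arrays.

```python
# Hedged sketch of the 70/15/15 train/validation/test split.
from sklearn.model_selection import train_test_split

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)             # 70% train
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)  # 15% / 15%
```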
3.6.2 Institutional segmentation
The dataset was divided into five independent nodes, each representing a different healthcare institution. These partitions were made to ensure that no patient records were shared or duplicated across nodes, maintaining a clear boundary between institutional datasets.
1. Local training with privacy preservation: Each institution trained its model locally using only its own data. After training, only the model weights—not the raw patient data—were shared with a central server for aggregation through the federated averaging algorithm, thus safeguarding patient privacy.
2. Independent holdout test set: To assess overall model generalization, we reserved a separate holdout dataset at the central server, which was never involved in training. This guaranteed that all performance evaluation was done on unseen data, eliminating any chance of data leakage.
3. Local cross-validation: Within each institutional node, we applied stratified 5-fold cross-validation to validate the local models before aggregation. This approach ensured balanced label representation and robustness of results across different data splits at the local level.
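A minimal sketch of the per-node validation described in item 3 follows, assuming an sklearn-style model wrapper; `build_model`, `X_local`, and `y_local` are hypothetical placeholders for one institution's pipeline and data.

```python
# Hedged sketch: stratified 5-fold cross-validation on one node's local data.
import numpy as np
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in skf.split(X_local, y_local):
    model = build_model()                              # fresh model per fold
    model.fit(X_local[train_idx], y_local[train_idx])
    fold_scores.append(model.score(X_local[val_idx], y_local[val_idx]))
print(f"local CV accuracy: {np.mean(fold_scores):.3f} +/- {np.std(fold_scores):.3f}")
```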
Table 1 provides a comprehensive dataset that integrates patient health information with cybersecurity metrics. Each entry is associated with a unique patient ID, detailing demographic data (like age and gender) along with clinical symptoms such as fever, cough, fatigue, and shortness of breath. Moreover, the dataset includes key medical indicators like blood C-reactive protein (CRP) levels, heart rate, oxygen saturation, and imaging abnormalities. Additionally, it captures cybersecurity-related metrics, including network traffic anomalies and unauthorized access attempts, alongside the health conditions (28). This dual-focus dataset facilitates an in-depth analysis that covers both disease detection (for illnesses like COVID-19 and pneumonia) and cybersecurity monitoring, making it highly valuable for AI research across multiple domains (29).
3.7 Performance evaluation metrics
3.7.1 Accuracy
Accuracy is calculated as the ratio of correct predictions to the total number of predictions and reflects the model's overall performance. Mathematically, accuracy is defined as

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
In this research, the proposed system achieved an accuracy ranging from 81% to 93%, which is significantly higher than the existing system’s range of 31% to 40%. This improvement highlights the effectiveness of using advanced techniques such as GANs and Vision Transformers in boosting diagnostic performance for viral diseases.
3.7.2 F1 score
The F1 score is the harmonic mean of precision and recall, balancing the trade-off between precision (how many predicted positive cases are correct) and recall (how many actual positive cases are identified). The formula for the F1 score is

F1 = 2 × (Precision × Recall) / (Precision + Recall)

In our experiments, the F1-score rose from 35%–43% in the traditional baseline to 83%–90% with the proposed system, demonstrating that the integrated approach maintains high precision while capturing a broad range of true disease cases across diverse datasets. The GAN-augmented Vision Transformer in particular achieved a superior F1 score compared to standalone CNN models.
3.7.3 Precision
Precision measures the proportion of true positives among all positive predictions, indicating how accurate the model is in predicting the positive class (e.g., disease cases). The formula for precision is

Precision = TP / (TP + FP)

In our experiments, the proposed system achieved a much higher precision (84%–91%) than the existing system (38%–46%). This improvement stems mainly from a reduced number of false positives, owing to GAN-generated synthetic data and the enhanced pattern recognition of the Vision Transformer, and it ensures that diagnosed disease cases are genuinely positive.
3.7.4 Recall
Recall, also known as sensitivity or true positive rate, measures the proportion of actual positives that the model correctly identifies. It is essential for detecting actual disease cases and is given by

Recall = TP / (TP + FN)

The proposed framework raised recall from 33%–42% in the baseline model to 82%–90%, a gain achieved through the combination of CNN–RNN temporal analysis and the Transformer's global attention mechanisms. Improved recall reduces the chance of missing actual cases, which is critical when identifying viral diseases such as COVID-19.
3.7.5 Area under the curve
AUC is a measure of the model’s ability to distinguish between positive and negative classes. A higher AUC indicates better performance in differentiating between patients with and without viral diseases. AUC is derived from the ROC (Receiver Operating Characteristic) curve and is useful in evaluating the overall model performance.
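The metrics in this section can be computed as in the sketch below with scikit-learn; `y_true`, `y_pred`, and the per-class probability matrix `y_score` are assumed arrays, and macro averaging is an illustrative choice for the multi-class setting.

```python
# Hedged sketch: computing the Section 3.7 metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")   # TP / (TP + FP)
rec  = recall_score(y_true, y_pred, average="macro")      # TP / (TP + FN)
f1   = f1_score(y_true, y_pred, average="macro")
auc  = roc_auc_score(y_true, y_score, multi_class="ovr")  # multi-class ROC-AUC
```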
4 Experimental results
4.1 Model performance comparison
A performance evaluation comparing the CNN-RNN model, GAN-augmented model, Vision Transformer, and XAI-enhanced model was conducted to assess their effectiveness in detecting and predicting viral diseases. Key metrics such as accuracy, F1 score, precision, recall, and Area Under the Curve (AUC) were analyzed. The results indicated that the Vision Transformer, combined with the GAN-augmented model, achieved a significant improvement in diagnostic accuracy over standalone CNN models. This was particularly evident in the F1 score, which measures the balance between precision and recall, highlighting the model’s ability to handle diverse data inputs. These enhancements can be attributed to the Vision Transformer’s global attention mechanism, which is particularly effective in capturing patterns in complex, high-dimensional data. The GAN-generated synthetic data also played a crucial role by addressing challenges related to data scarcity and imbalance. Although the CNN-RNN hybrid model performed well in terms of feature extraction and temporal analysis, it was outperformed by the GAN-augmented Vision Transformer due to overfitting on smaller datasets. However, the CNN-RNN model’s strength in tracking disease progression over time remained beneficial for monitoring long-term patient outcomes.
Figure 2 illustrates the connection between patients’ ages and their respective disease classifications (target classes). It provides insight into how various age groups are represented across different target classes, revealing potential patterns or correlations between a patient’s age and the predicted likelihood of disease progression, such as COVID-19 or pneumonia, as determined by the AI model.
4.2 Impact of synthetic data on model generalization
The use of synthetic data generated by GANs significantly improved the model’s ability to generalize across various viral diseases, including COVID-19 and influenza. This enhancement is critical as real-world healthcare datasets often suffer from limited data, especially for rare diseases or during new outbreaks. By creating realistic synthetic medical images, the GAN-augmented model was exposed to a broader range of data, leading to better performance on test cases not included in the original dataset. A case study demonstrated that the model achieved a validation accuracy of 85%, surpassing standard approaches with minimal overfitting. In addition, the synthetic data generated by GANs addressed the common issue of class imbalance found in healthcare datasets, where certain conditions are underrepresented. GANs enabled the creation of synthetic data for these underrepresented classes, allowing the model to learn effectively from all categories. This reduced the bias toward more prevalent conditions and resulted in consistent model performance across various patient demographics and stages of disease progression.
Figure 3 depicts the correlation between blood C-reactive protein (CRP) levels and the associated disease classifications (target classes). It highlights how CRP levels vary across target classes, offering insights into potential links between higher CRP levels and disease progression, such as COVID-19 or pneumonia, as identified by the AI model.
4.3 Explainability analysis
The integration of Explainable AI (XAI) techniques was vital in enhancing the transparency of the model’s decision-making process. Using XAI frameworks such as LIME and SHAP, healthcare professionals were able to interpret the model’s predictions by visualizing which features of the input data were most influential in the diagnosis. This level of transparency is critical in clinical environments, where trust in AI-driven decisions is paramount. The Vision Transformer’s self-attention mechanism further bolstered interpretability by focusing on the most relevant parts of medical images during classification. A case study highlighted how XAI techniques revealed critical disease markers in chest X-rays that were overlooked by traditional models. This underscored the importance of model transparency and demonstrated how AI systems could assist clinicians in making more informed decisions. By identifying key disease progression markers, the XAI-enhanced model fostered greater trust and collaboration between AI systems and healthcare professionals, making AI-driven decisions more actionable in real-world clinical settings.
4.3.1 Practical implementation of LIME/SHAP on ensemble predictions
1. Model architecture overview: The explainability analysis in this study is applied to the ensemble output generated by combining predictions from the CNN–RNN and Vision Transformer models. This aggregated prediction serves as the target for both LIME and SHAP interpretation techniques.
2. Workflow explanation
1. CT/X-ray images, enhanced through GAN-based augmentation, are first processed in parallel by both the CNN–RNN and Vision Transformer models.
2. Their outputs are then merged using a decision-level strategy, such as soft voting, to form a final ensemble prediction.
3. LIME helps identify influential regions by analyzing how slight modifications to image segments affect the prediction outcome.
4. SHAP quantifies the contribution of individual image regions (pixels or areas) to the final class prediction—whether it’s COVID-19, pneumonia, or a normal case.
3. Simulated visualization insights (COVID-19 case example)
1. LIME Visualization: Highlights red zones on the lungs, often corresponding to inflamed tissue or opacity areas linked with infection.
2. SHAP Visualization: Applies colored overlays to show how certain lung areas positively or negatively influence the prediction—commonly emphasizing ground-glass opacities or abnormal patterns.
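As one concrete (assumed) realization of the SHAP step in this workflow, a gradient-based explainer over the image branch could look like the sketch below; `model` and `scans` (a batch of preprocessed tensors) are placeholders, and GradientExplainer is one of several SHAP explainers that fit this description.

```python
# Hedged sketch of SHAP pixel attributions for the image branch.
# `model` (PyTorch classifier) and `scans` (N x C x H x W tensor) are assumed.
import numpy as np
import shap

background = scans[:50]                          # reference distribution
explainer = shap.GradientExplainer(model, background)
sv = explainer.shap_values(scans[50:52])         # one attribution map per class

# shap.image_plot expects channel-last arrays
shap.image_plot([np.transpose(v, (0, 2, 3, 1)) for v in sv],
                np.transpose(scans[50:52].numpy(), (0, 2, 3, 1)))
```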
Table 2 offers a clear comparison of how LIME and SHAP help explain the predictions made by the ensemble model in detecting viral diseases from CT and X-ray scans. While LIME visually pinpoints the key regions in the image that influenced the diagnosis, SHAP provides a detailed breakdown of how much each lung area contributed to the final prediction—making the model’s decision more transparent and trustworthy for clinical use.
Figure 4 illustrates how LIME and SHAP enhance model interpretability by visually highlighting the most influential regions in a COVID-19 CT scan. The LIME heatmap (left) pinpoints critical lung opacities affecting the classification, while the SHAP overlay (right) shows how specific lung zones either supported or opposed the final diagnosis, aligning with the ensemble model’s experimental results.
4.4 Privacy preservation in federated learning
The implementation of Federated Learning (FL) in the research successfully maintained patient privacy while still facilitating the development of highly accurate models. FL enabled the model to be trained across multiple healthcare institutions without the need to share sensitive patient data. Instead of exchanging raw data, only model updates (weights) were shared, ensuring compliance with data privacy regulations such as HIPAA and GDPR. Despite this decentralized approach, the federated model’s performance remained comparable to traditional centralized models, demonstrating the effectiveness of FL in maintaining accuracy while safeguarding privacy. A case study confirmed that no identifiable patient data was transferred during the training process, ensuring compliance with stringent data privacy laws. This decentralized training method allowed healthcare institutions to collaboratively enhance AI models without exposing sensitive patient information. By balancing data privacy with model robustness, Federated Learning proved to be a crucial component of the research, ensuring that AI systems in healthcare are both secure and compliant with ethical and legal standards.
Figure 5 shows a correlation heatmap that visually represents the relationships between the various features used by the AI model. This heatmap helps to identify both strong positive and negative correlations, providing key insights into which factors are most influential in predicting viral diseases like COVID-19 or pneumonia.
Figure 6 depicts the relationship between oxygen saturation levels (%) and the associated disease classifications (target classes) within the proposed system. This visualization highlights how varying oxygen levels are distributed across different target classes, providing insights into the potential connection between oxygen saturation and disease severity, such as in COVID-19 or pneumonia, as predicted by the AI model.
Figure 7 depicts the model’s accuracy across multiple training epochs, demonstrating the improvement in performance as the model gains more knowledge from the data. This chart is useful in evaluating the training process, highlighting the stage at which the model achieves its highest accuracy for predicting viral diseases like COVID-19 or pneumonia.
Figure 8 depicts the model’s loss across multiple epochs, illustrating how the error diminishes as the model continues to learn and refine its predictions. This graph is crucial for assessing the training process, highlighting the point where the model achieves stability and minimizes loss, thus improving its effectiveness in predicting viral diseases like COVID-19 or pneumonia.
Figure 9 depicts the time complexity per epoch, displaying how the time in seconds fluctuates throughout the training epochs in the proposed system. This chart is useful for assessing the model’s computational efficiency, revealing the time required for each epoch as the system processes data to predict viral diseases such as COVID-19 or pneumonia.
4.5 Comparison of existing vs proposed system
The "existing system" in this comparison refers to widely adopted baseline AI models used in earlier studies for detecting viral diseases like COVID-19 and pneumonia. These conventional systems are typically built on standard CNN architectures and lack key enhancements that modern AI frameworks now support. More specifically, these earlier systems:
1. Rely solely on basic CNN models trained on relatively small datasets without the help of synthetic data generation or augmentation techniques.
2. Do not incorporate RNNs, and therefore fail to capture how a patient’s condition evolves over time.
3. Lack explainability features such as LIME or SHAP, meaning they operate as black-box models that provide predictions without offering insights into how those predictions were made.
4. Are trained using centralized learning, which involves transferring sensitive patient data to a single location—raising serious privacy and compliance issues under regulations like GDPR and HIPAA.
In our study, we replicated these baseline models using settings and metrics derived from prior research (e.g., the CNN-based COVID-19 detectors in (14, 30) and the non-federated models described in (1, 4, 20, 25, 26)). These implementations served as a comparative benchmark to evaluate the effectiveness of our proposed hybrid framework. As shown in Table 3, our system delivers significant improvements over these traditional models, particularly in terms of diagnostic accuracy, transparency, privacy preservation, and computational efficiency. Table 3 presents a performance comparison between the existing system and the proposed AI-based framework for viral disease detection, emphasizing key metrics that showcase the proposed model's enhanced capabilities. In terms of accuracy, the existing system achieves between 31% and 40%, notably lower than the proposed system's 81% to 93%. This significant improvement can be attributed to advanced AI methodologies, including the combination of CNNs, RNNs, and GAN-based data augmentation. Additionally, the proposed system's loss decreases much faster, dropping from 1.32 (slow) in the existing system to 0.56 (fast), demonstrating more efficient learning and error minimization.
Moreover, the proposed system outperforms the existing system in key areas such as data augmentation, model interpretability, privacy preservation, and training time. While the existing system increases the dataset by only 10% through basic augmentation, the proposed system employs GANs to boost the dataset by 25%, improving generalization and reducing overfitting risks. The inclusion of Explainable AI (XAI) in the proposed model significantly enhances interpretability, making 90% of the model’s predictions transparent for healthcare professionals, compared to no interpretability in the existing system. Privacy preservation, absent in the existing model, is fully integrated into the proposed framework through Federated Learning, ensuring 100% privacy and compliance with GDPR. Finally, the proposed system demonstrates better computational efficiency, cutting training time per epoch to 2–5 seconds, compared to the 8–10 seconds in the existing system.
4.6 Hardware, FL synchronization, and inference time
1. Hardware configuration: Our experiments were run on a simulated federated setup with five institutional nodes, each equipped with NVIDIA Tesla V100 GPUs (32 GB VRAM) and Intel Xeon 2.3 GHz 16-core CPUs. A centralized aggregator node used an NVIDIA A100 GPU (40 GB) to handle model coordination and parameter updates.
2. Training time efficiency: The proposed hybrid framework (CNN–RNN combined with Vision Transformers and GAN-augmented inputs) achieved 2–5 seconds per epoch per node, significantly faster than the 8–10 seconds observed in baseline models. This gain is attributed to efficient parallel training and better data representation through augmentation.
3. Federated synchronization delay: Each global round introduced an average delay of only 1.2 seconds, thanks to efficient communication via model weight-sharing and adaptive federated averaging—ensuring low-latency synchronization between nodes.
4. Inference latency: During testing, each institutional node demonstrated an average inference time of 120–180 milliseconds per CT/X-ray image, making the system suitable for real-time or near real-time deployment in clinical workflows.
4.7 Performance evaluation
This research's performance evaluation centered on comparing the effectiveness of various AI models, including CNN-RNN, GAN-augmented models, the Vision Transformer, and XAI-enhanced models, for detecting and predicting viral diseases like COVID-19 and pneumonia. Metrics such as accuracy, F1 score, precision, recall, and AUC were employed to assess each model. The Vision Transformer, combined with GAN-generated synthetic data, demonstrated a substantial improvement in diagnostic accuracy, surpassing the standalone CNN-RNN model. The F1 score, which measures the balance between precision and recall, highlighted the model's superior capability in managing diverse datasets and capturing intricate patterns in high-dimensional medical data. The GAN-augmented Vision Transformer also addressed issues of data scarcity and imbalance, ensuring better generalization across varying patient demographics and disease stages. Furthermore, the evaluation underscored the significance of Explainable AI (XAI) and privacy preservation through Federated Learning (FL). XAI techniques such as LIME and SHAP were essential in enhancing the transparency of the model's decisions, helping healthcare professionals understand which features most influenced a diagnosis. Simultaneously, FL enabled collaborative model training across multiple healthcare institutions while safeguarding patient privacy: by sharing only model updates instead of raw data, FL ensured compliance with regulations like GDPR and maintained accuracy comparable to centralized models. Overall, the evaluation confirmed that the combined use of GANs, Vision Transformers, and Federated Learning substantially improves the proposed system's accuracy, interpretability, and privacy preservation relative to existing systems, making it more suitable for real-world healthcare applications.
4.8 Statistical validation of results
1. Standard deviation: The proposed framework achieved an average accuracy of 87.2%, with a standard deviation of ±2.94% across five runs using stratified 5-fold cross-validation, indicating that the model performs consistently across different subsets of the data.
2. Confidence interval (95% CI): Based on the same cross-validation strategy, the model’s accuracy falls within a 95% confidence interval of 84.6% to 89.8%, suggesting a high degree of reliability in the performance outcomes.
3. Paired t-test: A paired t-test comparing our integrated model to a baseline CNN showed a statistically significant performance gain (p = 0.0031), confirming that the improvement in accuracy is not due to random variation.
4. Bootstrap sampling: Using 1,000 bootstrap samples on the F1-score distribution, the model achieved a mean F1-score of 86.4%, with a 95% confidence interval ranging from 84.1% to 88.3%, highlighting its robustness and generalization capability across both synthetic and real-world medical data.
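A minimal sketch of the checks above is given below, assuming arrays of matched per-fold accuracies for the two models (`proposed_acc`, `baseline_acc`) and per-sample test labels and predictions (`y_true`, `y_pred`); all four names are placeholders.

```python
# Hedged sketch: paired t-test across matched CV folds and a 95% bootstrap
# confidence interval for the F1-score (1,000 resamples, as reported above).
import numpy as np
from scipy import stats
from sklearn.metrics import f1_score

# Paired t-test on per-fold accuracies of the two models
t_stat, p_value = stats.ttest_rel(proposed_acc, baseline_acc)

# Bootstrap CI for F1
rng = np.random.default_rng(0)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
    boot.append(f1_score(y_true[idx], y_pred[idx], average="macro"))
lo, hi = np.percentile(boot, [2.5, 97.5])             # 95% confidence interval
```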
5 Conclusion
5.1 Contributions
This research presents an innovative AI-based framework that integrates several advanced techniques, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Vision Transformers, Explainable AI (XAI), and Federated Learning (FL). These combined methods address key challenges in detecting and predicting viral diseases. By leveraging CNNs and RNNs for spatial and temporal analysis of medical images, and using GANs to augment small datasets, the framework significantly enhances model generalization. Vision Transformers improve diagnostic accuracy through global attention mechanisms, while XAI ensures that the decision-making process remains transparent and interpretable for healthcare professionals. A major contribution of this research is the implementation of Federated Learning, which enables decentralized model training across various healthcare institutions without sharing sensitive patient data. This approach ensures compliance with privacy regulations while maintaining high model accuracy and interpretability. As a result, the proposed framework is scalable and adaptable for real-world healthcare applications, offering a secure and ethical solution for AI-driven disease detection and progression analysis.
5.1.1 Key research contributions
1. Integrated AI framework for accurate viral disease detection: This work introduces a unified AI framework that brings together CNNs, RNNs, GANs, Vision Transformers, Explainable AI, and Federated Learning. By combining these advanced techniques, the model achieves high diagnostic accuracy, safeguards patient data, and ensures transparency—especially in detecting and tracking diseases like COVID-19 and pneumonia.
2. Decentralized model training with full data privacy: To protect sensitive medical information, the study leverages Federated Learning, allowing hospitals to train models locally without sharing raw data. This approach not only respects privacy regulations like GDPR and HIPAA but also maintains performance levels on par with traditional centralized models.
3. Synthetic data generation to overcome data gaps: The framework addresses the common issues of data scarcity and imbalance using Generative Adversarial Networks (GANs). By generating realistic synthetic images, it expands the training dataset by 25%, helping the model generalize better and reducing overfitting—especially in rare or underrepresented disease cases.
4. Boosting trust with transparent AI decisions: To make AI predictions understandable and trustworthy, the model incorporates explainability tools like LIME and SHAP, along with attention-based mechanisms. These tools help clinicians see which features influenced the model’s decision, closing the gap between AI insights and real-world medical decision-making.
5.2 Future work
Future research will focus on further enhancing the framework by integrating more advanced technologies. One of the primary areas for exploration is combining Federated Learning with enhanced encryption techniques to secure model updates shared between institutions. This would further strengthen privacy protection, especially in compliance with stringent data regulations. Another promising direction is the incorporation of self-supervised learning, which could improve the model’s performance by allowing it to learn from unlabeled data, a resource that is abundant in healthcare but often underutilized. Moreover, the application of this AI framework will be extended to larger datasets and other viral diseases beyond COVID-19 and pneumonia. This expansion will test the framework’s scalability and robustness in diverse clinical settings and across various medical imaging modalities. By continuously refining and expanding the framework, it has the potential to serve as a universal solution for AI-based diagnostics in healthcare, addressing a wide range of emerging medical challenges.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.
Author contributions
RV: Methodology, Writing – original draft, Data curation, Software, Investigation, Visualization, Validation, Funding acquisition, Conceptualization, Supervision, Writing – review & editing, Formal analysis, Resources. AS: Supervision, Project administration, Conceptualization, Investigation, Writing – review & editing, Methodology, Data curation, Funding acquisition, Writing – original draft, Formal analysis, Resources, Software, Visualization. AA: Resources, Writing – original draft, Supervision, Formal analysis, Funding acquisition, Project administration, Writing – review & editing, Data curation, Conceptualization, Visualization, Methodology, Software, Validation.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Acknowledgments
We sincerely thank the hospitals and clinics for providing anonymized datasets, which were indispensable to this research. We also acknowledge the vital support of the computational resources from the Indian Institute of Information Technology, Allahabad, which significantly contributed to the successful execution of this study. Although this research did not receive external funding, we deeply appreciate the feedback and involvement from the academic and research communities, whose insights were instrumental in refining our work. Lastly, we express our gratitude to the editorial team at Frontiers in Virology for reviewing and considering our research for publication.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Hong GS, Jang M, Kyung S, Cho K, Jeong J, Lee GY, et al. Overcoming the challenges in the development and implementation of artificial intelligence in radiology: a comprehensive review of solutions beyond supervised learning. Korean J Radiol. (2023) 24:1061–80. doi: 10.3348/kjr.2023.0393
2. Sarker IH, Janicke H, Mohsin A, Gill A, Maglaras L, et al. Explainable AI for cybersecurity automation, intelligence and trustworthiness in digital twin: methods, taxonomy, challenges and prospects. ICT Express. (2024) 10:935–58. doi: 10.1016/j.icte.2023.11.005
3. Chu WT, Reza SMS, Anibal JT, Landa A, Crozier I, Baǧci U, et al. Artificial intelligence and infectious disease imaging. J Infect Dis. (2023) 228:S322–36. doi: 10.1093/infdis/jiad158
4. Chen Y and Esmaeilzadeh P. Generative AI in medical practice: in-depth exploration of privacy and security challenges. J Med Internet Res. (2024) 26:e53008. doi: 10.2196/53008
5. Hernandez-Cruz N, Saha P, Sarker MMK, and Noble JA. Review of federated learning and machine learning-based methods for medical image analysis. Big Data Cogn Computing. (2024) 8:99. doi: 10.3390/bdcc8090099
6. Kundu D, Rahman MM, Rahman A, Das D, Siddiqi UR, Alam MGR, et al. Federated deep learning for monkeypox disease detection on GAN-augmented dataset. IEEE Access. (2024) 12:32819–29. doi: 10.1109/ACCESS.2024.3370838
7. Nazir S, Dickson DM, and Akram MU. Survey of explainable artificial intelligence techniques for biomedical imaging with deep neural networks. Comput Biol Med. (2023) 156:106668. doi: 10.1016/j.compbiomed.2023.106668
8. Ghosh S, Zhao X, Alim M, Brudno M, Bhat M, et al. Artificial intelligence applied to ‘omics data in liver disease: towards a personalised approach for diagnosis, prognosis and treatment. Gut. (2023) 73:128–38. doi: 10.1136/gutjnl-2023-331740
9. Reji J and Kumar RS. (2022). Virus prediction using machine learning techniques, in: 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS). pp. 1174–8. IEEE. doi: 10.1109/ICACCS54159.2022.9785020
10. Nayak T, Chadaga K, Sampathila N, Mayrose H, Gokulkrishnan N, Bairy MG, et al. Deep learning-based detection of monkeypox virus using skin lesion images. Med Nov. Technol Devices. (2023) 18:100243. doi: 10.1016/j.medntd.2023.100243
11. Munshi RM, Khayyat MM, Ben Slama S, and Khayyat MM. A deep learning-based approach for predicting COVID-19 diagnosis. Heliyon. (2024) 10:e28031. doi: 10.1016/j.heliyon.2024.e28031
12. Kumar S, Bhardwaj R, and Sharma S. Integrating attention mechanisms with CNN-RNN models for improved prediction of viral disease outcomes. Computers Mater Continua. (2024) 72:2875–92. doi: 10.32604/cmc.2024.018657
13. Zhang R, Zhou M, and Wang L. Comparative study of machine learning algorithms for viral disease prediction: a hybrid approach. Pattern Recognit Lett. (2023) 163:192–9. doi: 10.1016/j.patrec.2022.12.015
14. Archana K, Kaur A, Gulzar Y, Hamid Y, Mir MS, and Soomro AB. Deep learning models/techniques for COVID-19 detection: a survey. Front Appl Math Stat. (2023) 9:1303714. doi: 10.3389/fams.2023.1303714
15. Rehman A, Iqbal MA, Xing H, and Ahmed I. COVID-19 detection empowered with machine learning and deep learning techniques: a systematic review. Appl Sci. (2021) 11:3414. doi: 10.3390/app11083414
16. Kheddar H. Transformers and large language models for efficient intrusion detection systems: a comprehensive survey. arXiv preprint arXiv:2408.07583 (2024). Available online at: https://arxiv.org/abs/2408.07583.
17. Zhang X, Liu Y, and Zhang W. A hybrid deep learning model for prediction of viral infection outbreaks. J Biomed Inform. (2023) 136:104695. doi: 10.1016/j.jbi.2023.104695
18. Chaudhary JK, Sharma H, Tadiboina SN, Singh R, and Garg A. (2023). Applications of machine learning in viral disease diagnosis, in: 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom). pp. 1167–72. IEEE. doi: 10.1109/INDIACom57871.2023.10127046
19. Lee H, Chen Y, Zhao Y, and Xu Q. Advanced machine learning approaches for predicting the progression of viral diseases. Comput Biol Med. (2024) 160:105274. doi: 10.1016/j.compbiomed.2024.105274
20. Ajagbe SA and Adigun MO. Deep learning techniques for detection and prediction of pandemic diseases: a systematic literature review. Multimed. Tools Appl. (2024) 83:5893–927. doi: 10.1007/s11042-023-15805-z
21. Patel S, Kumar P, and Singh A. Ensemble learning techniques for forecasting pandemic outbreaks: a review. Int J Inf Manage. (2023) 66:102559. doi: 10.1016/j.ijinfomgt.2022.102559
22. Rossi A, Agrawal S, and Gupta S. Hybrid CNN-LSTM models for early detection of viral infections: an empirical study. Neurocomputing. (2023) 469:123–32. doi: 10.1016/j.neucom.2022.09.029
23. Martin J, Wang S, and Lee M. Real-time detection and prediction of viral outbreaks using hybrid machine learning models. J Comput Sci Technol. (2023) 38:123–39. doi: 10.1007/s11390-023-1234-5
24. Ahmed K, Shamsi U, and Raza M. Application of deep learning for viral disease forecasting using historical data. Data Sci J. (2023) 22:24. doi: 10.5334/dsj-2023-024
25. Salehi WA, Baglat P, and Gupta G. Review on machine and deep learning models for the detection and prediction of coronavirus. Mater Today Proc. (2020) 33:3896–901. doi: 10.1016/j.matpr.2020.06.245
26. Thakur K, Kaur M, and Kumar Y. A comprehensive analysis of deep learning-based approaches for prediction and prognosis of infectious diseases. Arch Comput Methods Eng. (2023) 30:4477–97. doi: 10.1007/s11831-023-09952-7
27. Liu Y, Yang L, and Zhao H. Predictive modeling of viral disease dynamics using integrated deep learning techniques. Bioinformatics. (2024) 40:215–26. doi: 10.1093/bioinformatics/btac097
28. Rossi A, Agrawal S, and Gupta S. Hybrid CNN–LSTM models for early detection of viral infections: an empirical study. Neurocomputing. (2023) 469:123–32. doi: 10.1016/j.neucom.2022.09.029
29. Kumar S, Bhardwaj R, and Sharma S. Integrating attention mechanisms with CNN–RNN models for improved prediction of viral disease outcomes. Computers, Materials & Continua (2024) 72:2875–92. doi: 10.32604/cmc.2024.018657
Keywords: artificial intelligence (AI), viral disease detection, generative AI, vision transformers, explainable AI (XAI), federated learning (FL), deep learning, COVID-19 detection
Citation: Srinivasulu A, Agrawal A and Vedaiyan R (2025) Overcoming diagnostic and data privacy challenges in viral disease detection: an integrated approach using generative AI, vision transformers, explainable AI, and federated learning. Front. Virol. 5:1625855. doi: 10.3389/fviro.2025.1625855
Received: 09 May 2025; Accepted: 23 June 2025;
Published: 29 July 2025.
Edited by:
Svetlana Khaiboullina, University of Nevada, Reno, United States
Reviewed by:
Ashwin Dhakal, The University of Missouri, United States
Joarder Kamruzzaman, Federation University Australia, Australia
Copyright © 2025 Srinivasulu, Agrawal and Vedaiyan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ramchand Vedaiyan, ramchand.vedaiyan@villacollege.edu.mv