
ORIGINAL RESEARCH article

Front. Big Data, 22 January 2026

Sec. Cybersecurity and Privacy

Volume 8 - 2025 | https://doi.org/10.3389/fdata.2025.1659026

EnDuSecFed: an ensemble approach for privacy preserving Federated Learning with dual-security framework for sustainable healthcare


Bela Shrimali1*, Jenil Gajjar2, Swapnoneel Roy3*, Sanjay Patel2, Kanu Patel2 and Ramesh Ram Naik2
  • 1Unitedworld Institute of Technology, Karnavati University, Gandhinagar, Gujarat, India
  • 2Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
  • 3School of Computing, University of North Florida, Jacksonville, FL, United States

Recent advances in Artificial Intelligence have highlighted the role of Machine Learning in healthcare decision-making, but centralized data collection raises significant privacy risks. Federated Learning addresses this by enabling collaborative training across multiple clients without sharing raw data. However, Federated Learning remains vulnerable to security threats that can compromise model reliability. This paper proposes a dual-security Federated Learning framework that integrates Fernet Symmetric Encryption (FSE) for the secure transmission of model updates and an Intrusion Detection System (IDS) to detect anomalous client behavior. Experiments on a publicly available healthcare dataset show that the proposed system enhances privacy and robustness compared to traditional FL. Among tested models, including Logistic Regression, Random Forest, and SVC, the ensemble method achieved the best performance with 99% accuracy.

1 Introduction

According to a 2025 Gartner report (Gartner, 2025), about 27% of organizations have experienced a privacy breach or security incident related to Artificial Intelligence (AI), often in the form of deliberate attacks on AI systems that collect and process data in a central location. Federated Learning (FL) has emerged as a robust method for training machine learning models across multiple clients while maintaining the privacy of their local data. Unlike traditional methods where data is collected in one location, FL allows each client to retain control of its data (Wang et al., 2023b). This is particularly useful in sensitive areas like healthcare, where patient information must be kept confidential (Khatun et al., 2023; Almalawi et al., 2023; Naresh and Thamarai, 2023). By using FL, healthcare providers can create better models by combining knowledge from different datasets without risking the privacy and security of individual patient data (Chaddad et al., 2023; Joshi et al., 2022; Kumar and Singla, 2021).

Despite its advantages, FL presents several challenges. Its decentralized architecture can introduce security vulnerabilities, particularly in securing the updates exchanged between clients and the central server. In a standard FL framework, the local model weights from each client are aggregated to form a global model. This aggregation process, however, is susceptible to security threats (Li et al., 2023; Coelho et al., 2023; Ali et al., 2024), such as data poisoning and adversarial attacks, which can compromise the performance of the global model. Such concerns are especially critical in healthcare applications, where prediction accuracy directly impacts patient safety.

To manage these risks, encrypted communication using the Fernet Symmetric Encryption (FSE) technique is implemented during the sharing of model updates between local clients and the global server. FSE keeps the transmitted model updates confidential and authenticated, protecting sensitive information from eavesdroppers while still allowing clients to collaborate. Because Fernet couples encryption with an authentication tag, any tampering with an update in transit is detected before it can damage the global model. While FSE secures model updates during sharing, it does not automatically detect malicious behavior or unusual activity in the Federated Learning system: a malicious client can still encrypt and send harmful updates that may compromise the global server. To address this, an Intrusion Detection System (IDS) is deployed at the global server to monitor and analyze incoming model updates for suspicious activity. By identifying abnormal patterns, the IDS can detect attacks such as model poisoning. This combined approach, using FSE for secure sharing and IDS for anomaly detection, enhances the overall security and trustworthiness of the FL process.

1.1 Motivation

Preserving the privacy of sensitive information is critical in healthcare, and FL has emerged as a promising paradigm as it enables collaborative model training without sharing raw data. Nevertheless, FL remains vulnerable to security threats, where malicious clients may submit harmful updates that compromise the global model's accuracy. This study aims to strengthen FL security in healthcare, where reliability is crucial for patient care. To address these challenges, Fernet Symmetric Encryption (FSE) is employed to safeguard model updates against tampering, while an IDS at the central server detects anomalous client behavior. The main contributions of this research are:

• We propose a federated learning method with dual security: communication between local clients and the central server is secured using FSE, and model updates at the central server are protected with an IDS. Security analysis shows that our method compares favorably with existing schemes.

• To improve decision-making and predictions, an ensemble approach is implemented alongside the existing models at the local nodes, combining predictions from three base models: Logistic Regression, Support Vector Classifier, and Random Forest.

• We also discuss various attacks on privacy in FL models and highlight how our dual security approach adds value to this research area.

1.2 Organization

The remainder of this paper is structured as follows. Section 2 presents a comprehensive review of the existing literature. Section 3 details the proposed system architecture, including the methodology, system components, and their interactions. Section 4 describes the experimental setup, including the dataset, models, and proposed algorithms. Section 5 provides an in-depth security analysis of the FSE scheme and IDS components, examining potential vulnerabilities and their mitigations. Section 6 presents experimental results, including performance metrics, comparative analysis, and validation of the approach. Lastly, Section 7 concludes with the key findings, discusses the implications of the work, and outlines promising directions for future research in this domain.

2 Literature review

FL has emerged as a promising privacy-preserving paradigm, particularly in sensitive domains such as healthcare. Unlike centralized machine learning, FL enables distributed model training without directly sharing raw data, thus safeguarding patient privacy. However, despite its advantages, FL remains vulnerable to adversarial threats, including data poisoning, label-flipping, and model poisoning attacks, where malicious clients can manipulate updates to reduce the performance of the global model (Hiwale et al., 2023). To address these vulnerabilities, researchers have explored various privacy-enhancing and security-aware strategies, which can be broadly categorized into privacy-preserving approaches, cryptographic frameworks, IDS integration, and blockchain-enabled solutions. These are reviewed below, beginning with privacy-preserving and cryptographic approaches.

Alazab et al. (2023) investigated FL for privacy-preserving Intrusion Detection Systems, comparing its performance against traditional deep learning models. By using the FedAvg algorithm, autoencoder-based anomaly detection, and secure gRPC channels, they reported high accuracy (98.07%), precision (97.4%), recall (99.06%), and F1-score (98.21%). Similarly, Wang et al. (2023a) introduced PPFLHE, a framework that leverages homomorphic encryption to address privacy and communication overhead in healthcare FL. Their system achieved 81.53% accuracy, showing that encryption can secure model updates but may also introduce computational overhead.

To mitigate adversarial threats, Almalki et al. (2024) proposed a hybrid Healthcare 5.0 framework that combines FL, IDS, and Blockchain Technology (BCT). Their solution improved diagnostic accuracy (93.89%) while enhancing data protection in Internet of Medical Things (IoMT) applications. Schneble (2018) explored FL-based distributed IDS for Medical Cyber-Physical Systems (MCPS), focusing on detecting cyberattacks while maintaining high accuracy and low false-positive rates. Guduri et al. (2023) further advanced security in FL by integrating blockchain with lightweight encryption and proxy re-encryption to secure Electronic Health Records (EHR). Their Ethereum-based testbed demonstrated superior resistance to unauthorized access compared with existing models.

While this literature demonstrates significant progress, several gaps remain. Privacy-preserving approaches like homomorphic encryption and FSE secure data during communication but do not inherently detect malicious updates, leaving models vulnerable to model poisoning. IDS-based solutions focus on anomaly detection but face challenges in scalability and false alarms in highly distributed healthcare environments. Blockchain-enhanced systems improve auditability and decentralization but often introduce high computational and communication overhead. Furthermore, many proposed frameworks are evaluated on limited datasets or focus primarily on accuracy, with less emphasis on robustness against adaptive adversaries or combined privacy–security trade-offs.

From this review, it is evident that while existing literature addresses either privacy (via encryption/FSE) or security (via IDS/blockchain), very few frameworks offer a comprehensive and lightweight defense mechanism that jointly ensures secure sharing of updates and real-time detection of adversarial behaviors in FL for healthcare applications. This gap motivates our research, where we propose an integrated approach combining FSE for privacy-preserving updates with a global IDS for anomaly detection, thereby enhancing the trustworthiness of FL in sensitive healthcare settings.

Table 1 provides a summary of the existing state-of-the-art in FL for healthcare applications, highlighting their contribution, limitations, technologies used, comparison parameters, and security concerns/attacks discussed.


Table 1. Review of existing research in privacy-preserving Federated Learning.

3 Proposed architecture

This section discusses FL and the proposed architecture, along with the security mechanisms, i.e., FSE and IDS, in separate subsections.

3.1 Overview of the architecture

Figure 1 illustrates the proposed Federated Learning (FL) framework for healthcare facilities, which incorporates an IDS and FSE to guarantee security throughout the communication and learning process. The steps of the proposed work, as shown in Figure 1, are as follows:

Local model training: Each medical facility (e.g., Healthcare Institute 1, 2, 3,... N) uses the infrastructure of the organization to process its local dataset and train a machine learning model. This guarantees the confidentiality of the patient's information. The training procedure closely complies with privacy-protecting guidelines.

Local model sharing with FSE: After training, the local model updates are encrypted with FSE and then sent to the central server. By preventing unwanted access or tampering, this encryption guarantees that the model updates remain secure while in transit. The FSE also reduces the risk of malicious local nodes deducing private information during communication.

Global model aggregation: The central server gathers the encrypted weights received from each participating local node and decrypts them to create a global model. Because the central server is presumed to be non-malicious, it aggregates the contributions of local nodes accurately, without introducing any malicious activity.

Global IDS monitoring: Even though the server is considered reliable, an IDS at the central server monitors the decrypted model updates for irregularities. The IDS detects and flags suspicious updates coming from potentially malicious local nodes, such as those with extreme model parameter deviations, so that they do not have a detrimental effect on the global model.

Global model distribution: Following aggregation, each local healthcare facility receives a copy of the global model. Every institution then uses the updated global model to increase the precision of its predictions.

Figure 1
Diagram illustrating a centralized server receiving encrypted model updates from multiple healthcare institutes. Each institute shares encrypted weights with a central server, with Fernet Symmetric Encryption (FSE) used to encrypt the updates. The central server decrypts, aggregates, and monitors updates for anomalies, ensuring data integrity. Icons represent local data, training data, and hospitals. Annotations clarify the steps in the data-sharing process.

Figure 1. Federated Learning architecture. (i) Local nodes train models on their own data. (ii) Fernet Symmetric Encryption encrypts the model updates before they are transmitted to the central server. (iii) Local nodes send their encrypted model updates to the central server. (iv) The central server decrypts and aggregates the updates to update the global model. (v) The central server uses its IDS to monitor for any anomalies in the decrypted updates, ensuring integrity against malicious contributions.

The proposed architecture ensures that malicious activity coming from local nodes is identified and stopped before it can compromise the integrity of the global model by combining FSE for secure communication with an IDS for anomaly detection.

3.2 Working of federated learning framework

Federated learning is a decentralized machine learning technique that allows several devices or organizations (Oh and Nadkarni, 2023) to work together to train a model without exchanging raw data. An FL system has two ends, multiple local nodes and a global node: the client servers keep their local data, while the central server maintains the global model (Islam et al., 2023). In this paradigm, each client trains the model locally on its own data; only the model weights are sent to the central server for aggregation. The central server gathers these weights to enhance the global model, which thus incorporates insights from all participating clients. This decentralized approach guarantees privacy preservation by keeping sensitive data on client devices and reducing data-transfer risks, which is particularly useful in healthcare applications where patient data must remain within each institution's boundaries.

The aggregated global model weights are calculated using Equation 1, which represents the federated averaging mechanism:

W_{\text{global}} = \frac{1}{N} \sum_{i=1}^{N} W_{\text{local},i}    (1)

where $W_{\text{global}}$ is the global model weight, $W_{\text{local},i}$ represents the local weights of client $i$, and $N$ is the total number of clients. This equation ensures that each client's contribution is equally weighted in the global model, providing a democratic aggregation approach where no single client dominates the learning process.
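For concreteness, the averaging step of Equation 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' exact code; the function name, the use of NumPy, and the toy weight vectors are assumptions.

```python
import numpy as np

def federated_average(local_weights):
    """Average a list of client weight vectors (Equation 1).

    Every client's update is weighted equally, so no single
    client dominates the aggregated global model.
    """
    return np.mean(np.stack(local_weights), axis=0)

# Example: three clients, each contributing a small weight vector.
clients = [np.array([0.2, 0.5]), np.array([0.4, 0.3]), np.array([0.3, 0.4])]
w_global = federated_average(clients)  # -> array([0.3, 0.4])
```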

3.3 Secure transmission of model updates using symmetric encryption

Fernet Symmetric Encryption (FSE) is a cryptographic technique that enables parties to exchange authenticated, encrypted messages (Sadu, 2024). In the context of Federated Learning, FSE is used to protect local model weights during transmission from clients to the central server. Instead of sending raw weight updates, which may leak sensitive information about patient data, each client encrypts its model parameters before sharing them.

Mathematically, the process can be described as follows. For a given client i, the local model weights Wlocal, i are encrypted before transmission:

W_{\text{encrypted}} = \text{Encrypt}(W_{\text{local},i}, K_{\text{FSE}})    (2)

where $K_{\text{FSE}}$ is the secret encryption key (or a set of keys, in the case of threshold cryptography). This transformation ensures that even if an adversary intercepts the communication channel, the transmitted weights are unintelligible.

At the server side, decryption is performed to recover the original updates:

W_{\text{decrypted}} = \text{Decrypt}(W_{\text{encrypted}}, K_{\text{FSE}})    (3)

This allows the central server to aggregate weights securely while ensuring that no raw data is ever exposed.
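A minimal sketch of Equations 2, 3 using Fernet from the Python cryptography library is shown below. The pickle serialization and the variable names are illustrative assumptions; the authors' exact wire format is not specified.

```python
import pickle
from cryptography.fernet import Fernet

key_fse = Fernet.generate_key()   # K_FSE, generated by the global server
fse = Fernet(key_fse)

# Client side: serialize and encrypt the local weights (Equation 2).
w_local = [0.42, -0.17, 0.93]
w_encrypted = fse.encrypt(pickle.dumps(w_local))

# Server side: decrypt and deserialize (Equation 3). Fernet's built-in
# HMAC authentication raises InvalidToken if the ciphertext was
# tampered with in transit.
w_decrypted = pickle.loads(fse.decrypt(w_encrypted))
assert w_decrypted == w_local
```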

3.3.1 Key properties and guarantees

The use of FSE in our framework provides several important guarantees:

Confidentiality: Local model updates remain private during transmission, preventing leakage of patient-level data.

Collusion resistance: Colluding clients cannot recover another client's raw training data from what they observe in transit, since only encrypted weight updates are visible on the channel. Note that because a single shared key is distributed to all clients, keeping one client's updates confidential from another key-holding client additionally relies on secure point-to-point channels or per-client keys.

Integrity of transmission: By coupling encryption with authentication tags (e.g., Fernet symmetric encryption), tampering with updates can be detected.

3.3.2 Implementation considerations

In our implementation, the FSE scheme was employed, which provides both confidentiality and authentication. Symmetric encryption is chosen for its computational efficiency compared to homomorphic encryption, which, although more powerful, can introduce significant communication and processing overhead. The global server generates and securely distributes the shared encryption key $K_{\text{FSE}}$ to each participating client during initialization, ensuring that all parties can participate in secure encryption and decryption.

While FSE secures communication channels, it does not by itself detect malicious updates (e.g., model poisoning). This limitation justifies the complementary inclusion of the IDS at the global server, which inspects decrypted weights for anomalous behavior. Together, FSE and IDS provide both confidentiality and integrity for secure federated learning in healthcare.

3.4 Intrusion detection system

An IDS is a security tool that monitors and analyzes system activity to detect suspicious behavior, unauthorized access, or cyberattacks (Mosaiyebzadeh et al., 2023). Acting as an alarm system, it alerts system administrators to anomalies or malicious activity within the system. In the context of FL, the IDS safeguards the training process by detecting malicious or unusual behavior. The global server employs an IDS to monitor incoming client model updates. Using anomaly detection techniques, it identifies inconsistencies—such as significant deviations in model parameters—that may indicate malicious activity. To prevent compromised models from being incorporated into the global model, the server rejects any updates flagged as anomalous.

The anomaly detection technique used here checks for unusual changes in the model's weights as defined in Equation 4:

\text{Anomaly detected if } \lVert w_i - w_t \rVert > \delta    (4)

In Equation 4, $w_i$ represents the weight update from client $i$, $w_t$ is the current global model weight vector, and $\delta$ is the predefined threshold. When the Euclidean norm of the difference exceeds this threshold, the system flags the update as potentially malicious.
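A minimal sketch of the check in Equation 4 follows, assuming flattened weight vectors; the function name and toy values are illustrative.

```python
import numpy as np

def is_anomalous(w_client, w_global, delta):
    """Flag an update whose Euclidean distance from the current
    global weights exceeds the threshold (Equation 4)."""
    return np.linalg.norm(w_client - w_global) > delta

w_global = np.array([0.30, 0.40, 0.10])
benign   = np.array([0.32, 0.38, 0.11])   # small, natural deviation
poisoned = np.array([5.00, -4.00, 9.00])  # extreme deviation

print(is_anomalous(benign, w_global, delta=0.5))    # False
print(is_anomalous(poisoned, w_global, delta=0.5))  # True
```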

3.4.1 Threshold selection

The threshold value δ plays a critical role in balancing sensitivity and false alarms. In our experiments, δ was set empirically based on the distribution of update magnitudes across clients, with values chosen around the 95th percentile of observed deviations during benign training. This ensures that natural update variations are tolerated, while extreme deviations are flagged as anomalous. In practical deployments, δ can be dynamically adapted using validation rounds or statistical confidence intervals, making the IDS adaptable to different datasets and model architectures.
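Under the stated 95th-percentile rule, δ could be derived from benign-round statistics roughly as follows; the helper name and inputs are assumptions for illustration.

```python
import numpy as np

def select_threshold(benign_updates, w_global, percentile=95):
    """Set delta near the 95th percentile of deviation norms
    observed across clients during benign training rounds."""
    deviations = [np.linalg.norm(w - w_global) for w in benign_updates]
    return float(np.percentile(deviations, percentile))
```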

3.4.2 Need for IDS alongside FSE

Although FSE encrypts the data during transmission to guarantee the privacy and confidentiality of the model weights, it cannot by itself guarantee the integrity of the learning process: adversaries can still submit malicious updates that conform to the encryption scheme but are intended to undermine the global model. The IDS is responsible for identifying such malicious activity by examining the decrypted model updates for suspicious patterns. By examining system behavior and contrasting it with a baseline of typical activity, an anomaly-based IDS can detect possible threats (Schneble, 2018). By concentrating on departures from the norm, it can identify zero-day or previously unidentified attacks. This method, in contrast to a signature-based IDS, is not restricted to known threats and can adjust to changing security issues. However, if normal activity patterns are not precisely defined, it might produce false positives.

4 Implementation

4.1 Dataset overview

The Lung Cancer Risk Detection dataset (Biswas and Nath, 2024) is used for the proposed work. It provides a comprehensive collection of data for examining various risk factors associated with lung cancer and consists of 3,000 rows and 16 columns, capturing multiple patient attributes. Key features include GENDER, AGE, SMOKING, ANXIETY, SHORTNESS_OF_BREATH, YELLOW_FINGERS, ALLERGY, ALCOHOL_CONSUMING, COUGHING, and CHEST_PAIN. A summary of the dataset is presented in Table 2.


Table 2. Summary of the lung cancer risk detection dataset.

4.2 Local model description

In the FL environment, each participating client, such as a hospital or diagnostic facility, trains a local model on its private dataset without disclosing sensitive patient information. The proposed work employs Machine Learning (ML) models as local models to predict lung cancer risk, ensuring both data privacy and predictive accuracy. ML is preferred over Deep Learning (DL) since the dataset is relatively small (3,000 records with 16 features), where DL models are prone to overfitting, require higher computational resources, and offer limited performance improvements. In contrast, ML is better suited for structured tabular data, computationally efficient, and provides interpretable results, which is essential in healthcare. Each client independently trains its model on local data, and the learned parameters are aggregated at the central server to build a robust global model. Specifically, Random Forest (RF), Support Vector Classifier (SVC), and Logistic Regression (LR) are used in the local training phase, with an ensemble approach to combine their predictions, thereby improving accuracy, generalizability, and robustness against non-Independent and Identically Distributed (non-IID) data distributions.

All ML models were trained with model-specific hyperparameters. For RF, the number of estimators was set to 100 with Gini impurity as the split criterion. For SVC, an RBF kernel was used with C = 1.0 and γ = scale. For LR, the solver was set to “liblinear” with L2 regularization and a maximum of 1,000 iterations. These hyperparameters were selected through preliminary tuning to balance training efficiency and predictive accuracy across clients.
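In scikit-learn, the hyperparameters reported above correspond to instantiations along these lines (a sketch; probability=True on the SVC is an added assumption so that it can emit the probability scores needed for metrics such as log loss and AUC):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hyperparameters as reported above.
rf = RandomForestClassifier(n_estimators=100, criterion="gini")
svc = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
lr = LogisticRegression(solver="liblinear", penalty="l2", max_iter=1000)
```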

The following subsections describe these classifiers in detail and their role in the federated setup.

4.2.1 Random forest

Random Forest is an ensemble learning technique that builds several decision trees during training and outputs the class chosen by a majority vote of the individual trees. The Random Forest model uses the 16 features in the dataset to produce a strong predictive model for lung cancer risk detection, as shown in Figure 2.

Figure 2
Flowchart illustrating a Random Forest model process. Training data is divided into bootstrap samples, creating multiple decision trees. Outputs from these trees undergo majority voting to produce the final prediction.

Figure 2. Random forest architecture.

4.2.2 Support vector classifier

The goal of the Support Vector Classifier (SVC) is to identify the best hyperplane in the feature space for separating the classes. In the context of lung cancer risk detection, the SVC maximizes the margin between the classes by mapping the 16 input features into a high-dimensional space, as shown in Figure 3.

Figure 3
Flowchart illustrating the SVM process: Training data undergoes kernel transformation, followed by optimization. This leads to the identification of support vectors, creation of a decision boundary, and results in prediction.

Figure 3. Support vector classifier architecture.

4.2.3 Logistic regression

Logistic regression is a statistical model that describes a binary dependent variable using a logistic function. In the context of lung cancer risk detection, Logistic Regression estimates the likelihood of lung cancer from the input features. The model is trained with maximum likelihood estimation, and the result is a probability score that can be thresholded to classify patients as either low-risk or high-risk. The architecture of logistic regression is depicted in Figure 4.

Figure 4
Flowchart depicting a machine learning process. Training data undergoes feature processing, followed by weight initialization, application of the sigmoid function, optimization, and concludes with a probability output. Arrows indicate the sequence.

Figure 4. Logistic regression architecture.

4.2.4 Ensemble approach

To enhance overall predictive performance, the ensemble approach integrates predictions from the three base models: SVC, RF, and LR. The architecture of the ensemble approach is shown in Figure 5.

Figure 5
Flowchart illustrating a machine learning workflow. Training data undergoes processing, feeding into three models: Random Forest, SVC, and Logistic Regression. Outputs are evaluated by performance metrics, guiding best model selection, leading to final prediction.

Figure 5. Ensemble model.
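The exact combination rule is not fully specified (Figure 5 suggests metric-guided selection among the three models); one plausible realization, assuming a simple majority-vote ensemble over the three base classifiers, is sketched below with scikit-learn's VotingClassifier.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical majority-vote combination of the three base models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("svc", SVC(kernel="rbf", C=1.0, gamma="scale")),
        ("lr", LogisticRegression(solver="liblinear", max_iter=1000)),
    ],
    voting="hard",  # majority vote over predicted class labels
)
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_test)
```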

4.3 Algorithms

The local training process follows the mathematical framework established in Equations 1, 2, where encrypted local weights are securely transmitted for global aggregation according to the federated averaging principle.

4.3.1 Node i: Local training and secure weight sharing

Algorithm 1 describes the role of the local node in federated learning. The objective is to use Fernet Symmetric Encryption (FSE) to ensure secure weight sharing while training the local model with the node's private dataset. The following are the steps:

Initialization: The node sets up the FSE scheme (FSEi) for encrypting model updates, its local dataset (Di), and its local model (Mi).

Local training: The node uses its private dataset (Di) to train its model (Mi) per training round.

Computation of updates: The node calculates its local model updates (Wi) following training.

Secure encryption: The FSE scheme is used to encrypt the local updates.

Weight sharing: The global server receives the encrypted model updates (encrypted_Wi) for aggregation.

Termination: The process repeats until the local model converges or the maximum number of training rounds is reached.


Algorithm 1. Node i: Local training and secure weight sharing.
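A condensed sketch of Algorithm 1's per-round flow is given below, reusing the assumptions of the earlier snippets (pickle serialization, Fernet encryption); the weight-extraction step and the send_to_server transport stub are hypothetical.

```python
import pickle
from cryptography.fernet import Fernet

def local_round(model, X_local, y_local, fse: Fernet, send_to_server):
    """One round at node i (Algorithm 1): train on the private
    dataset D_i, encrypt the update W_i with FSE, and share it."""
    model.fit(X_local, y_local)                      # local training on D_i
    w_i = model.coef_                                # local update (e.g., LR coefficients)
    encrypted_w_i = fse.encrypt(pickle.dumps(w_i))   # secure encryption
    send_to_server(encrypted_w_i)                    # weight sharing with the server
```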

4.3.2 Global server: secure aggregation and anomaly detection

Algorithm 2 describes the actions taken by the global server. The server's functions include coordinating the iterative enhancement of the global model, detecting anomalies at the global level, and securely aggregating encrypted model weights from several nodes. The following are the steps.

Initialization: The server initializes a secure FSE scheme (FSEc) for secure aggregation, a global model (Mc), and a global IDS (IDSc) for anomaly detection.

Parameter setup: Important parameters are specified, including the convergence threshold, maximum rounds, and performance metrics.

Receiving encrypted updates: All participating local nodes send encrypted model updates (encrypted_Wi) to the server.

Anomaly detection: The global IDS (IDSc) monitors the received model updates for possible irregularities. Updates flagged as malicious are removed.

Decryption and aggregation: If no anomalies are found, the server uses the FSE scheme (FSEc) to decrypt the updates and aggregates them into Mc.

Convergence evaluation: The server compares the global model's performance metrics to a predetermined threshold to assess the convergence of the model.

Final model distribution: All participating local nodes receive access to the final global model after convergence or the maximum number of rounds is reached.


Algorithm 2. Global server: secure aggregation and anomaly detection.
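The server side (Algorithm 2) can be sketched in the same style; the decrypt-then-check ordering, the Equation 4 filter, and the function name are assumptions consistent with the description above.

```python
import pickle
import numpy as np
from cryptography.fernet import Fernet

def server_round(encrypted_updates, w_global, fse: Fernet, delta):
    """One aggregation round (Algorithm 2): decrypt the updates,
    drop anomalous ones (Equation 4), and average the rest (Equation 1)."""
    accepted = []
    for token in encrypted_updates:
        w_i = np.asarray(pickle.loads(fse.decrypt(token)))
        if np.linalg.norm(w_i - w_global) > delta:  # IDS check
            continue                                # reject malicious update
        accepted.append(w_i)
    if accepted:                                    # federated averaging
        w_global = np.mean(np.stack(accepted), axis=0)
    return w_global
```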

5 Security analysis

The security of our framework is mathematically grounded in the encryption-decryption pair defined by Equations 2, 3, combined with the anomaly detection mechanism specified in Equation 4. This mathematical foundation provides formal security guarantees for the federated learning process. Fernet Symmetric Encryption (FSE) and the Intrusion Detection System (IDS), the two main security elements included in the Federated Learning framework, are thoroughly examined in this section. Together, these elements form a strong security framework that safeguards the model aggregation procedure as well as the communication channels. The proposed technique is evaluated against existing schemes with respect to security properties and various attacks, as shown in Tables 3, 4.


Table 3. Comparison of attack detection capabilities.


Table 4. Comparison of security properties.

5.1 FSE implementation analysis

The implementation utilizes Fernet symmetric encryption from the Python cryptography library to secure weight transmission between local nodes and the global server. The implementation centers around a secure key generation process at the global server level, which establishes the foundation for all subsequent encryption operations. During the training process, local weights are carefully serialized and encrypted before transmission, ensuring that sensitive model parameters remain protected during transit. The global server then performs secure decryption before weight aggregation, maintaining the confidentiality of the entire process.

The FSE implementation provides significant security benefits in terms of both confidentiality protection and communication security. From a confidentiality perspective, the encryption of weights during transit effectively prevents unauthorized access to model parameters. The Fernet implementation provides strong cryptographic guarantees, ensuring that even if the communication channel is compromised, the encrypted weights remain secure. This protection extends to preventing weight inference attacks, where adversaries might attempt to reconstruct training data from model parameters.

Communication security is enhanced through multiple mechanisms. The implementation effectively mitigates man-in-the-middle attacks by ensuring that all transmitted data is encrypted with keys known only to authorized participants. The secure weights-sharing scheme enables distributed nodes to collaborate safely, while the encryption scheme preserves data privacy throughout the learning process. This comprehensive approach to communication security ensures that the federated learning system can operate effectively even in potentially hostile network environments.

5.1.1 Verification results

The verification process focused on critical security properties such as confidentiality, authentication, and liveness between the participating entities, namely the Healthcare Institutions (clients) and the Central Server (aggregator).

Confidentiality (Secret): The Scyther tool (Cremers, 2008) confirmed that the uniqueTransactionId shared between clients and the server remains confidential, ensuring no leakage of sensitive information.

Authentication (Nisynch and Alive):

- Nisynch (Non-injective Synchronization): Verified that if two parties believe they have completed a session, then the session indeed took place.

- Alive: Verified that both communicating parties were active during the communication.

Authentication was successfully verified for both Healthcare Institutions and the Central Server, ensuring mutual agreement and trust in the communication sessions.

Figure 6 presents the verification results showing that all claims have been successfully verified without any detected attacks.

Figure 6
Results screen from Scyther showing various claims and their statuses. Claims include SecureFedL with HealthcareInst and CentralServer connections. All claims are marked “OK” and “Verified,” indicating no attacks detected.

Figure 6. Scyther verification results for the SecureFedL.

5.1.2 Characterization results

The characterization analysis performed by Scyther further confirmed the correctness of the scheme's execution flow. It identified exactly three valid trace patterns for interactions between:

• SecureFL and Healthcare Institutions 2

• SecureFL and Central Server 2

This indicates that the SecureFedL scheme adheres to its intended behavior under different communication scenarios, enhancing its reliability.

The characterization results are shown in Figure 7.

Figure 7
Scyther results interface displaying a table with columns for Claim, Status, Comments, and Patterns. Claims include “SecureFedL, HealthcareInst” and “CentralServer.” Both are marked as “Reachable” with “OK” status, verified with comments noting “Exactly 3 trace patterns.” Patterns are indicated with clickable “3 trace patterns” buttons. The process is marked as done.

Figure 7. Scyther characterization results for the SecureFedL.

5.1.3 Summary

The results from Scyther tool analysis demonstrate that the SecureFedL successfully upholds the required security properties:

• Confidentiality of sensitive data

• Authentication and liveness of participants

• Correct execution flow through trace characterization

Thus, our scheme is formally verified to be secure against standard threat models and provides a reliable foundation for secure federated learning applications in sensitive domains such as healthcare.

5.2 IDS implementation analysis

The IDS implementation uses a detect_anomalies() method to screen incoming weight updates. By monitoring and evaluating these updates for possible security threats, the IDS acts as an essential second line of defense. Threshold-based detection mechanisms identify suspicious patterns in the weight updates, allowing potential attacks to be flagged quickly.

The IDS uses several important mechanisms to offer strong model protection. Throughout the training process, it preserves the integrity of the global model by avoiding the incorporation of poisoned updates. The system's continuous monitoring features greatly lower the chance of successful model poisoning attacks, and its automatic rejection of questionable updates contributes to the stability of the global model. The FL system is protected from numerous types of attacks thanks to this proactive security approach.

5.3 Dual security architecture analysis

The federated averaging process, as mathematically defined in Equation 1, was applied across all three client partitions to generate the global model performance metrics.

The combination of FSE and IDS produces a particularly strong security framework that offers thorough protection across several layers of the federated learning system. By implementing security at both the communication and aggregation layers, this dual approach builds a complementary system of security measures that greatly improves the overall protection of the learning process.

The reduction of the attack surface is one of the main advantages of this architecture. The system significantly raises the barrier to entry for potential attackers by putting in place a variety of security checkpoints and defense mechanisms. This multi-layered security approach guarantees that other safeguards will continue to be in place to preserve system security even in the event that one security measure is compromised.

Table 5 highlights that while FSE or IDS alone address only subsets of attack vectors, their combination ensures confidentiality, integrity, and resilience against multiple threats simultaneously. This demonstrates that the dual-security framework provides superior guarantees beyond a straightforward additive benefit.


Table 5. Comparative analysis of security mechanisms.

6 Implementation results and discussion

This section presents the evaluation metrics and results obtained from the experiment conducted on the Lung Cancer Risk Detection dataset. The models were trained in the federated learning framework, with secure weight sharing and aggregation as described in the previous sections, and evaluated using the performance metrics. As the dataset contains 3,000 rows, it was divided into three partitions of 1,000 rows each, one per client. Table 6 shows the results for the three clients.
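The partitioning described above could be reproduced roughly as follows; the file name and the shuffling seed are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical file name; 3,000 rows, 16 columns.
df = pd.read_csv("lung_cancer_risk.csv")

# Shuffle once, then split into three 1,000-row client partitions.
shuffled = df.sample(frac=1, random_state=42)
client_1, client_2, client_3 = np.array_split(shuffled, 3)
```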


Table 6. Model performance comparison across clients.

Figures 8, 9 show model comparison and evaluation metrics, respectively.

Figure 8
Bar chart comparing model values for Random Forest, Support Vector Classifier, and Logistic Regression. Metrics include Accuracy, Precision, Recall, F1 Score, AUC, and Log Loss. Each model has similar high scores across most metrics, with Log Loss slightly higher for Random Forest and Support Vector Classifier.

Figure 8. Comparison of model values.

Figure 9
Six line graphs compare Random Forest, Support Vector Classifier, and Logistic Regression across three clients. Metrics include Accuracy, Precision, Recall, F1 Score, AUC, and Log Loss, each showing varying trends for different models and clients.

Figure 9. Evaluation metrics comparison for Lung Cancer dataset.

The results demonstrate that our proposed dual-security federated learning framework consistently achieves high predictive performance while ensuring privacy and robustness. The ensemble approach achieved an accuracy of 99%, which is higher than most reported results in related works, such as Alazab et al. (2023) (98.07%) and Almalki et al. (2024) (93.89%). This indicates that our approach is competitive with, and in some cases outperforms, state-of-the-art FL models in healthcare.

Compared to existing literature that employed either FSE or IDS in isolation, our dual approach shows stronger resilience against poisoning and adversarial attacks. The ablation study (Table 7) confirms that IDS alone improves anomaly detection, and FSE alone ensures confidentiality, but the combined framework provides the most robust security without sacrificing model accuracy.


Table 7. Ablation study: impact of FSE and IDS on model performance.

6.1 Validation with confidence intervals

To validate the robustness of the results, we calculated 95% confidence intervals (CI) for the key metrics across clients. The ensemble model maintained narrow confidence intervals around its mean accuracy and F1-score, confirming that its performance was consistently better than individual models (Random Forest, Logistic Regression, and SVC). This suggests that improvements are not dataset-split specific, but rather generalizable across clients.
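The exact CI procedure is not specified; one standard way to compute such intervals across per-client scores, assuming a t-distribution, is sketched below.

```python
import numpy as np
from scipy import stats

def confidence_interval(scores, confidence=0.95):
    """Two-sided CI for the mean of a metric across clients."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    sem = stats.sem(scores)  # standard error of the mean
    half_width = sem * stats.t.ppf((1 + confidence) / 2, len(scores) - 1)
    return mean - half_width, mean + half_width

# Hypothetical per-client ensemble accuracies.
print(confidence_interval([0.990, 0.988, 0.992]))
```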

6.2 Healthcare-specific adaptation

While FSE and IDS have been applied in other domains, our adaptation explicitly targets healthcare risks. Patient data is highly sensitive and often stored in fragmented silos across institutions. Our dual framework ensures that data confidentiality (through FSE) and integrity of model updates (through IDS) are simultaneously preserved, addressing specific threats such as data poisoning of Electronic Health Records (EHR) and adversarial manipulation of diagnostic predictions.

6.3 Justification of model choice

Although deep neural networks could potentially yield higher accuracy, they are computationally expensive and less interpretable. For healthcare, interpretability and efficiency are critical. Logistic Regression and Random Forest provide explainability for clinical decision-making, while SVC captures nonlinear relations. The ensemble leverages their complementary strengths, making it suitable for real-world healthcare deployments.

6.4 Limitations of proposed work

Despite promising results, this implementation has several limitations. First, the experiments were conducted on a single healthcare dataset of 3,000 records, distributed across three clients. Such a small-scale setup does not adequately represent the complexity and heterogeneity of real-world healthcare data, thereby limiting the generalizability of the findings. Moreover, the simple division of 3,000 samples into three equal parts does not reflect realistic federated learning scenarios, where data is typically non-IID (non-independently and identically distributed) across clients. The current federated configuration, restricted to three clients with approximately 1,000 samples each, is acknowledged as a simplification and does not fully reflect practical deployment conditions. Future work will extend the evaluation to more realistic environments with increased client participation, heterogeneous data distributions, and real-world constraints.

Second, the intrusion detection mechanism relies on a fixed thresholding approach, which may lead to false positives. The evaluation also did not report detailed performance metrics such as true positives, false positives, detection latency, or precision-recall trade-offs, all of which are critical for assessing practical feasibility in healthcare environments. In addition, the anomaly detection strategy is based on a simple Euclidean norm threshold ($\lVert w_i - w_t \rVert > \delta$, Equation 4), which, while effective against extreme deviations, may generate false positives in federated settings where model updates naturally vary due to non-IID data distributions. Moreover, sophisticated adversarial threats such as gradient inversion, membership inference, and Byzantine behaviors are not explicitly addressed in the current implementation. These remain important open challenges, and extending the framework to incorporate adaptive thresholds, advanced defense mechanisms, and evaluations on larger, more diverse datasets is an essential direction for future work.

6.5 Computational overhead and scalability

A critical concern in federated healthcare applications is whether the proposed dual-security framework can scale across multiple institutions without excessive computational or communication costs.

In our implementation, the FSE employed lightweight symmetric encryption (Fernet). The encryption and decryption of weight vectors added less than 5% overhead to each training round, demonstrating practical feasibility even on modest client devices. IDS monitoring, which consists of anomaly checks based on Euclidean norms, introduced an additional overhead of less than 3%. Together, these operations contribute marginal latency while providing substantial security guarantees.

Regarding scalability, experiments with increasing numbers of simulated clients confirmed that overhead grows linearly with the number of participants. However, communication costs remain manageable, since only encrypted model weights—not raw data—are transmitted. The framework, therefore, supports deployment across large healthcare systems and can be further optimized using secure aggregation or a communication-efficient scheme in future work.

7 Conclusion and future work

A secure and privacy-preserving FL framework designed for healthcare applications is presented in this work, addressing the growing concerns of system robustness and data confidentiality. The suggested method fortifies the security of federated learning against both passive and active threats by incorporating Fernet Symmetric Encryption (FSE) for the encrypted exchange of model updates and setting up an Intrusion Detection System (IDS) at the central server.

The Lung Cancer Risk Detection dataset, which comprises a variety of characteristics such as age, smoking habits, and anxiety levels, was used for the experimental assessment. The findings show that the suggested framework protects data privacy while maintaining excellent model performance. Among the evaluated models (Logistic Regression, Random Forest, and Support Vector Classifier), the ensemble approach consistently performed best, achieving a peak accuracy of 99% across clients.

Additionally, formal security verification using the Scyther tool confirmed the framework's resilience by verifying crucial security attributes such as confidentiality, authentication, and appropriate synchronization. The characterization results also demonstrated the correctness of the FSE-scheme executions, confirming the system's dependability in practical applications. Future work will focus on applying deep learning models across multiple datasets, analyzing their impact on the results, and enhancing the IDS through adaptive anomaly detection techniques.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.kaggle.com/dsv/8795028.

Author contributions

BS: Conceptualization, Methodology, Software, Investigation, Formal analysis, Writing – original draft, Writing – review & editing. JG: Software, Data curation, Investigation, Visualization, Writing – review & editing. SR: Conceptualization, Supervision, Validation, Writing – review & editing. SP: Methodology, Formal analysis, Resources, Writing – review & editing. KP: Project administration, Investigation, Data curation, Writing – review & editing. RN: Supervision, Validation, Resources, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdata.2025.1659026/full#supplementary-material

Abbreviations

Di, Private dataset of node i; Mi, Local model of node i; FSEi, FSE instance of node i; Wi, Local model weights of node i; enc_Wi, Encrypted weights of node i; Mc, Global model of central server; IDSc, IDS instance of central server; FSEc, FSE instance of central server; con_t, Convergence threshold; max_rounds, Maximum training rounds; current_round, Current round number; enc_weights, Collection of encrypted weights; anomaliesc, Detected anomalies at server; dec_weights, Decrypted weights at server.

References

Abaoud, M., Almuqrin, M. A., and Khan, M. F. (2023). Advancing federated learning through novel mechanism for privacy preservation in healthcare applications. IEEE Access 11, 83562–83579. doi: 10.1109/ACCESS.2023.3301162


Alazab, A., Khraisat, A., Singh, S., and Jan, T. (2023). Enhancing privacy-preserving intrusion detection through federated learning. Electronics 12:3382. doi: 10.3390/electronics12163382


Ali, M. S., Ahsan, M. M., Tasnim, L., Afrin, S., Biswas, K., Hossain, M. M., et al. (2024). Federated learning in healthcare: model misconducts, security, challenges, applications, and future research directions-a systematic review. arXiv preprint arXiv:2405.13832. doi: 10.48550/arXiv.2405.13832


Almalawi, A., Khan, A. I., Alsolami, F., Abushark, Y. B., and Alfakeeh, A. S. (2023). Managing security of healthcare data for a modern healthcare system. Sensors 23:3612. doi: 10.3390/s23073612


Almalki, J., Alshahrani, S. M., and Khan, N. A. (2024). A comprehensive secure system enabling healthcare 5.0 using federated learning, intrusion detection and blockchain. Peer J. Comput. Sci. 10:e1778. doi: 10.7717/peerj-cs.1778


Biswas, M. A., and Nath, M. A. (2024). Lung Cancer Dataset. Kaggle. Available online at: https://www.kaggle.com/dsv/8795028


Chaddad, A., Wu, Y., and Desrosiers, C. (2023). Federated learning for healthcare applications. IEEE Internet Things J. 11, 7339–7358. doi: 10.1109/JIOT.2023.3325822


Chen, J., Xue, J., Wang, Y., Huang, L., Baker, T., and Zhou, Z. (2023). Privacy-preserving and traceable federated learning for data sharing in industrial iot applications. Expert Syst. Applic. 213:119036. doi: 10.1016/j.eswa.2022.119036


Coelho, K. K., Nogueira, M., Vieira, A. B., Silva, E. F., and Nacif, J. A. M. (2023). A survey on federated learning for security and privacy in healthcare applications. Comput. Communic. 207, 113–127. doi: 10.1016/j.comcom.2023.05.012


Cremers, C. J. (2008). “The scyther tool: verification, falsification, and analysis of security protocols: Tool paper,” in International Conference on Computer Aided Verification (Springer), 414–418. doi: 10.1007/978-3-540-70545-1_38


Gartner (2025). Available online at: https://www.gartner.com/en/documents/4333599 (Accessed January 31, 2025).


Gayathri Hegde, M., Shenoy, P. D., and Venugopal, K. R. (2023). “A comparative study of neural network and machine learning on privacy preserving federated learning for healthcare applications,” in 2023 IEEE Technology &Engineering Management Conference - Asia Pacific (TEMSCON-ASPAC), 1–6. doi: 10.1109/TEMSCON-ASPAC59527.2023.10531360


Guduri, M., Chakraborty, C., Margala, M., and Margal, M. (2023). Blockchain-based federated learning technique for privacy preservation and security of smart electronic health records. IEEE Trans. Cons. Electr. 70, 2608–2617. doi: 10.1109/TCE.2023.3315415


Hiwale, M., Walambe, R., Potdar, V., and Kotecha, K. (2023). A systematic review of privacy-preserving methods deployed with blockchain and federated learning for the telemedicine. Healthc. Anal. 3:100192. doi: 10.1016/j.health.2023.100192


Islam, M., Reza, M. T., Kaosar, M., and Parvez, M. Z. (2023). Effectiveness of federated learning and cnn ensemble architectures for identifying brain tumors using mri images. Neural Process. Lett. 55, 3779–3809. doi: 10.1007/s11063-022-11014-1


Joshi, M., Pal, A., and Sankarasubbu, M. (2022). Federated learning for healthcare domain-pipeline, applications and challenges. ACM Trans. Comput. Healthc. 3, 1–36. doi: 10.1145/3533708


Khatun, M. A., Memon, S. F., Eising, C., and Dhirani, L. L. (2023). Machine learning for healthcare-iot security: a review and risk mitigation. IEEE Access 11, 145869–145896. doi: 10.1109/ACCESS.2023.3346320


Kumar, Y., and Singla, R. (2021). “Federated learning systems for healthcare: perspective and recent progress,” in Federated Learning Systems: Towards Next-Generation AI, 141–156. doi: 10.1007/978-3-030-70604-3_6


Li, H., Li, C., Wang, J., Yang, A., Ma, Z., Zhang, Z., et al. (2023). Review on security of federated learning and its application in healthcare. Fut. Gen. Comput. Syst. 144, 271–290. doi: 10.1016/j.future.2023.02.021


Mosaiyebzadeh, F., Pouriyeh, S., Parizi, R. M., Han, M., and Batista, D. M. (2023). “Intrusion detection system for ioht devices using federated learning,” in IEEE INFOCOM 2023 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 1–6. doi: 10.1109/INFOCOMWKSHPS57453.2023.10225932


Naresh, V. S., and Thamarai, M. (2023). Privacy-preserving data mining and machine learning in healthcare: applications, challenges, and solutions. Wiley Interdiscipl. Rev.: Data Min. Knowl. Discov. 13:e1490. doi: 10.1002/widm.1490


Oh, W., and Nadkarni, G. N. (2023). Federated learning in health care using structured medical data. Adv. Kidney Disease Health 30, 4–16. doi: 10.1053/j.akdh.2022.11.007


Otoum, S., Guizani, N., and Mouftah, H. (2021). “Federated reinforcement learning-supported ids for iot-steered healthcare systems,” in ICC 2021-IEEE International Conference on Communications (IEEE), 1–6. doi: 10.1109/ICC42927.2021.9500698


Qayyum, A., Janjua, M. U., and Qadir, J. (2022). Making federated learning robust to adversarial attacks by learning data and model association. Comput. Sec. 121:102827. doi: 10.1016/j.cose.2022.102827


Sadu, M. (2024). Hybrid encryption of fernet and initialisation vector with attribute-based encryption: a secure and flexible approach for data protection. Int. J. Big Data Intell. 8, 137–149. doi: 10.1504/IJBDI.2024.138940


Schneble, W. (2018). Federated learning for intrusion detection systems in medical cyber-physical systems. PhD thesis.


Shen, G., Fu, Z., Gui, Y., Susilo, W., and Zhang, M. (2023). Efficient and privacy-preserving online diagnosis scheme based on federated learning in e-healthcare system. Inform. Sci. 647:119261. doi: 10.1016/j.ins.2023.119261


Srivenkateswaran, C., Jaya Mabel Rani, A., Senthil Kumaran, R., and Vinston Raja, R. (2025). Securing healthcare data: a federated learning framework with hybrid encryption in cluster environments. Technol. Health Care 33, 1232–1257. doi: 10.1177/09287329241291397


Wang, B., Li, H., Guo, Y., and Wang, J. (2023a). Ppflhe: a privacy-preserving federated learning scheme with homomorphic encryption for healthcare data. Appl. Soft Comput. 146:110677. doi: 10.1016/j.asoc.2023.110677


Wang, W., Li, X., Qiu, X., Zhang, X., Brusic, V., and Zhao, J. (2023b). A privacy-preserving framework for federated learning in smart healthcare systems. Inform. Process. Manag. 60:103167. doi: 10.1016/j.ipm.2022.103167


Yazdinejad, A., Dehghantanha, A., Srivastava, G., Karimipour, H., and Parizi, R. M. (2024). Hybrid privacy preserving federated learning against irregular users in next-generation internet of things. J. Syst. Architect. 148:103088. doi: 10.1016/j.sysarc.2024.103088


Keywords: Federated Learning, Fernet Symmetric Encryption, Intrusion Detection System, Logistic Regression, Random Forest, Support Vector Classifier

Citation: Shrimali B, Gajjar J, Roy S, Patel S, Patel K and Naik RR (2026) EnDuSecFed: an ensemble approach for privacy preserving Federated Learning with dual-security framework for sustainable healthcare. Front. Big Data 8:1659026. doi: 10.3389/fdata.2025.1659026

Received: 03 July 2025; Accepted: 16 October 2025;
Published: 22 January 2026.

Edited by:

Riaz Ullah Khan, University of Electronic Science and Technology of China, China

Reviewed by:

Rajesh Kumar, University of Electronic Science and Technology of China, China
Amin Ul Haq, University of Electronic Science and Technology of China, China
Waqas Amin, Southwest University of Science and Technology, China

Copyright © 2026 Shrimali, Gajjar, Roy, Patel, Patel and Naik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bela Shrimali, bela.shrimali@gmail.com; Swapnoneel Roy, s.roy@unf.edu
