- 1School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
- 2Tianjin Police Institute, Tianjin, China
Introduction: Federated learning (FL) is a distributed machine learning paradigm that preserves data privacy and mitigates data silos. Nevertheless, frequent communication between clients and the server often becomes a major bottleneck, restricting training efficiency and scalability.
Methods: To address this challenge, we propose a novel communication-efficient algorithm, EF-Feddr, for federated composite optimization, where the objective function includes a potentially non-smooth regularization term and local datasets are non-IID. Our method is built upon the relaxed Douglas–Rachford splitting method and incorporates error feedback (EF), a widely adopted error-compensation mechanism, to ensure convergence when biased compression (e.g., top-k sparsification) is applied.
Results: Under the partial client participation setting, our theoretical analysis demonstrates that EF-Feddr achieves a fast convergence rate of O(1/K) and a communication complexity of O(1/ε²). Comprehensive experiments conducted on the FEMNIST and Shakespeare benchmarks, as well as controlled synthetic data, consistently validate the efficacy of EF-Feddr across diverse scenarios.
Discussion: The results confirm that the integration of error feedback with the relaxed Douglas–Rachford splitting method in EF-Feddr effectively overcomes the convergence degradation typically caused by biased compression, thereby offering a practical and efficient solution for communication-constrained federated learning.
1 Introduction
Federated learning (FL) (Konecný et al., 2016; McMahan et al., 2017) is a distributed framework designed to address large-scale learning problems across networks of edge clients. In this paradigm, clients update models locally on their private data, while the server aggregates these updates to refine a shared global model. This collaborative process enables the development of global or personalized models without compromising user privacy (Ezequiel et al., 2022; Saifullah et al., 2024). Despite these advantages, communication between clients and the server remains a critical bottleneck, particularly when the number of participating clients is large, bandwidth is constrained, and the models involve high-dimensional parameters (Bhardwaj et al., 2023; Talwar et al., 2021). Recent efforts to improve the communication efficiency of FL have primarily focused on two directions: (i) reducing the number of communication rounds through partial client participation or increased local computation, and (ii) lowering the number of transmitted bits per round via techniques such as quantization and residual gradient compression. While these strategies effectively cut communication costs, they also introduce additional variance, which may widen the neighborhood around the optimal solution and, in some cases, prevent convergence under biased compression. To mitigate these issues, variance-reduction techniques such as error feedback (EF) are commonly employed. In contrast to traditional distributed training, it is unrealistic to assume that data on each local device are always independent and identically distributed (IID). Prior studies have consistently shown that FL accuracy degrades significantly when faced with non-IID or heterogeneous data (Islam et al., 2024). In this study, we focus on the following federated composite optimization (FCO) problem:
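In its standard composite form, consistent with the definitions that follow, the objective reads

min_{x∈ℝ^d} F(x) := (1/n) Σ_{i=1}^{n} fi(x) + g(x),     (1)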
where n denotes the number of clients, fi is the local loss function of the i-th client, assumed L-smooth and possibly non-convex, and g is the regularization term, assumed proper, closed, and convex (possibly non-smooth). As a practical example, consider a collaborative environmental monitoring project in which multiple research institutions analyze sensor data from diverse geographical locations to detect climate-change patterns. Due to privacy concerns and proprietary restrictions, raw data cannot be shared directly. Sparse regularization is particularly valuable in this setting: although each client may hold relatively few observations (e.g., readings from a sparse sensor network; Bhardwaj et al., 2022), every observation involves a high-dimensional set of features such as temperature, humidity, wind speed, and pollution levels, so a sparsity-inducing g helps identify salient features and prevent overfitting.
Operator splitting constitutes a broad class of methods for solving optimization problems of the form (Equation 1). These methods decompose numerically intractable components into simpler subproblems, thereby reducing computational complexity, enhancing efficiency, and enabling modular algorithms that are naturally suited for parallelization. Operator splitting has been successfully applied to a wide range of challenging optimization problems. Among these, the Douglas–Rachford splitting method is particularly well-established due to its iterative stability and fast convergence. Furthermore, its update rule decomposes the global composite objective into local proximal steps that can be executed in a fully parallel manner. This structure inherently aligns with the distributed nature of federated learning, facilitating efficient client-side computation while also underpinning the method's iterative stability. From this perspective, many state-of-the-art FL algorithms can be interpreted within the operator splitting framework (Malekmohammadi et al., 2021). Examples include FedAvg (McMahan et al., 2017), FedProx (Li et al., 2020), FedSplit (Pathak and Wainwright, 2020), and FedDR (Tran-Dinh et al., 2021). However, for the FCO problem in Equation 1, existing FL methods such as FedAvg and its communication-efficient variants are primarily designed for smooth, unconstrained settings. In non-smooth FL settings, subgradient methods are widely used but suffer from slow convergence (Jhunjhunwala et al., 2022). Although proximal operators offer a more effective alternative with superior convergence properties (Liu et al., 2024), their seamless integration into communication-efficient FL frameworks remains limited. Moreover, while compression techniques effectively reduce communication overhead, they introduce additional variance that can enlarge the solution neighborhood and hinder convergence. Critically, existing communication-efficient methods have predominantly been designed for smooth FL problems, leaving a pronounced gap: non-smooth federated composite optimization has not been addressed under compression-induced variance and communication constraints simultaneously. To bridge this gap, this study presents EF-Feddr, a communication-efficient FL algorithm that employs the Top-k sparsification technique to compress transmitted parameters and reduce communication bits, incorporates an error feedback (Li and Li, 2023) mechanism to mitigate the variance introduced by compression, and further integrates the relaxed Douglas–Rachford splitting method (He et al., 2021) with a proximal operator to accelerate the iterative process while effectively handling the non-smoothness of the global regularization term. This integrated design makes EF-Feddr applicable to a wider range of scenarios and constrained settings. Leveraging the Douglas–Rachford envelope, we establish convergence guarantees for EF-Feddr in non-convex FL problems under mild assumptions.
Our contributions are summarized as follows:
• We propose EF-Feddr, an algorithm that combines the relaxed Douglas–Rachford splitting method with error feedback to reduce communication costs between clients and the server without sacrificing accuracy in non-IID settings. In addition, the error feedback mechanism enhances the stability of communication-compressed training in FL.
• We establish theoretical convergence guarantees for EF-Feddr based on the Douglas–Rachford envelope. Specifically, our method achieves a convergence rate of O(1/K) and a communication complexity of O(1/ε²) for non-convex loss functions under partial client participation.
• Through experiments on synthetic datasets, the FEMNIST dataset, and the Shakespeare dataset, we show that EF-Feddr improves accuracy by 3.29%–12.97% over state-of-the-art FL variants, while significantly reducing communication costs compared to uncompressed FedDR.
2 Related work
2.1 Operator splitting methods
Classical operator splitting methods such as Douglas–Rachford (DR), Forward–Backward (FB), and the Alternating Direction Method of Multipliers (ADMM) have recently been adopted in FL (Godavarthi et al., 2025; Goel et al., 2025). FedAvg (McMahan et al., 2017) can be viewed as an instance of k-step FB splitting, while FedProx (Li et al., 2020) extends the backward-backward splitting method and can be seen as another FB-type variant tailored for regularized FL problems. FedSplit (Pathak and Wainwright, 2020), based on Peaceman–Rachford splitting, aims to identify the correct fixed point for strictly convex FL problems. Its communication-efficient variant, Eco-FedSplit (Khirirat et al., 2022), incorporates error-compensated compression. For the FCO problem, FedDR (Tran-Dinh et al., 2021) integrates a randomized block-coordinate strategy with DR splitting to solve non-convex formulations. FedADMM (Wang et al., 2022) leverages ADMM by applying FedDR to the dual form of the FCO problem, while FedTOP-ADMM (Kant et al., 2022) generalizes FedADMM as the first three-operator method used in FL.
2.2 Communication-efficient FL
To address the communication bottleneck in FL (Sun et al., 2024), two categories of compression methods have been widely explored: unbiased compressors (e.g., stochastic quantization; Alistarh et al., 2017) and biased compressors (e.g., top-k sparsification; Khirirat et al., 2018). FedPAQ (Reisizadeh et al., 2020) reduces communication costs through periodic averaging, partial client participation, and quantization. However, this reduction comes at the expense of convergence accuracy, requiring additional training iterations; the authors also analyzed the resulting trade-off between communication overhead and convergence in their experiments. The z-SignFedAvg algorithm (Tang et al., 2024), a variant of FedAvg, employs stochastic sign-based compression and achieves accuracy comparable to uncompressed FedAvg while greatly reducing communication overhead. Building on the lazily aggregated gradient rule and error feedback, Zhou et al. (2023) proposed two communication-efficient algorithms for non-convex FL, EF-LAG and BiEF-LAG, which adapt both uplink and downlink communication. Similarly, FedSQ (Long et al., 2024) introduces a hybrid approach combining sparsity and quantization to reduce communication costs while enhancing convergence.
2.3 Error feedback
In the realm of distributed optimization, it has been noted that directly applying biased compressors to the updates may decelerate convergence, deteriorate generalization performance, or even induce divergence (Li and Li, 2023). To counteract these issues, error feedback techniques have been introduced, which reduce the compression error compared with direct compression. Seide et al. (2014) first proposed this technique as a heuristic inspired by Sigma-Delta modulation. EF21 (Richtárik et al., 2021) removes strict assumptions such as bounded gradients and bounded dissimilarity and can handle arbitrary data heterogeneity among clients, but at the cost of worse computational complexity. EFSkip (Bao et al., 2025) also allows arbitrary data heterogeneity and enjoys a linear speedup, significantly improving upon previous results.
3 Compressed non-convex FL with error feedback
In this section, we present EF-Feddr, an algorithm that integrates error feedback into the relaxed Douglas–Rachford splitting framework to address the non-convex FCO problem. We begin with a brief introduction to the Douglas–Rachford splitting method, followed by an explanation of how error feedback is incorporated to improve communication efficiency. We then provide the detailed formulation of EF-Feddr and analyze its convergence properties. Main notations are listed in Table 1.
3.1 Problem formulation
The FCO Equation 1 is mathematically equivalent to the consensus optimization problem
where the consensus constraint set is E = {x = (x1, …, xn) | x1 = x2 = ⋯ = xn}. Let lE be the indicator function of E. With the indicator function, one can treat the constrained problem as unconstrained by moving the constraints into the objective function. Then Equation 1 is obviously equivalent to
The first-order optimality condition is given by 0∈∇f(x)+∂g(x)+∂lE(x), where ∇f(x) = [∇f1(x1), ..., ∇fn(xn)]. A point x* is a stationary point of Equation 1 if 0 ∈ (1/n)Σi∇fi(x*) + ∂g(x*). Additionally, the operator splitting method encompasses a broad range of techniques for effectively addressing Equation 3. A key advantage of operator splitting methods is their inexpensive per-iteration operations, which makes them particularly suitable for large-scale applications (He et al., 2021); among these, the DR splitting method is particularly well-known. The iteration equations for the DR splitting method are given by
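(stated here in the standard form of the scheme, using the variable convention of Section 3.2, for a generic split of the objective into a smooth part f and a non-smooth part g with step size γ > 0):

y^{k+1} = y^k + (x^k − z^k),
z^{k+1} = prox_{γf}(y^{k+1}),
x^{k+1} = prox_{γg}(2z^{k+1} − y^{k+1}).

The relaxed variant used later in Section 3.3 scales the first correction by a relaxation parameter λ∈(0, 2), i.e., y^{k+1} = y^k + λ(x^k − z^k).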
Given that the DR splitting method often demonstrates favorable and stable convergence behavior in practice, we base our approach on its relaxed variant to solve Equation 1. The detailed application is presented in Section 3.3.
For convenience, we introduce the definitions of the key concepts that will be utilized. For a function f, the proximal operator of f at a point x with step size γ>0 is

proxγf(x) := argmin_y { f(y) + (1/(2γ))||y − x||² },

the Moreau envelope of f with step size γ>0 is

Mγf(x) := min_y { f(y) + (1/(2γ))||y − x||² },

and the gradient mapping of f at point x with step size γ>0 is

Gγf(x) := (1/γ)(x − proxγf(x)).

We observe that ∇Mγf(x) = Gγf(x) (Liu et al., 2019). Moreover, the proximal operator update can be written as

proxγf(x) = x − γGγf(x).

This representation reveals that the proximal operator update is analogous to a gradient step taken along the gradient mapping of f. For the composite function F(x) = f(x)+g(x), the corresponding gradient mapping is given by

GγF(x) := (1/γ)(x − proxγg(x − γ∇f(x))).

In the context of general non-convex, non-smooth problems, the gradient mapping is commonly used to assess convergence (Liu et al., 2024). Specifically, stationarity of a point x* for Equation 1 is equivalent to GγF(x*) = 0.
3.2 Error feedback
We now define a general class of compressors that will be used throughout this study.
Definition 1. (Absolute compressor). A map C:ℝd → ℝd is an absolute compressor if there exists ν>0 such that, ∀x∈ℝd, E||x−C(x)||² ≤ ν².
Most popular compressors such as the sign compression (Bernstein et al., 2018), the Top-k sparsification (Khirirat et al., 2018) and the sparsification together with quantization (Alistarh et al., 2017) are in fact absolute compressors if the full-precision vector has a bounded norm (Khirirat et al., 2022; Sahu et al., 2021).
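As a concrete illustration (not the authors' implementation), a minimal NumPy sketch of Top-k sparsification is shown below; on vectors with bounded norm it satisfies the absolute-compressor bound of Definition 1, since the compression error is the energy of the discarded coordinates.

```python
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    if k >= x.size:
        return x.copy()
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

# The compression error ||x - C(x)||^2 equals the energy of the dropped
# coordinates; it is bounded whenever ||x|| is bounded, so Top-k acts as an
# absolute compressor (Definition 1) on norm-bounded iterates.
x = np.random.randn(10)
err = np.linalg.norm(x - top_k(x, k=3)) ** 2
print(err)
```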
Error feedback (also known as error compensation) is a popular tool in FL to reduce compression error and improve convergence speed compared to direct compression (Valdeira et al., 2025). Its mechanism shares a fundamental principle with Sigma-Delta modulation in signal processing (Seide et al., 2014). Technically, when transmitting a sequence of vectors, the method incorporates an auxiliary vector that accumulates the compression error at each step. This accumulated error is then added to the current vector before it undergoes compression and transmission (Karimireddy et al., 2019). More specifically, based on the DR splitting method (Equation 4), the update steps of the direct compression scheme are as follows:
the update steps with error feedback compression are as follows:
In direct compression, each vector 2zk+1−yk+1 is individually compressed, and the receiver directly uses its compressed version C(2zk+1−yk+1) in place of the original. Conversely, error feedback compression employs a proxy vector ck+1 for 2zk+1−yk+1 that integrates information from prior steps 0, 1, …, k. This proxy is refined via an auxiliary vector ek+1, which is iteratively updated and stored to accumulate the compression error at each step.
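The contrast between the two schemes can be sketched as follows (illustrative NumPy code, not the paper's implementation: `compress` stands in for any absolute compressor such as Top-k, and `prox_g` is an assumed ℓ1 proximal operator used purely for illustration; variable names mirror the z, y, e vectors above).

```python
import numpy as np

def compress(v, k=3):
    """Illustrative Top-k compressor (any absolute compressor could be used)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def prox_g(v, gamma, lam=0.1):
    """Assumed example: proximal operator of g = lam*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

gamma = 0.5
rng = np.random.default_rng(0)

# Direct compression: the reflected point 2z - y is compressed and used as-is.
def server_step_direct(z_new, y_new):
    return prox_g(compress(2 * z_new - y_new), gamma)

# Error feedback: the accumulated compression error e is added back before
# compressing, and e is updated with the residual of the current step.
def server_step_ef(z_new, y_new, e):
    v = 2 * z_new - y_new + e          # error-compensated vector
    c = compress(v)                    # proxy sent over the channel
    e_new = v - c                      # residual carried to the next round
    return prox_g(c, gamma), e_new

z, y, e = rng.standard_normal(10), rng.standard_normal(10), np.zeros(10)
x_direct = server_step_direct(z, y)
x_ef, e = server_step_ef(z, y, e)
```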
3.3 EF-Feddr algorithm
In this section, we propose the following EF-Feddr algorithm. The details of EF-Feddr are presented in Algorithm 1. Specifically, applying the relaxed DR splitting method (He et al., 2021) to the Equation 3 of Equation 1 in a distributed setting yields the following iterative steps:
By integrating the error feedback mechanism detailed in Section 3.2, we obtain the EF-Feddr iterative scheme:
where λ∈(0, 2) (He et al., 2021) is the relaxation parameter. The local variables are updated on each client i. The key step involves compression and communication: instead of compressing the local update directly, each client compresses the corresponding error-compensated vector, and the resulting value is then sent to the server. Furthermore, to compute the server aggregation xk+1, we have the following conclusion.
Proposition 1. For every k≥0, in Equation 8 is equal to .
Proof. Let . The result of this proximal step must have all blocks equal to some common vector z (Mishchenko et al., 2022), that is,
Thus, we have the server aggregation
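For intuition, recall that the Euclidean projection onto the consensus set E simply averages the blocks:

proj_E(u1, …, un) = (ū, …, ū),  where ū = (1/n) Σ_{i=1}^{n} ui.

This is why the blocks of the aggregation step coincide, and why the server update reduces to a proximal step applied to the average of the vectors received from the sampled clients.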
In Algorithm 1, during round k: (1) the clients receive the global model xk from the server (line 5); (2) a subset of clients Sk is sampled following the sampling scheme described in Section 4; the i-th client performs a relaxation step with relaxation parameter λ, computes the proximal local update to obtain its local model, calculates the compressed local model update, updates the local compression-error accumulator, and sends the compressed update back to the server (lines 6–10); (3) the server receives the compressed updates from the clients i∈Sk and performs a global model update using the averaged compressed local model updates (line 13). Notably, the relaxation strategy, akin to inertial extrapolation techniques (e.g., the heavy-ball method), has broadly accelerated iterative algorithms in convex and non-convex optimization, since the cost per iteration remains essentially unchanged (He et al., 2021). For any γ>0, the local update serves only as an approximation of the exact proximal point; the evaluation of proxγfi can be carried out using several established techniques, such as accelerated GD-type algorithms and local SGD (Parikh et al., 2014; Tran-Dinh et al., 2021). It is worth noting that the algorithm requires O(d) memory and incurs O(d) computational overhead per client per round.
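To make the round structure concrete, the following is a minimal, self-contained sketch of one communication round in the spirit of Algorithm 1 (relaxation step, inexact local prox, error-compensated compression, server-side proximal aggregation). It is not the exact update rule of the paper: `local_prox`, `compress`, and `prox_g` are placeholders for the quantities discussed in the text, and the quadratic losses in the toy usage are purely illustrative.

```python
import numpy as np

def compress(v, k):
    """Top-k sparsification used as the compressor C."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def local_prox(grad_fi, y, gamma, steps=10, lr=0.1):
    """Inexact evaluation of prox_{gamma f_i}(y) by a few gradient steps
    on f_i(z) + ||z - y||^2 / (2*gamma)."""
    z = y.copy()
    for _ in range(steps):
        z -= lr * (grad_fi(z) + (z - y) / gamma)
    return z

def prox_g(v, gamma, lam=0.01):
    """Example regularizer g = lam*||.||_1 (an assumption for illustration)."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

def ef_feddr_round(x_global, clients, state, lam_relax, gamma, k):
    """One round, sketched: sampled clients update locally and send compressed,
    error-compensated updates; the server aggregates them with a proximal step."""
    sent = []
    for i in clients:                                   # sampled subset S_k
        s = state[i]
        s["y"] += lam_relax * (x_global - s["z"])       # relaxation step
        s["z"] = local_prox(s["grad"], s["y"], gamma)   # inexact local prox
        update = 2 * s["z"] - s["y"]                    # reflected local update
        c = compress(update + s["e"], k)                # error-compensated compression
        s["e"] += update - c                            # accumulate compression error
        sent.append(c)
    return prox_g(np.mean(sent, axis=0), gamma)         # server aggregation

# Toy usage: quadratic local losses f_i(z) = 0.5*||z - a_i||^2.
d, n = 5, 4
rng = np.random.default_rng(1)
targets = [rng.standard_normal(d) for _ in range(n)]
state = [{"y": np.zeros(d), "z": np.zeros(d), "e": np.zeros(d),
          "grad": (lambda z, a=a: z - a)} for a in targets]
x = np.zeros(d)
for _ in range(20):
    x = ef_feddr_round(x, clients=range(n), state=state,
                       lam_relax=1.0, gamma=1.0, k=3)
```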
4 Theoretical results
To analyze the convergence of Algorithm 1, we consider several basic assumptions and auxiliary results. Our analysis follows the analytical framework outlined in Tran-Dinh et al. (2021). First, we introduce a proper sampling scheme following Tran-Dinh et al. (2021). Let p1, …, pn>0 be such that, for all i∈[n], pi is the probability that client i is selected. This defines a proper sampling scheme on [n], and each Sk is an i.i.d. realization of it. Note that . Define the filtration as the σ-algebra generated by the sequence S0, …, Sk. This sampling scheme ensures that each client has a significant probability of being updated.
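As a concrete illustration (an assumption for exposition, not necessarily the exact scheme used in the implementation), uniformly sampling a fixed-size subset of m out of n clients is one such proper scheme, giving pi = m/n for every client:

```python
import numpy as np

def sample_clients(n_clients: int, m: int, rng: np.random.Generator):
    """Uniformly sample a subset S_k of m distinct clients out of n_clients.
    Under this scheme every client satisfies P(i in S_k) = m / n_clients,
    so p = min_i p_i = m / n_clients > 0."""
    return rng.choice(n_clients, size=m, replace=False)

rng = np.random.default_rng(42)
S_k = sample_clients(n_clients=30, m=10, rng=rng)  # e.g., 10 of 30 as in Section 5.2
```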
Assumption 1. (L-Smoothness). All local functions fi(·) are L-smooth, if
Assumption 2. (Boundedness from below). F(·) given in (1) is bounded below, that is,
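In their standard form, these two assumptions read:

||∇fi(x) − ∇fi(y)|| ≤ L||x − y||  for all x, y ∈ ℝ^d and all i∈[n],   (Assumption 1)

F* := inf_{x∈ℝ^d} F(x) > −∞.   (Assumption 2)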
In non-convex FL optimization, Assumptions 1 and 2 are standard. Assumption 2 guarantees that Equation 1 is well-defined and is independent of the choice of algorithms. We first present three useful lemmas that will be instrumental in proving our main theorem.
Lemma 1. Let be generated by Algorithm 1, for all i∈Sk, λ>0, β1>0 and γ>0, we have
Proof.
For the relation , where the approximation error satisfies with a given accuracy , we introduce auxiliary variables and for i∈[n] to analyze the convergence of Algorithm 1,
Here, denotes the vector of errors associated with the approximations of the proximal operator, and serves as an accurate computation to . Note that when , we have and , which implies . From Equation 10 (Atenas, 2025), we have
Then, using the update rule for in Algorithm 1, we get Using Young's inequality , and the L-smoothness of fi, we bound for any β1>0 and as follows
which proves (9).
We then establish the relationship between and the squared norm of the gradient mapping .
Lemma 2. Let be generated by Algorithm 1 and Equation 10, and the gradient mapping be defined by (5). Then, for any λ>0, β2>0, and γ>0, we have
Proof. From the update of , in Algorithm 1 and (11), we have
From the update rule of xk in Algorithm 1, the definition of , the non-expansive property of proxγg, and the fact , we obtain that
By applying the L-smoothness of fi and the Young's inequality stated in Lemma 1, for any β2>0 we deduce that
which proves (12).
Lemma 3. Let be generated by Algorithm 1. Suppose that Assumptions 1 and 2 hold, and we define the Lyapunov function
then by choosing
and for any ε1, β1, β4>0, we have
where
Proof. Given the definition , the update rule in Algorithm 1 (hence ), and the convexity of g, we obtain the following inequality
Combining Equations 10 and 11, we obtain
Next, using the update rules for and in Algorithm 1, we have
In order to establish the descent property of the Lyapunov function Vk+1(xk+1), its second term is expanded and rearranged as follows
Here, Equation 15 is used to separate the term from the approximation error , while Equation 16 expresses in terms of the average vector and the accumulated compression errors and . Then, by combining Equations 14, 17 and using the definition of Vk+1(xk+1), we obtain that
To bound the third term on the right-hand side of Equation 18, we employ the inequality (for any ε1>0) as follows
For , we have . Applying Young's inequality stated in Lemma 1 with any β3>0, we can evaluate the fifth term on the right-hand side of Equation 18 as follows
To streamline the notation, denote
and substituting Equations 19 and 20 into Equation 18, we obtain an expanded expression for Vk+1. Distinguishing between the active client set and the inactive set, and employing the L-smoothness of fi (i.e., ), we have
Next, applying the square-norm expansion
For non-updated clients , the local variable remains unchanged, i.e., . Substituting these relations into the original expression gives
Inserting the reorganized expression into the expansion of Vk+1(xk+1) and collecting common terms gives
Then, from the update rule of in Algorithm 1 together with Equations 10 and 11, we derive an expression for :
where is a composite error term involving the approximation errors , and gradient differences. The subsequent analysis will control the impact of via its norm bound. It is defined as
Its squared norm satisfies
By applying the L-smoothness of fi, the Young's inequality, and Equation 24, we obtain for any β4>0 that
Next, leveraging the L-smoothness of fi and assuming , we demonstrate the boundedness of Vk(xk)
From Lemma 1, we have
According to the sampling scheme, we consider the expectation of with respect to conditioned on . Combined with (26), this yields
where p = min_{i∈[n]} pi ∈ (0, 1]. By taking the conditional expectation of Equation 25 with respect to conditioned on , and combining it with Equations 10, 21, 27 under the setting β3 = 1, we derive the following
To guarantee the descent property, let
Then, we have
Theorem 1. Let be generated by Algorithm 1. Suppose that Assumptions 1 and 2 hold, for , we have
where
with ε1, β2>0, and π, δ1, δ2 defined in Lemma 3.
Proof. First, it follows from Lemma 3 that
Combining the derived estimates and Lemma 2, we obtain
Taking the total expectation of with respect to , and by using the update of and the definition of the absolute compressor, we obtain the following result
where
are four constants. Summing the inequality over k from 0 to K−1, and then scaling the resultant sum by , we derive
With the initial condition , we obtain . Together with the lower bound E[Vk+1(xk+1)]≥F*, this implies that Equation 30 simplifies to
which proves Equation 28.
Corollary 1. Suppose that Assumptions 1 and 2 hold. Then EF-Feddr (Algorithm 1) finds an ε-stationary point x such that in the following number of iterations
where M>0 is a constant, and M1, M2, M3, M4 are defined in Theorem 1. Consequently, the communication complexity is O(1/ε²).
Proof. As described in Tran-Dinh et al. (2021), the choice of accuracies is constrained such that for a given constant M>0, . Therefore,
Consequently, to guarantee , we have
Therefore, we can take as its lower bound.
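Schematically, writing the bound of Theorem 1 as an averaged estimate of the form

(1/K) Σ_{k=0}^{K−1} E||GγF(x^k)||² ≤ C/K + (compression- and inexactness-induced residual terms),

with C collecting M1, …, M4, the constant M, and the initial gap, we have min_{0≤k<K} E||GγF(x^k)||² ≤ C/K + (residual terms). Once the residual terms are made sufficiently small, K = O(C/ε²) rounds suffice to reach an ε-stationary point; since each round involves a single compressed uplink per sampled client, the communication complexity is likewise O(1/ε²), matching the statement above.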
5 Experiments
In the experiments, we evaluate EF-Feddr against Eco-FedSplit (Khirirat et al., 2022), Eco-FedProx (Khirirat et al., 2022), and FedDR (Tran-Dinh et al., 2021). In all compression-based baselines, the compression operator C denotes Top-k sparsification. For a fair comparison, we implement Eco-FedSplit, Eco-FedProx, and EF-Feddr on top of the FedDR framework. All experiments are conducted in TensorFlow (Abadi et al., 2016) on a cluster equipped with NVIDIA Tesla P100 (16 GB) GPUs. We next describe the datasets and models used in our study.
5.1 Non-IID datasets
We evaluate on both synthetic and real-world datasets: synthetic-(l, s), FEMNIST, and Shakespeare. Following prior studies (Caldas et al., 2018; Tran-Dinh et al., 2021), we generate synthetic-(l, s) with (l, s) = {(0, 0), (1, 1)}, where l controls the number of differing local models and s controls the degree of local data heterogeneity; larger l and s imply stronger non-IID heterogeneity. FEMNIST extends MNIST to 62 classes with over 800k samples; we use an 80%/20% train/test split and partition by writer, which naturally induces client-level heterogeneity, since each client's dataset comprises samples from a subset of writers and therefore exhibits distinct handwriting styles and features. Shakespeare is a character-level language modeling corpus; we partition by user/play, so each client holds a distinct subset of texts that may span a varying number of plays and scenes. This yields a non-uniform distribution of text across clients: some clients predominantly receive data from specific plays, whereas others obtain a more diverse range of content, and the degree of non-IID-ness within each client's dataset can be quantified by the number of classes present. The datasets and model configurations used in our experiments are summarized in Table 2, which outlines their key statistical characteristics.
5.2 Models and hyper-parameters selection
We use a fully connected network with a 60-32-10 architecture and train it for 200 communication rounds with a learning rate of 0.01 on all synthetic datasets. At each round, 10 out of 30 clients are sampled. To evaluate the algorithm's performance with an increased number of clients, we further extended the Synthetic-(1,1) setup from the original 30 clients to 90 clients while preserving the statistical characteristics defined by the (l, s) parameters. The data generation process maintained the same non-IID partition pattern and per-client data distribution profile as the original setup. The client sampling ratio was kept constant at 1/3 (that is, selecting 30 out of 90 clients per round). Eco-FedSplit applies error-compensated compression to FedSplit, and Eco-FedProx does so to FedProx. To study an image classification problem on FEMNIST, we employ artificial neural networks (ANN) consisting of two fully connected layers. The first layer has 128 neurons followed by a ReLU activation function, and the second layer has 62 neurons followed by a softmax activation function for classification. In this experiment, we sample 50 clients out of 200 to perform updates at each communication round for all the above-mentioned algorithms. The model used for FEMNIST is trained for 200 communication rounds in total with an optimal learning rate of 0.003. Consistent with prior research (Li et al., 2020), our approach to character-level prediction in the Shakespeare dataset utilizes a recurrent neural network (RNN) architecture. Specifically, we deploy a two-layer stacked LSTM classifier, each layer comprising 256 hidden units. Each input sequence is structured to include 80 characters, which are initially embedded into an eight-dimensional space prior to LSTM processing. The model subsequently generates a 62-class softmax distribution over the character vocabulary for each training instance. The training regimen involves a total of 50 communication rounds. An optimal learning rate of 0.08 is determined for the four operator-splitting-based federated learning algorithms employed in this study. Parameters for each algorithm such as α∈(0, 2) and η∈[1, 1000] for FedDR, μ∈[0.001, 1] for Eco-FedProx, and λ∈(0, 2) and γ∈[1, 1000] for EF-Feddr are tuned from a large range of values. For each dataset, we pick the most suitable parameters for each algorithm.
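For reference, the following is a minimal Keras sketch of the FEMNIST classifier described above (two fully connected layers with 128 ReLU units and a 62-way softmax). The flattened 784-dimensional input (28×28 FEMNIST images) and the SGD optimizer are assumptions made for illustration; only the layer sizes, activations, and learning rate come from the text.

```python
import tensorflow as tf

def build_femnist_model(input_dim: int = 784, num_classes: int = 62) -> tf.keras.Model:
    """Two-layer fully connected classifier matching the description in Section 5.2."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_femnist_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.003),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```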
5.3 Comparison of methods
Figures 1–3 report training loss/accuracy and test accuracy vs. communication rounds and communication cost on the synthetic datasets; Figure 4 shows the same on FEMNIST. A key observation is that expanding the total number of clients does not substantially degrade the performance of EF-Feddr. Experimental results under the scaled setting (Figure 3) confirm this: the algorithm maintains nearly identical convergence speed and final accuracy compared to the original 30-client scenario (Figure 2). Across heterogeneous settings, EF-Feddr consistently outperforms the baselines. On FEMNIST, EF-Feddr reaches 80.5% test accuracy at round 50, whereas Eco-FedSplit attains 74.5% only at round 200. Within 200 rounds, EF-Feddr improves accuracy by 12.97% and 7.93% over Eco-FedSplit and Eco-FedProx, respectively. On synthetic-(0, 0), EF-Feddr exceeds the two baselines by 3.88% and 8.40%; on synthetic-(1, 1), by 7.20% and 3.29%. On Shakespeare, Figure 5 shows EF-Feddr also surpasses two Douglas–Rachford splitting-based FL algorithms: Eco-FedSplit and FedDR. As shown in Table 3, EF-Feddr requires 18.64%–85.41% less runtime and 48.03%–93.18% less communication than baseline methods to achieve the same target test accuracy of 60% on synthetic and 70% on FEMNIST. Specifically, on FEMNIST, it meets this target in only 17 communication rounds (8.29 min), significantly outperforming competitors like Eco-FedSplit. These substantial reductions in overhead are consistently observed across the synthetic datasets. Additionally, EF-Feddr achieves a substantial reduction in communication costs without compromising performance relative to the uncompressed FedDR.
Figure 1. Convergence performance of different methods on the synthetic-(0, 0) dataset with Top-k and participation rate p = 0.3.
Figure 2. Convergence performance of different methods on the synthetic-(1, 1) dataset with Top-k, participation rate p = 0.3, and N = 30 total clients.
Figure 3. Convergence performance of different methods on the synthetic-(1, 1) dataset with Top-k, participation rate p = 0.3, and N = 90 total clients.
Figure 4. Convergence performance of different methods on the FEMNIST dataset with Top-k and participation rate p = 0.3.
Figure 5. Convergence performance of different methods on the Shakespeare dataset with Top-k and participation rate p = 0.3.
5.4 Effect of the relaxation parameter
Figure 6 examines the effect of the relaxation parameter λ over 200 iterations. Empirically, the best convergence is observed at λ = 0.3. Consistent with prior findings on FL adaptations of Douglas–Rachford splitting, choosing 0 < λ < 1 often leads to faster convergence than the classical (unrelaxed) variant.
6 Discussion
This study presents EF-Feddr, a communication-efficient federated learning algorithm that combines error-compensated compression with Douglas–Rachford splitting. The method's robustness is demonstrated across controlled synthetic and real-world benchmarks, yet we recognize that extreme heterogeneity, such as single-class clients, remains a challenging frontier. Furthermore, while our experiments simulate realistic constraints (partial participation, compression), fully asynchronous updates and dynamic network conditions warrant further study in real deployments.
Recent advances in behavior-based threat hunting (Bhardwaj et al., 2022), IoT firmware security assessment (Bhardwaj et al., 2023), and energy-efficient proactive fault tolerance in cloud environments (Talwar et al., 2021) provide complementary perspectives for building reliable and secure federated systems. While this study focuses on optimization efficiency under non-IID and communication constraints, these studies collectively point toward an integrated “Optimization + System + Security” paradigm for future research. Specifically, they motivate investigations into client behavior profiling for attack detection, trusted execution at the edge, and proactive fault-tolerant scheduling, all of which are essential for deploying robust and efficient federated learning in real-world, dynamic environments. Furthermore, to strengthen the generalizability of our findings, future studies will also include evaluations on a wider variety of datasets, encompassing diverse domains, scales, and heterogeneity patterns, thereby providing a more comprehensive assessment of the algorithm's practical applicability.
7 Conclusion
In this study, we introduced EF-Feddr, a communication-efficient algorithm for non-convex federated learning that leverages the Douglas–Rachford splitting method, error feedback compression, and a relaxation strategy. EF-Feddr improves communication efficiency while preserving solution accuracy. Both theoretical analysis and empirical experiments demonstrated that EF-Feddr substantially reduces the number of bits transmitted from clients to the server compared with uncompressed FedDR. In terms of solution accuracy, EF-Feddr performs comparably to the uncompressed FedDR. Building on the Douglas–Rachford envelope, we established convergence guarantees and analyzed the communication complexity of EF-Feddr under mild assumptions. Extensive experiments further confirmed that our method significantly outperforms existing state-of-the-art approaches in non-IID settings.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: arXiv preprint arXiv:1812.01097.
Author contributions
JX: Validation, Conceptualization, Methodology, Formal analysis, Data curation, Writing – original draft, Software. CW: Visualization, Investigation, Supervision, Resources, Funding acquisition, Project administration, Writing – review & editing.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). “TensorFlow: a system for large-scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (Savannah, GA), 265–283.
Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017). “QSGD: communication-efficient SGD via gradient quantization and encoding,” in Advances in Neural Information Processing Systems 30.
Atenas, F. (2025). Understanding the Douglas-Rachford splitting method through the lenses of moreau-type envelopes. Comput. Optim. Appl. 90, 881–910. doi: 10.1007/s10589-024-00646-9
Bao, H., Chen, P., Sun, Y., and Li, Z. (2025). EFSKIP: a new error feedback with linear speedup for compressed federated learning with arbitrary data heterogeneity. Proc. AAAI Conf. Artif. Intell. 39, 15489–15497. doi: 10.1609/aaai.v39i15.33700
Bernstein, J., Wang, Y.-X., Azizzadenesheli, K., and Anandkumar, A. (2018). “SIGNSGD: compressed optimisation for non-convex problems,” in International Conference on Machine Learning (Stockholm: PMLR), 560–569.
Bhardwaj, A., Kaushik, K., Alomari, A., Alsirhani, A., Alshahrani, M. M., Bharany, S., et al. (2022). BTH: behavior-based structured threat hunting framework to analyze and detect advanced adversaries. Electronics 11:2992. doi: 10.3390/electronics11192992
Bhardwaj, A., Kaushik, K., Bharany, S., and Kim, S. (2023). Forensic analysis and security assessment of IOT camera firmware for smart homes. Egypt. Inf. J. 24:100409. doi: 10.1016/j.eij.2023.100409
Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečnỳ, J., McMahan, H. B., et al. (2018). Leaf: a benchmark for federated settings. arXiv [preprint]. arXiv:1812.01097. doi: 10.48550/arXiv.1812.01097
Ezequiel, C. E. J., Gjoreski, M., and Langheinrich, M. (2022). Federated learning for privacy-aware human mobility modeling. Front. Artif. Intell. 5:867046. doi: 10.3389/frai.2022.867046
Godavarthi, D., Jaswanth, V., Mohanty, S., Dinesh, P., Venkata Charan Sathvik, R., Moreira, F., et al. (2025). Federated quantum-inspired anomaly detection using collaborative neural clients. Front. Artif Intell. 8:1648609. doi: 10.3389/frai.2025.1648609
Goel, C., Anita, X., and Anbarasi, J. L. (2025). Federated knee injury diagnosis using few shot learning. Front. Artif. Intell. 8:1589358. doi: 10.3389/frai.2025.1589358
He, S., Dong, Q.-L., Tian, H., and Li, X.-H. (2021). On the optimal relaxation parameters of Krasnosel'ski-Mann iteration. Optimization 70, 1959–1986. doi: 10.1080/02331934.2020.1767101
Islam, F., Mahmood, A., Mukhtiar, N., Wijethilake, K. E., and Sheng, Q. Z. (2024). “Fairequityfl-a fair and equitable client selection in federated learning for heterogeneous IOV networks,” in International Conference on Advanced Data Mining and Applications (Cham: Springer), 254–269. doi: 10.1007/978-981-96-0814-0_17
Jhunjhunwala, D., Sharma, P., Nagarkatti, A., and Joshi, G. (2022). “Fedvarp: tackling the variance due to partial client participation in federated learning,” in Uncertainty in Artificial Intelligence (Eindhoven: PMLR), 906–916.
Kant, S., da Silva, J. M. B., Fodor, G., Göransson, B., Bengtsson, M., and Fischione, C. (2022). Federated learning using three-operator ADMM. IEEE J. Sel. Topics Signal Processing 17, 205–221. doi: 10.1109/JSTSP.2022.3221681
Karimireddy, S. P., Rebjock, Q., Stich, S., and Jaggi, M. (2019). “Error feedback fixes signsgd and other gradient compression schemes,” in International Conference on Machine Learning (Long Beach, CA: PMLR), 3252–3261.
Khirirat, S., Johansson, M., and Alistarh, D. (2018). “Gradient compression for communication-limited convex optimization,” in 2018 IEEE Conference on Decision and Control (CDC) (Miami, FL: IEEE), 166–171. doi: 10.1109/CDC.2018.8619625
Khirirat, S., Magnússon, S., and Johansson, M. (2022). “Eco-fedsplit: federated learning with error-compensated compression,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Singapore: IEEE), 5952–5956. doi: 10.1109/ICASSP43922.2022.9747809
Konecný, J., McMahan, H. B., Yu, F. X., Richtárik, P., Suresh, A. T., and Bacon, D. (2016). Federated learning: strategies for improving communication efficiency. arXiv [preprint]. arXiv:1610.05492. doi: 10.48550/arXiv.1610.05492
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V., et al. (2020). Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450. doi: 10.48550/arXiv.1812.06127
Li, X., and Li, P. (2023). “Analysis of error feedback in federated non-convex optimization with biased compression: fast convergence and partial participation,” in International Conference on Machine Learning (Honolulu, HI: PMLR), 19638–19688.
Liu, J., Xu, L., Shen, S., and Ling, Q. (2019). An accelerated variance reducing stochastic method with Douglas-Rachford splitting. Mach. Learn. 108, 859–878. doi: 10.1007/s10994-019-05785-3
Liu, Y., Zhou, Y., and Lin, R. (2024). The proximal operator of the piece-wise exponential function. IEEE Signal Process. Lett. 31, 894–898. doi: 10.1109/LSP.2024.3370493
Long, Z., Chen, Y., Dou, H., Zhang, Y., and Chen, Y. (2024). Fedsq: sparse-quantized federated learning for communication efficiency. IEEE Trans. Consum. Electron. 70, 4050–4061. doi: 10.1109/TCE.2024.3352432
Malekmohammadi, S., Shaloudegi, K., Hu, Z., and Yu, Y. (2021). An operator splitting view of federated learning. arXiv [preprint]. arXiv:2108.05974. doi: 10.48550/arXiv.2108.05974
McMahan, B., Moore, E., Ramage, D., Hampson, S., and Arcas, B. A. (2017). “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics (Fort Lauderdale, FL: PMLR), 1273–1282.
Mishchenko, K., Khaled, A., and Richtárik, P. (2022). “Proximal and federated random reshuffling,” in International Conference on Machine Learning (Baltimore, MA: PMLR), 15718–15749.
Parikh, N., and Boyd, S. (2014). Proximal algorithms. Found. Trends Optim. 1, 127–239. doi: 10.1561/2400000003
Pathak, R., and Wainwright, M. J. (2020). Fedsplit: an algorithmic framework for fast federated optimization. Adv. Neural Inf. Process. Syst. 33, 7057–7066. doi: 10.48550/arXiv.2005.05238
Reisizadeh, A., Mokhtari, A., Hassani, H., Jadbabaie, A., and Pedarsani, R. (2020). “FEDPAQ: a communication-efficient federated learning method with periodic averaging and quantization,” in International Conference on Artificial Intelligence and Statistics (PMLR), 2021–2031.
Richtárik, P., Sokolov, I., and Fatkhullin, I. (2021). Ef21: a new, simpler, theoretically better, and practically faster error feedback. Adv. Neural Inf. Process. Syst. 34, 4384–4396. doi: 10.48550/arXiv.2106.05203
Sahu, A., Dutta, A., Abdelmoniem, M., Banerjee, A., Canini, T., Kalnis, M., et al. (2021). Rethinking gradient sparsification as total error minimization. Adv. Neural Inf. Process. Syst. 34, 8133–8146. doi: 10.48550/arXiv.2108.00951
Saifullah, S., Mercier, D., Lucieri, A., Dengel, A., and Ahmed, S. (2024). The privacy-explainability trade-off: unraveling the impacts of differential privacy and federated learning on attribution methods. Front. Artif. Intell. 7:1236947. doi: 10.3389/frai.2024.1236947
Seide, F., Fu, H., Droppo, J., Li, G., and Yu, D. (2014). “1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs,” in Interspeech, Vol. 2014 (Singapore), 1058–1062. doi: 10.21437/Interspeech.2014-274
Sun, W., Wang, A., Gao, Z., and Zhou, Y. (2024). “A communication-concerned federated learning framework based on clustering selection,” in International Conference on Advanced Data Mining and Applications (Cham: Springer), 285–300. doi: 10.1007/978-981-96-0814-0_19
Talwar, B., Arora, A., and Bharany, S. (2021). “An energy efficient agent aware proactive fault tolerance for preventing deterioration of virtual machines within cloud environment,” in 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (Noida: IEEE), 1–7. doi: 10.1109/ICRITO51393.2021.9596453
Tang, Z., Wang, Y., and Chang, T.-H. (2024). z-signfedavg: a unified stochastic sign-based compression for federated learning. Proc. AAAI Conf. Artif. Intell. 38, 15301–15309. doi: 10.1609/aaai.v38i14.29454
Tran-Dinh, Q., Pham, N. H., Phan, D. T., and Nguyen, L. M. (2021). Feddr-randomized Douglas-Rachford splitting algorithms for nonconvex federated composite optimization. Adv. Neural Inf. Process. Syst. 34, 30326–30338. doi: 10.48550/arXiv.2103.0345
Valdeira, P., Xavier, J., Soares, C., and Chi, Y. (2025). Communication-efficient vertical federated learning via compressed error feedback. IEEE Trans. Signal Process. 73, 1065–1080. doi: 10.1109/TSP.2025.3540655
Wang, H., Marella, S., and Anderson, J. (2022). “FEDADMM: a federated primal-dual algorithm allowing partial participation,” in 2022 IEEE 61st Conference on Decision and Control (CDC) (Cancún: IEEE), 287–294. doi: 10.1109/CDC51059.2022.9992745
Keywords: communication efficiency, composite optimization, data heterogeneity, error feedback, federated learning, operator splitting
Citation: Xue J and Wang C (2026) EF-Feddr: communication-efficient federated learning with Douglas–Rachford splitting and error feedback. Front. Artif. Intell. 9:1699896. doi: 10.3389/frai.2026.1699896
Received: 05 September 2025; Accepted: 05 January 2026;
Published: 28 January 2026.
Edited by:
Haifeng Chen, NEC Laboratories America Inc, United States
Copyright © 2026 Xue and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Chundong Wang, michael3769@163.com
Chundong Wang1,2*