ORIGINAL RESEARCH article

Front. Artif. Intell., 28 January 2026

Sec. Machine Learning and Artificial Intelligence

Volume 9 - 2026 | https://doi.org/10.3389/frai.2026.1699896

This article is part of the Research Topic "Ethical Artificial Intelligence: Methods and Applications".

EF-Feddr: communication-efficient federated learning with Douglas–Rachford splitting and error feedback


Jiao Xue1 and Chundong Wang1,2*
  • 1School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
  • 2Tianjin Police Institute, Tianjin, China

Introduction: Federated learning (FL) is a distributed machine learning paradigm that preserves data privacy and mitigates data silos. Nevertheless, frequent communication between clients and the server often becomes a major bottleneck, restricting training efficiency and scalability.

Methods: To address this challenge, we propose a novel communication-efficient algorithm, EF-Feddr, for federated composite optimization, where the objective function includes a potentially non-smooth regularization term and local datasets are non-IID. Our method is built upon the relaxed Douglas–Rachford splitting method and incorporates error feedback (EF)—a widely adopted compression framework—to ensure convergence when biased compression (e.g., top-k sparsification) is applied.

Results: Under the partial client participation setting, our theoretical analysis demonstrates that EF-Feddr achieves a fast convergence rate of O(1/K) and a communication complexity of O(1/ε²). Comprehensive experiments conducted on the FEMNIST and Shakespeare benchmarks, as well as controlled synthetic data, consistently validate the efficacy of EF-Feddr across diverse scenarios.

Discussion: The results confirm that the integration of error feedback with the relaxed Douglas–Rachford splitting method in EF-Feddr effectively overcomes the convergence degradation typically caused by biased compression, thereby offering a practical and efficient solution for communication-constrained federated learning.

1 Introduction

Federated learning (FL) (Konecný et al., 2016; McMahan et al., 2017) is a distributed framework designed to address large-scale learning problems across networks of edge clients. In this paradigm, clients update models locally on their private data, while the server aggregates these updates to refine a shared global model. This collaborative process enables the development of global or personalized models without compromising user privacy (Ezequiel et al., 2022; Saifullah et al., 2024). Despite these advantages, communication between clients and the server remains a critical bottleneck, particularly when the number of participating clients is large, bandwidth is constrained, and the models involve high-dimensional parameters (Bhardwaj et al., 2023; Talwar et al., 2021). Recent efforts to improve the communication efficiency of FL have primarily focused on two directions: (i) reducing the number of communication rounds through partial client participation or increased local computation, and (ii) lowering the number of transmitted bits per round via techniques such as quantization and residual gradient compression. While these strategies effectively cut communication costs, they also introduce additional variance, which may widen the neighborhood around the optimal solution and, in some cases, prevent convergence under biased compression. To mitigate these issues, variance-reduction techniques such as error feedback (EF) are commonly employed. In contrast to traditional distributed training, it is unrealistic to assume that data on each local device are always independent and identically distributed (IID). Prior studies have consistently shown that FL accuracy degrades significantly when faced with non-IID or heterogeneous data (Islam et al., 2024). In this study, we focus on the following federated composite optimization (FCO) problem:

$$\min_{x\in\mathbb{R}^d} F(x) = f(x) + g(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x) + g(x), \qquad (1)$$

where n denotes the number of clients, fi is the local loss function for the i-th client, which is L-smooth and non-convex, and g represents the regularization term, which is proper, closed, convex (possibly non-smooth). As a practical example, consider a collaborative environmental monitoring project in which multiple research institutions aim to analyze sensor data from diverse geographical locations to detect climate change patterns. Due to privacy concerns and proprietary restrictions, however, raw data cannot be shared directly. In this case, enforcing sparse regularization becomes particularly important: although the dataset may contain relatively few observations (e.g., readings from a sparse sensor network Bhardwaj et al., 2022), each observation typically involves a high-dimensional set of features such as temperature, humidity, wind speed, and pollution levels, a combination of factors that further justifies the use of sparse regularization to identify salient features and prevent overfitting.
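To make the setting concrete, the following minimal sketch builds a toy instance of Equation 1 with least-squares client losses and an ℓ1 regularizer (both illustrative assumptions, not the datasets used later) and runs a centralized proximal-gradient reference; the federated algorithms discussed below replace the centralized gradient averaging with client-side computation and compressed communication.

```python
import numpy as np

def local_grad(w, X, y):
    """Gradient of an illustrative least-squares local loss f_i(w) = ||Xw - y||^2 / (2m)."""
    return X.T @ (X @ w - y) / len(y)

def prox_l1(v, t):
    """Proximal operator of g(w) = t * ||w||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Toy instance of Equation 1: F(w) = (1/n) sum_i f_i(w) + mu * ||w||_1
rng = np.random.default_rng(0)
n, m, d, mu = 4, 50, 20, 0.05             # clients, samples per client, features, l1 weight
clients = [(rng.normal(size=(m, d)), rng.normal(size=m)) for _ in range(n)]

w, step = np.zeros(d), 0.1
for _ in range(200):                      # centralized proximal-gradient reference, not yet federated
    grad = np.mean([local_grad(w, X, y) for X, y in clients], axis=0)
    w = prox_l1(w - step * grad, step * mu)
print("selected features:", int(np.count_nonzero(w)), "of", d)
```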

Operator splitting constitutes a broad class of methods for solving optimization problems of the form (Equation 1). These methods decompose numerically intractable components into simpler subproblems, thereby reducing computational complexity, enhancing efficiency, and enabling modular algorithms that are naturally suited for parallelization. Operator splitting has been successfully applied to a wide range of challenging optimization problems. Among these, the Douglas–Rachford splitting method is particularly well-established due to its enhanced iterative stability and accelerated convergence rate. Furthermore, its update rule decomposes the global composite objective into local proximal steps that can be executed in a fully parallel manner. This structure inherently aligns with the distributed nature of federated learning, facilitating efficient client-side computation while also underpinning the method's enhanced iterative stability. From this perspective, many state-of-the-art FL algorithms can be interpreted within the operator splitting framework (Malekmohammadi et al., 2021). Examples include FedAvg (McMahan et al., 2017), FedProx (Li et al., 2020), FedSplit (Pathak and Wainwright, 2020), and FedDR (Tran-Dinh et al., 2021).

However, for the FCO problem (Equation 1), existing FL methods such as FedAvg and its communication-efficient variants are primarily designed for the smooth, unconstrained setting min_{x∈ℝ^d} F(x) = (1/n)Σ_{i=1}^n f_i(x). In non-smooth FL settings, subgradient methods are widely used but suffer from slow convergence (Jhunjhunwala et al., 2022). Although proximal operators offer a more effective alternative with superior convergence properties (Liu et al., 2024), their seamless integration into communication-efficient FL frameworks remains limited. Moreover, while compression techniques effectively reduce communication overhead, they introduce additional variance that can enlarge the solution neighborhood and hinder convergence. Critically, existing communication-efficient methods have predominantly been designed for smooth FL problems, leaving a pronounced gap in addressing non-smooth federated composite optimization under compression-induced variance and communication constraints simultaneously.

To bridge this multifaceted gap, this study presents EF-Feddr, a communication-efficient FL algorithm that employs the Top-k sparsification technique to compress transmitted parameters and reduce communication bits, incorporates an error feedback (Li and Li, 2023) mechanism to mitigate the variance introduced by compression, and further integrates the relaxed Douglas–Rachford splitting method (He et al., 2021) along with a proximal operator to accelerate the iterative process while effectively handling the non-smoothness of the global regularization term. This integrated design enables EF-Feddr to be applicable to a wider range of scenarios and constrained settings. Leveraging the Douglas–Rachford envelope, we establish convergence guarantees for EF-Feddr in non-convex FL problems under mild assumptions.

Our contributions are summarized as follows:

• We propose EF-Feddr, an algorithm that combines the relaxed Douglas–Rachford splitting method with error feedback to reduce communication costs between clients and the server without sacrificing accuracy in non-IID settings. In addition, the error feedback mechanism enhances the stability of communication-compressed training in FL.

• We establish theoretical convergence guarantees for EF-Feddr based on the Douglas–Rachford envelope. Specifically, our method achieves a convergence rate of O(1/K) and a communication complexity of O(1/ε²) for non-convex loss functions under partial client participation.

• Through experiments on synthetic datasets, the FEMNIST dataset, and the Shakespeare dataset, we show that EF-Feddr improves accuracy by 3.29%–12.97% over state-of-the-art FL variants, while significantly reducing communication costs compared to uncompressed FedDR.

2 Related work

2.1 Operator splitting methods

Classical operator splitting methods such as Douglas–Rachford (DR), Forward-Backward (FB), and the Alternating Direction Method of Multipliers (ADMM) have recently been adopted in FL (Godavarthi et al., 2025; Goel et al., 2025). FedAvg (McMahan et al., 2017) can be viewed as an instance of k-step FB splitting, while FedProx (Li et al., 2020) extends the backward-backward splitting method. It is another FB variant tailored for regularized FL problems. FedSplit (Pathak and Wainwright, 2020), based on Peaceman-Rachford splitting, aims to identify the correct fixed point for strictly convex FL problems. Its communication-efficient variant, Eco-FedSplit (Khirirat et al., 2022), incorporates error-compensated compression. For the FCO problem, FedDR (Tran-Dinh et al., 2021) integrates a randomized block-coordinate strategy with DR splitting to solve non-convex formulations. FedADMM (Wang et al., 2022) leverages ADMM by applying FedDR to the dual form of the FCO problem, while FedTOP-ADMM (Kant et al., 2022) generalizes FedADMM as the first three-operator method used in FL.

2.2 Communication-efficient FL

To address the communication bottleneck in FL (Sun et al., 2024), two categories of compression methods have been widely explored: unbiased compressors (e.g., stochastic quantization; Alistarh et al., 2017) and biased compressors (e.g., top-k sparsification; Khirirat et al., 2018). FedPAQ (Reisizadeh et al., 2020) reduces communication costs through periodic averaging, partial client participation, and quantization. However, this reduction comes at the expense of convergence accuracy, which requires additional training iterations; the authors also analyzed the trade-off between communication overhead and convergence in their experiments. The z-SignFedAvg algorithm (Tang et al., 2024), a variant of FedAvg, employs stochastic sign-based compression and achieves accuracy comparable to uncompressed FedAvg while greatly reducing communication overhead. Building on the lazily aggregated gradient rule and error feedback, Zhou et al. (2023) proposed two communication-efficient algorithms for non-convex FL, EF-LAG and BiEF-LAG, which adapt both uplink and downlink communications. Similarly, FedSQ (Long et al., 2024) introduces a hybrid approach combining sparsification and quantization to reduce communication costs while enhancing convergence.

2.3 Error feedback

In the realm of distributed optimization, it has been noted that employing biased compressors for direct updates may decelerate convergence, deteriorate generalization performance, or even induce divergence (Li and Li, 2023). To counteract these issues, error feedback techniques have been introduced, which reduce the compression error compared to direct compression. Seide et al. (2014) first proposed this method as a heuristic inspired by the idea of Sigma-Delta modulation. EF21 (Richtárik et al., 2021) removes strict assumptions such as bounded gradients and bounded dissimilarity and can handle arbitrary data heterogeneity among clients, but leads to worse computational complexity. EFSkip (Bao et al., 2025) also allows arbitrary data heterogeneity and enjoys a linear speedup, significantly improving upon previous results.

3 Compressed non-convex FL with error feedback

In this section, we present EF-Feddr, an algorithm that integrates error feedback into the relaxed Douglas–Rachford splitting framework to address the non-convex FCO problem. We begin with a brief introduction to the Douglas–Rachford splitting method, followed by an explanation of how error feedback is incorporated to improve communication efficiency. We then provide the detailed formulation of EF-Feddr and analyze its convergence properties. Main notations are listed in Table 1.

Table 1. Summary of main notations.

3.1 Problem formulation

The FCO Equation 1 is mathematically equivalent to the consensus optimization problem

$$\min_{x_1,\dots,x_n\in\mathbb{R}^d} F(x) = f(x) + g(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x_i) + g(x) \quad \text{subject to} \quad x_1 = x_2 = \dots = x_n, \qquad (2)$$

where the consensus constraint set is E = {x = (x_1, …, x_n) | x_1 = x_2 = ⋯ = x_n}. Let l_E be the indicator function of E. With the indicator function, one can treat the constrained problem as unconstrained by moving the constraints into the objective function. Then Equation 1 is obviously equivalent to

$$\min_{x} \; \frac{1}{n}\sum_{i=1}^{n} f_i(x_i) + g(x) + l_E(x). \qquad (3)$$

The first-order optimality condition is given by 0 ∈ ∇f(x) + ∂g(x) + ∂l_E(x), where ∇f(x) = [∇f_1(x_1), ..., ∇f_n(x_n)]. A point x* is a stationary point of Equation 1 if 0 ∈ ∇f(x*) + ∂g(x*) + ∂l_E(x*). Additionally, operator splitting encompasses a broad range of techniques to effectively address Equation 3. A key advantage of operator splitting methods is their efficient per-iteration operations, which makes them particularly suitable for large-scale applications due to their lower computational costs (He et al., 2021); among them, the DR splitting method is particularly well-known. The iteration equations for the DR splitting method are given by

$$\begin{cases} y^{k+1} = y^{k} + x^{k} - z^{k}\\ z^{k+1} = \mathrm{prox}_{\gamma f}(y^{k+1})\\ x^{k+1} = \mathrm{prox}_{\gamma(g+l_E)}(2z^{k+1} - y^{k+1}). \end{cases} \qquad (4)$$

Given that the DR splitting method often demonstrates favorable and stable convergence behavior in practice, we base our approach on its relaxed variant to solve Equation 1. The detailed application is presented in Section 3.3.
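As a point of reference, the sketch below runs the unrelaxed Douglas–Rachford iteration of Equation 4 on a single-machine toy composite problem, assuming a quadratic smooth part f (so that prox_{γf} reduces to a linear solve) and an ℓ1 term for g; the consensus indicator l_E is omitted here since there is only one block. All problem data are illustrative.

```python
import numpy as np

def prox_quadratic(v, A, b, gamma):
    """prox_{gamma f}(v) for f(w) = 0.5 w'Aw - b'w: solve (I + gamma A) z = v + gamma b."""
    return np.linalg.solve(np.eye(len(v)) + gamma * A, v + gamma * b)

def prox_l1(v, t):
    """prox_{t g}(v) for g = ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
d, gamma, mu = 20, 0.5, 0.05
M = rng.normal(size=(60, d)); A = M.T @ M / 60; b = rng.normal(size=d)

# Douglas-Rachford iteration of Equation 4, with the reflection 2z - y in the last step.
y, z, x = np.zeros(d), np.zeros(d), np.zeros(d)
for _ in range(300):
    y = y + x - z                        # y^{k+1} = y^k + x^k - z^k
    z = prox_quadratic(y, A, b, gamma)   # z^{k+1} = prox_{gamma f}(y^{k+1})
    x = prox_l1(2 * z - y, gamma * mu)   # x^{k+1} = prox_{gamma g}(2 z^{k+1} - y^{k+1})
print("||x - z|| at convergence:", np.linalg.norm(x - z))
```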

For convenience, we introduce the definitions of the key concepts that will be utilized. For a function f, the proximal operator at point x with a step size γ>0 is

$$\mathrm{prox}_{\gamma f}(x) = \arg\min_{y}\Big\{ f(y) + \frac{1}{2\gamma}\|y - x\|^2 \Big\},$$

the Moreau envelope of f with a step size γ>0 is

$$M_{\gamma f}(x) = \min_{y}\Big\{ f(y) + \frac{1}{2\gamma}\|y - x\|^2 \Big\},$$

the gradient mapping of f at point x with a step size γ>0 is

$$G_{\gamma f}(x) = \frac{1}{\gamma}\big(x - \mathrm{prox}_{\gamma f}(x)\big).$$

We observe that ∇Mγf(x) = Gγf(x) (Liu et al., 2019). Moreover, the proximal operator update zk=proxγf(yk) can be written as

$$z^{k} = y^{k} - \gamma\, G_{\gamma f}(y^{k}).$$

This representation reveals that the proximal operator update is analogous to taking a gradient step applied to the gradient mapping Gγf(yk) of f. For the composite function F(x) = f(x)+g(x), the corresponding gradient mapping is given by

$$G_{\gamma}(x) = \frac{1}{\gamma}\big(x - \mathrm{prox}_{\gamma g}(x - \gamma\nabla f(x))\big). \qquad (5)$$

In the context of general non-convex non-smooth problems, the gradient mapping G_γ(x) is commonly used to assess convergence (Liu et al., 2024). Specifically, the stationarity condition 0 ∈ ∇f(x*) + ∂g(x*) + ∂l_E(x*) for Equation 1 is equivalent to G_γ(x*) = 0.
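The next sketch illustrates the gradient mapping of Equation 5 on the same kind of toy composite objective (a quadratic f plus an ℓ1 term for g, both assumptions): driving ‖G_γ(x)‖ toward zero with proximal-gradient steps yields an approximate stationary point, which is exactly the quantity our analysis tracks.

```python
import numpy as np

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def grad_f(x, A, b):
    """Gradient of the illustrative smooth part f(x) = 0.5 x'Ax - b'x."""
    return A @ x - b

def gradient_mapping(x, A, b, gamma, mu):
    """G_gamma(x) = (x - prox_{gamma g}(x - gamma grad f(x))) / gamma, as in Equation 5."""
    return (x - prox_l1(x - gamma * grad_f(x, A, b), gamma * mu)) / gamma

rng = np.random.default_rng(2)
d, gamma, mu = 10, 0.2, 0.1
M = rng.normal(size=(30, d)); A = M.T @ M / 30; b = rng.normal(size=d)

# Proximal-gradient iterations: x <- x - gamma * G_gamma(x) = prox_{gamma g}(x - gamma grad f(x)).
x = np.zeros(d)
for _ in range(500):
    x = x - gamma * gradient_mapping(x, A, b, gamma, mu)
print("||G_gamma(x)|| at the final iterate:", np.linalg.norm(gradient_mapping(x, A, b, gamma, mu)))
```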

3.2 Error feedback

We now define a general class of compressors that will be used throughout this study.

Definition 1. (Absolute compressor). A map C: ℝ^d → ℝ^d is an absolute compressor if there exists ν > 0 such that, for all x ∈ ℝ^d, 𝔼‖x − C(x)‖² ≤ ν².

Most popular compressors such as the sign compression (Bernstein et al., 2018), the Top-k sparsification (Khirirat et al., 2018) and the sparsification together with quantization (Alistarh et al., 2017) are in fact absolute compressors if the full-precision vector has a bounded norm (Khirirat et al., 2022; Sahu et al., 2021).
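For instance, Top-k sparsification satisfies Definition 1 whenever the input norm is bounded by some R, with ν² = (1 − k/d)R². The short check below verifies this bound empirically; the dimension, k, and the norm bound R are arbitrary illustrative choices.

```python
import numpy as np

def top_k(x, k):
    """Top-k sparsification: keep the k largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# If ||x|| <= R, then ||x - top_k(x)||^2 <= (1 - k/d) R^2, so top-k is an absolute
# compressor with nu^2 = (1 - k/d) R^2 on that bounded set (illustrative check).
rng = np.random.default_rng(3)
d, k, R = 100, 10, 5.0
worst = 0.0
for _ in range(1000):
    x = rng.normal(size=d)
    x = R * x / np.linalg.norm(x)          # enforce ||x|| = R
    worst = max(worst, np.linalg.norm(x - top_k(x)) ** 2)
print(f"max ||x - C(x)||^2 = {worst:.3f}  <=  (1 - k/d) R^2 = {(1 - k / d) * R ** 2:.3f}")
```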

Error feedback (also known as error compensation) is a popular tool in FL to reduce compression error and improve convergence speed compared to direct compression (Valdeira et al., 2025). Its mechanism shares a fundamental principle with Sigma-Delta modulation in signal processing (Seide et al., 2014). Technically, when transmitting a sequence of vectors, the method incorporates an auxiliary vector that accumulates the compression error at each step. This accumulated error is then added to the current vector before it undergoes compression and transmission (Karimireddy et al., 2019). More specifically, based on the DR splitting method (Equation 4), the update steps of the direct compression scheme are as follows:

$$\begin{aligned} c^{k+1} &= C(2z^{k+1} - y^{k+1}), && \text{(direct compression)}\\ x^{k+1} &= \mathrm{prox}_{\gamma(g+l_E)}(c^{k+1}), && \text{(model update)} \end{aligned} \qquad (6)$$

the update steps with error feedback compression are as follows:

$$\begin{aligned} c^{k+1} &= C(2z^{k+1} - y^{k+1} + e^{k}), && \text{(error compensation)}\\ e^{k+1} &= 2z^{k+1} - y^{k+1} + e^{k} - c^{k+1}, && \text{(compute the error)}\\ x^{k+1} &= \mathrm{prox}_{\gamma(g+l_E)}(c^{k+1}). && \text{(model update)} \end{aligned} \qquad (7)$$

In direct compression, each vector 2z^{k+1} − y^{k+1} is individually compressed, and the receiver directly uses its compressed version C(2z^{k+1} − y^{k+1}) in place of the original. Conversely, error feedback compression employs a proxy vector c^{k+1} for 2z^{k+1} − y^{k+1} that integrates information from the prior steps 0, 1, …, k. This proxy is refined via an auxiliary vector e^{k+1}, which is iteratively updated and stored to accumulate the compression error at each step.
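The following sketch contrasts the two schemes of Equations 6, 7 on a synthetic stream of vectors standing in for 2z^{k+1} − y^{k+1} (an illustrative assumption): with error feedback, the information lost to compression stays bounded by the residual e^k, whereas under direct compression it accumulates.

```python
import numpy as np

def top_k(x, k):
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(4)
d, k, T = 200, 10, 100
stream = [rng.normal(size=d) for _ in range(T)]   # stands in for the vectors 2z^{k+1} - y^{k+1}

e = np.zeros(d)                                   # accumulated compression error (error feedback)
true_sum, direct_sum, ef_sum = np.zeros(d), np.zeros(d), np.zeros(d)
for v in stream:
    true_sum += v
    direct_sum += top_k(v, k)                     # Equation 6: compress each vector on its own
    c = top_k(v + e, k)                           # Equation 7: compress the error-compensated vector
    e = v + e - c                                 #             and accumulate what was dropped
    ef_sum += c

# The cumulative deviation of the error-feedback scheme equals ||e|| and stays bounded,
# while the deviation of direct compression keeps growing with the stream length.
print("direct compression:", np.linalg.norm(true_sum - direct_sum))
print("error feedback:    ", np.linalg.norm(true_sum - ef_sum))
```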

3.3 EF-Feddr algorithm

In this section, we present the EF-Feddr algorithm; its details are given in Algorithm 1. Specifically, applying the relaxed DR splitting method (He et al., 2021) to the consensus reformulation (Equation 3) of Equation 1 in a distributed setting yields the following iterative steps:

$$\begin{cases} y_i^{k+1} = y_i^{k} + \lambda\,(x^{k} - z_i^{k})\\ z_i^{k+1} = \mathrm{prox}_{\gamma f_i}(y_i^{k+1})\\ x_i^{k+1} = 2z_i^{k+1} - y_i^{k+1}\\ x^{k+1} = \mathrm{prox}_{n\gamma(g+l_E)}(x_i^{k+1}). \end{cases}$$

Algorithm 1. EF-Feddr.

By integrating the error feedback mechanism detailed in Section 3.2, we obtain the EF-Feddr iterative scheme:

$$\begin{cases} y_i^{k+1} = y_i^{k} + \lambda\,(x^{k} - z_i^{k})\\ z_i^{k+1} \approx \mathrm{prox}_{\gamma f_i}(y_i^{k+1})\\ x_i^{k+1} = C\big(2z_i^{k+1} - y_i^{k+1} + e_i^{k}\big)\\ e_i^{k+1} = 2z_i^{k+1} - y_i^{k+1} + e_i^{k} - x_i^{k+1}\\ x^{k+1} = \mathrm{prox}_{n\gamma(g+l_E)}(x_i^{k+1}), \end{cases} \qquad (8)$$

where λ ∈ (0, 2) (He et al., 2021) is the relaxation parameter. The variables y_i^{k+1}, z_i^{k+1}, x_i^{k+1} and e_i^{k+1} are updated locally on each client i. The key step involves compression and communication: instead of compressing 2z_i^{k+1} − y_i^{k+1} directly, each client compresses the error-compensated vector 2z_i^{k+1} − y_i^{k+1} + e_i^k. The resulting value x_i^{k+1} is then sent to the server. Furthermore, to compute the server aggregation x^{k+1}, we have the following conclusion.

Proposition 1. For every k ≥ 0, x^{k+1} = prox_{nγ(g+l_E)}(x_i^{k+1}) in Equation 8 is equal to prox_{γg}((1/n)Σ_{i∈S_k} x_i^{k+1}).

Proof. Let x̄ = (1/n)Σ_{i∈S_k} x_i^{k+1}. The result of prox_{nγ(g+l_E)}(x_i^{k+1}) must have all blocks equal to some vector z (Mishchenko et al., 2022), namely

$$\begin{aligned}
z &= \arg\min_{y}\Big\{ g(y) + \frac{1}{2n\gamma}\sum_{i=1}^{n}\|y - x_i^{k+1}\|^2 \Big\}\\
&= \arg\min_{y}\Big\{ g(y) + \frac{1}{2n\gamma}\sum_{i=1}^{n}\big(\|y-\bar{x}\|^2 + 2\langle y-\bar{x},\, \bar{x}-x_i^{k+1}\rangle + \|\bar{x}-x_i^{k+1}\|^2\big) \Big\}\\
&= \arg\min_{y}\Big\{ g(y) + \frac{1}{2n\gamma}\Big[\sum_{i=1}^{n}\|y-\bar{x}\|^2 + 2\langle y-\bar{x},\, n\bar{x}\rangle - 2\langle y-\bar{x},\, n\bar{x}\rangle\Big] \Big\}\\
&= \arg\min_{y}\Big\{ g(y) + \frac{1}{2\gamma}\|y-\bar{x}\|^2 \Big\} = \mathrm{prox}_{\gamma g}(\bar{x}) = \mathrm{prox}_{\gamma g}\Big(\frac{1}{n}\sum_{i\in S_k} x_i^{k+1}\Big).
\end{aligned}$$

Thus, we have the server aggregation

$$x^{k+1} = \mathrm{prox}_{n\gamma(g+l_E)}(x_i^{k+1}) = \mathrm{prox}_{\gamma g}\Big(\frac{1}{n}\sum_{i\in S_k} x_i^{k+1}\Big).$$

In Algorithm 1, during round k: (1) the clients receive the global model x^k from the server (line 5); (2) a subset of clients S_k is sampled following the sampling scheme described in Section 4; the i-th client performs a relaxation step, where λ is the relaxation parameter, computes the proximal local update to obtain the local model z_i^{k+1}, calculates the compressed local model update x_i^{k+1}, updates the local compression-error accumulator e_i^{k+1}, and sends the compressed x_i^{k+1} back to the server (lines 6–10); (3) the server receives the compressed x_i^{k+1} from the clients i ∈ S_k and performs a global model update using the averaged compressed local model updates (line 13). Notably, the relaxation strategy, akin to inertial extrapolation techniques (e.g., the heavy-ball method), has broadly accelerated iterative algorithms in convex and non-convex optimization, while the cost per iteration stays essentially unchanged (He et al., 2021). For any γ > 0, z_i^{k+1} serves as an approximation of prox_{γf_i}(y_i^{k+1}); the evaluation of prox_{γf_i} can be carried out using several established techniques, such as accelerated GD-type algorithms and local SGD (Parikh et al., 2014; Tran-Dinh et al., 2021). It is worth noting that this algorithm requires O(d) memory and incurs O(d) computational overhead per client per round.
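A minimal single-process simulation of the EF-Feddr round structure of Equation 8 and Proposition 1 is sketched below, assuming least-squares local losses, an ℓ1 regularizer, and an inexact local prox solved by a few gradient steps (all illustrative choices; the experiments in Section 5 use the models described there instead). Clients not sampled simply keep their last transmitted value.

```python
import numpy as np

def top_k(x, k):
    """Top-k sparsification compressor."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def prox_l1(v, t):
    """prox of t*||.||_1 (soft-thresholding), playing the role of prox_{gamma g}."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def approx_prox_f(y, X, t, gamma, steps=20, lr=0.05):
    """Inexact prox_{gamma f_i}(y): a few gradient steps on f_i(z) + ||z - y||^2 / (2*gamma)."""
    z = y.copy()
    for _ in range(steps):
        grad = X.T @ (X @ z - t) / len(t) + (z - y) / gamma
        z -= lr * grad
    return z

rng = np.random.default_rng(5)
n, m, d = 10, 40, 30                     # clients, samples per client, dimension
gamma, lam, mu, kk = 0.5, 0.9, 0.05, 5   # step size, relaxation, l1 weight, top-k budget
data = [(rng.normal(size=(m, d)), rng.normal(size=m)) for _ in range(n)]

x = np.zeros(d)
y = [np.zeros(d) for _ in range(n)]      # local y_i
z = [np.zeros(d) for _ in range(n)]      # local z_i
e = [np.zeros(d) for _ in range(n)]      # local error accumulators e_i
xi = [np.zeros(d) for _ in range(n)]     # last transmitted x_i

for _ in range(50):
    S = rng.choice(n, size=n // 3, replace=False)        # partial participation
    for i in S:
        y[i] = y[i] + lam * (x - z[i])                   # relaxation step
        z[i] = approx_prox_f(y[i], *data[i], gamma)      # inexact local proximal step
        v = 2 * z[i] - y[i] + e[i]                       # error-compensated reflection
        xi[i] = top_k(v, kk)                             # compress and transmit
        e[i] = v - xi[i]                                 # keep what was not transmitted
    # Server: prox_{gamma g} of the averaged compressed updates (cf. Proposition 1);
    # clients not sampled contribute their last transmitted value.
    x = prox_l1(np.mean(xi, axis=0), gamma * mu)

print("nonzero coordinates in the global model:", int(np.count_nonzero(x)))
```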

4 Theoretical results

For analyzing the convergence of Algorithm 1, we consider several basic assumptions and auxiliary results. Our analysis builds on the analytical framework of Tran-Dinh et al. (2021). First, we introduce a proper sampling scheme following Tran-Dinh et al. (2021). Let p_1, …, p_n > 0 be such that, for all i ∈ [n], ℙ(i ∈ S̄) = p_i. Here, S̄ is a proper sampling scheme on [n], and each S_k is an i.i.d. realization of S̄. Note that p_i = Σ_{S ⊆ [n], i ∈ S} ℙ(S̄ = S). Define A_k = σ(S_0, …, S_k) as the σ-algebra generated by the sequence S_0, …, S_k. This sampling scheme ensures that each client has a significant probability of being updated.
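As a concrete example of a proper sampling scheme, including each client independently with probability p_i gives ℙ(i ∈ S̄) = p_i exactly; the probabilities used below are arbitrary illustrative values.

```python
import numpy as np

# One simple proper sampling scheme: include client i independently with probability p_i,
# so P(i in S) = p_i exactly (uniform m-out-of-n sampling, with p_i = m/n, is another example).
rng = np.random.default_rng(6)
n = 8
p = np.linspace(0.2, 0.9, n)              # per-client participation probabilities (illustrative)

counts = np.zeros(n)
rounds = 20000
for _ in range(rounds):
    S = np.flatnonzero(rng.random(n) < p)  # one realization S_k of the sampling scheme
    counts[S] += 1
print(np.round(counts / rounds, 3))        # empirical frequencies approach p
```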

Assumption 1. (L-smoothness). All local functions f_i(·) are L-smooth, that is,

$$\|\nabla f_i(x) - \nabla f_i(y)\| \le L\|x - y\|, \quad \forall x, y \in \mathbb{R}^d.$$

Assumption 2. (Boundedness from below). F(·) given in (1) is bounded from below, that is, F* = inf_{x∈ℝ^d} F(x) > −∞.

In non-convex FL optimization, Assumptions 1 and 2 are standard. Assumption 2 guarantees that Equation 1 is well-defined and is independent of the choice of algorithms. We first present three useful lemmas that will be instrumental in proving our main theorem.

Lemma 1. Let {(y_i^k, z_i^k, x_i^k, e_i^k, x^k)} be generated by Algorithm 1. Then, for all i ∈ S_k, λ > 0, β_1 > 0 and γ > 0, we have

$$\|x^{k} - z_i^{k}\|^2 \le \frac{2(\gamma^2L^2+1)}{\lambda^2}\Big[(1+\beta_1)\|z_i^{k+1}-z_i^{k}\|^2 + 2\Big(1+\frac{1}{\beta_1}\Big)\big(\|m_i^{k+1}\|^2 + \|m_i^{k}\|^2\big)\Big]. \qquad (9)$$

Proof.

For the relation z_i^{k+1} ≈ prox_{γf_i}(y_i^{k+1}), where the approximation error satisfies ‖z_i^{k+1} − prox_{γf_i}(y_i^{k+1})‖ ≤ ε_i^k for a given accuracy ε_i^k ≥ 0, we introduce auxiliary variables w_i^0 and w_i^{k+1} for i ∈ [n] to analyze the convergence of Algorithm 1,

$$w_i^{0} = \mathrm{prox}_{\gamma f_i}(y_i^{0}), \qquad w_i^{k+1} = \begin{cases}\mathrm{prox}_{\gamma f_i}(y_i^{k+1}) & \text{if } i \in S_k\\ w_i^{k} & \text{if } i \notin S_k,\end{cases} \qquad z_i^{k} = w_i^{k} + m_i^{k}, \ \ \text{where } \|m_i^{k}\| \le \varepsilon_i^{k}. \qquad (10)$$

Here, m_i^k denotes the vector of errors associated with the approximations of the proximal operator, and w_i^{k+1} is the exact evaluation of prox_{γf_i}(y_i^{k+1}). Note that when i ∉ S_k, we have z_i^{k+1} = z_i^k and w_i^{k+1} = w_i^k, which implies ‖m_i^{k+1}‖ = ‖z_i^{k+1} − w_i^{k+1}‖ = ‖m_i^k‖ = ‖z_i^k − w_i^k‖. From Equation 10 (Atenas, 2025), we have

$$y_i^{k} = w_i^{k} + \gamma\nabla f_i(w_i^{k}). \qquad (11)$$

Then, using the update rule for y_i^{k+1} in Algorithm 1, we get x^k − z_i^k = (1/λ)(y_i^{k+1} − y_i^k) = (1/λ)(w_i^{k+1} − w_i^k) + (γ/λ)(∇f_i(w_i^{k+1}) − ∇f_i(w_i^k)). Using Young's inequality ‖a_1 + a_2‖² ≤ (1 + β)‖a_1‖² + (1 + 1/β)‖a_2‖² and the L-smoothness of f_i, we bound ‖x^k − z_i^k‖² for any β_1 > 0 and i ∈ S_k as follows

$$\begin{aligned}
\|x^{k} - z_i^{k}\|^2 &= \Big\|\frac{1}{\lambda}(w_i^{k+1}-w_i^{k}) + \frac{\gamma}{\lambda}\big(\nabla f_i(w_i^{k+1}) - \nabla f_i(w_i^{k})\big)\Big\|^2\\
&\le \frac{2}{\lambda^2}\|w_i^{k+1}-w_i^{k}\|^2 + \frac{2\gamma^2}{\lambda^2}\|\nabla f_i(w_i^{k+1}) - \nabla f_i(w_i^{k})\|^2\\
&\le \frac{2}{\lambda^2}\|w_i^{k+1}-w_i^{k}\|^2 + \frac{2\gamma^2L^2}{\lambda^2}\|w_i^{k+1}-w_i^{k}\|^2\\
&= \frac{2(\gamma^2L^2+1)}{\lambda^2}\|z_i^{k+1}-m_i^{k+1}-z_i^{k}+m_i^{k}\|^2\\
&\le \frac{2(\gamma^2L^2+1)}{\lambda^2}\Big[(1+\beta_1)\|z_i^{k+1}-z_i^{k}\|^2 + 2\Big(1+\frac{1}{\beta_1}\Big)\big(\|m_i^{k+1}\|^2+\|m_i^{k}\|^2\big)\Big],
\end{aligned}$$

which proves (9).

We then establish the relationship between Σ_{i=1}^n ‖x^k − z_i^k‖² and the squared norm of the gradient mapping ‖G_γ(x^k)‖².

Lemma 2. Let {(y_i^k, z_i^k, x_i^k, e_i^k, x^k, w_i^k)} be generated by Algorithm 1 and Equation 10, and let the gradient mapping G_γ be defined by (5). Then, for any λ > 0, β_2 > 0, and γ > 0, we have

$$\|G_\gamma(x^{k})\|^2 \le \frac{2(1+\gamma L)^2}{n\gamma^2}\sum_{i=1}^{n}\Big[(1+\beta_2)\|z_i^{k} - x^{k}\|^2 + \Big(1+\frac{1}{\beta_2}\Big)\|m_i^{k}\|^2\Big] + \frac{2}{n\gamma^2}\sum_{i=1}^{n}\|e_i^{k-1}-e_i^{k}\|^2. \qquad (12)$$

Proof. From the update of xik+1, eik+1 in Algorithm 1 and (11), we have

$$\frac{1}{n}\sum_{i=1}^{n} x_i^{k} = \frac{1}{n}\sum_{i=1}^{n}\big(2z_i^{k} - y_i^{k} + e_i^{k-1} - e_i^{k}\big) = \frac{1}{n}\sum_{i=1}^{n}\big(2z_i^{k} - w_i^{k} - \gamma\nabla f_i(w_i^{k}) + e_i^{k-1} - e_i^{k}\big). \qquad (13)$$

From the update rule of x^k in Algorithm 1, the definition of G_γ(x), the non-expansiveness of prox_{γg}, and the fact that ∇f(x^k) = (1/n)Σ_{i=1}^n ∇f_i(x^k), we obtain that

$$\begin{aligned}
\|G_\gamma(x^{k})\| &= \frac{1}{\gamma}\big\|x^{k} - \mathrm{prox}_{\gamma g}\big(x^{k} - \gamma\nabla f(x^{k})\big)\big\|\\
&= \frac{1}{\gamma}\Big\|\mathrm{prox}_{\gamma g}\Big(\frac{1}{n}\sum_{i=1}^{n} x_i^{k}\Big) - \mathrm{prox}_{\gamma g}\big(x^{k} - \gamma\nabla f(x^{k})\big)\Big\|\\
&\le \frac{1}{\gamma}\Big\|\frac{1}{n}\sum_{i=1}^{n} x_i^{k} - x^{k} + \gamma\nabla f(x^{k})\Big\|\\
&= \frac{1}{n\gamma}\Big\|\sum_{i=1}^{n}\big[(2z_i^{k} - w_i^{k} - x^{k}) + \gamma\big(\nabla f_i(x^{k}) - \nabla f_i(w_i^{k})\big) + e_i^{k-1} - e_i^{k}\big]\Big\|.
\end{aligned}$$

By applying the L-smoothness of f_i and Young's inequality stated in Lemma 1, for any β_2 > 0 we deduce that

$$\begin{aligned}
\|G_\gamma(x^{k})\|^2 &\le \frac{1}{n^2\gamma^2}\Big[\sum_{i=1}^{n}\big(\|2z_i^{k} - w_i^{k} - x^{k}\| + \gamma L\|x^{k} - w_i^{k}\| + \|e_i^{k-1}-e_i^{k}\|\big)\Big]^2\\
&\le \frac{1}{n\gamma^2}\sum_{i=1}^{n}\big(\|2z_i^{k} - w_i^{k} - x^{k}\| + \gamma L\|x^{k} - w_i^{k}\| + \|e_i^{k-1}-e_i^{k}\|\big)^2\\
&\le \frac{1}{n\gamma^2}\sum_{i=1}^{n}\big[(1+\gamma L)\|z_i^{k} - x^{k}\| + (1+\gamma L)\|m_i^{k}\| + \|e_i^{k-1}-e_i^{k}\|\big]^2\\
&\le \frac{(1+\gamma L)^2}{n\gamma^2}\sum_{i=1}^{n}\Big[2(1+\beta_2)\|z_i^{k} - x^{k}\|^2 + 2\Big(1+\frac{1}{\beta_2}\Big)\|m_i^{k}\|^2 + \frac{2}{(1+\gamma L)^2}\|e_i^{k-1}-e_i^{k}\|^2\Big]\\
&= \frac{2(1+\gamma L)^2}{n\gamma^2}\sum_{i=1}^{n}\Big[(1+\beta_2)\|z_i^{k} - x^{k}\|^2 + \Big(1+\frac{1}{\beta_2}\Big)\|m_i^{k}\|^2 + \frac{1}{(1+\gamma L)^2}\|e_i^{k-1}-e_i^{k}\|^2\Big],
\end{aligned}$$

which proves (12).

Lemma 3. Let {(y_i^k, z_i^k, x_i^k, e_i^k, x^k)} be generated by Algorithm 1. Suppose that Assumptions 1 and 2 hold, and define the Lyapunov function

$$V_k(x^{k}) = g(x^{k}) + \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k}) + \langle\nabla f_i(z_i^{k}),\, x^{k} - z_i^{k}\rangle + \frac{1}{2\gamma}\|x^{k} - z_i^{k}\|^2\Big],$$

then by choosing

$$0 < \gamma < \frac{\sqrt{\big(1-\tfrac{\lambda}{4}\big)^2 - \lambda^2\beta_4(4\beta_4+1)} - \tfrac{\lambda}{4}}{L(2\lambda\beta_4+1)} \quad\text{and}\quad 0 < \lambda < \frac{\min\big\{\sqrt{4\beta_4+\tfrac{17}{16}}-\tfrac{1}{4},\ 2\big\}}{4\beta_4+1},$$

and for any ε1, β1, β4>0, we have

$$\mathbb{E}\big[V_{k+1}(x^{k+1})\,\big|\,\mathcal{A}_{k-1}\big] \le V_k(x^{k}) - \frac{\pi}{2n}\sum_{i=1}^{n}\|x^{k} - z_i^{k}\|^2 + \frac{4\epsilon_1}{\gamma}\nu^2 + \frac{1}{n}\sum_{i=1}^{n}\big(\delta_1(\varepsilon_i^{k})^2 + \delta_2(\varepsilon_i^{k+1})^2\big),$$

where

$$\pi = \frac{p\lambda\big[2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)\big]}{2\gamma(1+\beta_1)(\gamma^2L^2+1)}, \quad \delta_1 = \frac{2(1+\gamma L)^2}{\gamma\beta_4\lambda^2} + \frac{2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)}{\lambda\gamma\beta_1}, \quad \delta_2 = \delta_1 + \frac{1+\gamma^2L^2}{\gamma}.$$

Proof. Given the definition x̄^k = (1/n)Σ_{i=1}^n x_i^k, the update rule x^{k+1} = prox_{γg}(x̄^{k+1}) in Algorithm 1 (hence (x̄^{k+1} − x^{k+1})/γ ∈ ∂g(x^{k+1})), and the convexity of g, we obtain the following inequality

$$g(x^{k+1}) \le g(x^{k}) - \frac{1}{\gamma}\|x^{k+1}-x^{k}\|^2 + \frac{1}{\gamma}\langle \bar{x}^{k+1} - x^{k},\, x^{k+1} - x^{k}\rangle. \qquad (14)$$

Combining Equations 10 and 11, we obtain

$$z_i^{k+1} + \gamma\nabla f_i(z_i^{k+1}) = w_i^{k+1} + \gamma\nabla f_i(w_i^{k+1}) + m_i^{k+1} + \gamma\big(\nabla f_i(z_i^{k+1}) - \nabla f_i(w_i^{k+1})\big) = y_i^{k+1} + m_i^{k+1} + \gamma\big(\nabla f_i(z_i^{k+1}) - \nabla f_i(w_i^{k+1})\big). \qquad (15)$$

Next, using the update rules for xik+1 and eik+1 in Algorithm 1, we have

$$\bar{x}^{k+1} = \frac{1}{n}\sum_{i=1}^{n} x_i^{k+1} = \frac{1}{n}\sum_{i=1}^{n} C\big(2z_i^{k+1} - y_i^{k+1} + e_i^{k}\big) = \frac{1}{n}\sum_{i=1}^{n}\big(2z_i^{k+1} - y_i^{k+1} + e_i^{k} - e_i^{k+1}\big). \qquad (16)$$

In order to establish the descent property of the Lyapunov function Vk+1(xk+1), its second term is expanded and rearranged as follows

$$\begin{aligned}
&\frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k+1}) + \langle\nabla f_i(z_i^{k+1}),\, x^{k+1}-z_i^{k+1}\rangle + \frac{1}{2\gamma}\|x^{k+1}-z_i^{k+1}\|^2\Big]\\
&= \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k+1}) + \langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k+1} + x^{k+1}-x^{k}\rangle\Big] + \frac{1}{2\gamma n}\sum_{i=1}^{n}\|x^{k}-z_i^{k+1} + x^{k+1}-x^{k}\|^2\\
&= \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k+1}) + \langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k+1}\rangle + \frac{1}{2\gamma}\|x^{k}-z_i^{k+1}\|^2\Big]\\
&\quad + \frac{1}{n\gamma}\sum_{i=1}^{n}\big\langle x^{k} - 2z_i^{k+1} + \big(z_i^{k+1}+\gamma\nabla f_i(z_i^{k+1})\big),\, x^{k+1}-x^{k}\big\rangle + \frac{1}{2\gamma}\|x^{k+1}-x^{k}\|^2\\
&\overset{(15)}{=} \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k+1}) + \langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k+1}\rangle + \frac{1}{2\gamma}\|x^{k}-z_i^{k+1}\|^2\Big] + \frac{1}{n\gamma}\sum_{i=1}^{n}\langle x^{k} - 2z_i^{k+1} + y_i^{k+1},\, x^{k+1}-x^{k}\rangle\\
&\quad + \frac{1}{2\gamma}\|x^{k+1}-x^{k}\|^2 + \frac{1}{n\gamma}\sum_{i=1}^{n}\big\langle m_i^{k+1} + \gamma\big(\nabla f_i(z_i^{k+1})-\nabla f_i(w_i^{k+1})\big),\, x^{k+1}-x^{k}\big\rangle\\
&\overset{(16)}{=} \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k+1}) + \langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k+1}\rangle + \frac{1}{2\gamma}\|x^{k}-z_i^{k+1}\|^2\Big] + \frac{1}{\gamma}\Big\langle x^{k} - \bar{x}^{k+1} + \frac{1}{n}\sum_{i=1}^{n}(e_i^{k}-e_i^{k+1}),\, x^{k+1}-x^{k}\Big\rangle\\
&\quad + \frac{1}{2\gamma}\|x^{k+1}-x^{k}\|^2 + \frac{1}{n\gamma}\sum_{i=1}^{n}\big\langle m_i^{k+1} + \gamma\big(\nabla f_i(z_i^{k+1})-\nabla f_i(w_i^{k+1})\big),\, x^{k+1}-x^{k}\big\rangle. \qquad (17)
\end{aligned}$$

Here, Equation 15 is used to separate the term y_i^{k+1} from the approximation error m_i^{k+1}, while Equation 16 expresses 2z_i^{k+1} − y_i^{k+1} in terms of the average vector x̄^{k+1} and the accumulated compression errors e_i^{k+1} and e_i^k. Then, by combining Equations 14, 17 and using the definition of V_{k+1}(x^{k+1}), we obtain that

$$\begin{aligned}
V_{k+1}(x^{k+1}) &\le g(x^{k}) + \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k+1}) + \langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k+1}\rangle + \frac{1}{2\gamma}\|x^{k}-z_i^{k+1}\|^2\Big]\\
&\quad + \frac{1}{n\gamma}\sum_{i=1}^{n}\langle e_i^{k} - e_i^{k+1},\, x^{k+1}-x^{k}\rangle - \frac{1}{2\gamma}\|x^{k+1}-x^{k}\|^2\\
&\quad + \frac{1}{n\gamma}\sum_{i=1}^{n}\big\langle m_i^{k+1} + \gamma\big(\nabla f_i(z_i^{k+1}) - \nabla f_i(w_i^{k+1})\big),\, x^{k+1}-x^{k}\big\rangle. \qquad (18)
\end{aligned}$$

To bound the third term on the right-hand side of Equation 18, we employ the inequality 2⟨a_1, a_2⟩ ≤ ε_1‖a_1‖² + (1/ε_1)‖a_2‖² (for any ε_1 > 0) as follows

$$\begin{aligned}
\frac{1}{n\gamma}\sum_{i=1}^{n}\langle e_i^{k}-e_i^{k+1},\, x^{k+1}-x^{k}\rangle &\le \frac{1}{n\gamma}\sum_{i=1}^{n}\Big[\epsilon_1\|e_i^{k}-e_i^{k+1}\|^2 + \frac{1}{\epsilon_1}\|x^{k+1}-x^{k}\|^2\Big]\\
&\le \frac{1}{n\gamma}\sum_{i=1}^{n}\Big[2\epsilon_1\|e_i^{k}\|^2 + 2\epsilon_1\|e_i^{k+1}\|^2\Big] + \frac{1}{\gamma\epsilon_1}\|x^{k+1}-x^{k}\|^2\\
&\le \frac{2\epsilon_1}{n\gamma}\sum_{i=1}^{n}\big[\|e_i^{k}\|^2 + \|e_i^{k+1}\|^2\big] + \frac{1}{\gamma\epsilon_1}\|x^{k+1}-x^{k}\|^2. \qquad (19)
\end{aligned}$$

For i ∉ S_k, we have w_i^{k+1} = w_i^k. Applying Young's inequality stated in Lemma 1 with any β_3 > 0, we can evaluate the fifth term on the right-hand side of Equation 18 as follows

$$\begin{aligned}
&\frac{1}{n\gamma}\sum_{i=1}^{n}\big\langle m_i^{k+1} + \gamma\big(\nabla f_i(z_i^{k+1})-\nabla f_i(w_i^{k+1})\big),\, x^{k+1}-x^{k}\big\rangle\\
&\le \frac{1}{2n\gamma}\sum_{i=1}^{n}\Big[\frac{1}{\beta_3}\big\|m_i^{k+1} + \gamma\big(\nabla f_i(z_i^{k+1})-\nabla f_i(w_i^{k+1})\big)\big\|^2 + \beta_3\|x^{k+1}-x^{k}\|^2\Big]\\
&\le \frac{1}{n\gamma\beta_3}\sum_{i=1}^{n}\Big[\|m_i^{k+1}\|^2 + \gamma^2\big\|\nabla f_i(w_i^{k+1})-\nabla f_i(z_i^{k+1})\big\|^2\Big] + \frac{\beta_3}{2\gamma}\|x^{k+1}-x^{k}\|^2\\
&\le \frac{1+\gamma^2L^2}{n\gamma\beta_3}\Big[\sum_{i\notin S_k}\|m_i^{k}\|^2 + \sum_{i\in S_k}\|m_i^{k+1}\|^2\Big] + \frac{\beta_3}{2\gamma}\|x^{k+1}-x^{k}\|^2. \qquad (20)
\end{aligned}$$

To streamline the notation, denote

$$\Psi^{k+1} = -\frac{1}{\gamma}\Big(\frac{1}{2}-\frac{1}{\epsilon_1}-\frac{\beta_3}{2}\Big)\|x^{k+1}-x^{k}\|^2 + \frac{2\epsilon_1}{n\gamma}\sum_{i=1}^{n}\big[\|e_i^{k}\|^2+\|e_i^{k+1}\|^2\big] + \frac{1+\gamma^2L^2}{n\gamma\beta_3}\Big[\sum_{i\notin S_k}\|m_i^{k}\|^2 + \sum_{i\in S_k}\|m_i^{k+1}\|^2\Big], \qquad (21)$$

and substituting Equations 19 and 20 into Equation 18, we obtain an expanded expression for V_{k+1}. Distinguishing between the active client set S_k and the inactive set, and employing the L-smoothness of f_i (i.e., f_i(z_i^{k+1}) ≤ f_i(z_i^k) + ⟨∇f_i(z_i^k), z_i^{k+1} − z_i^k⟩ + (L/2)‖z_i^{k+1} − z_i^k‖²), we have

$$\begin{aligned}
V_{k+1}(x^{k+1}) &\le g(x^{k}) + \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k+1}) + \langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k+1}\rangle + \frac{1}{2\gamma}\|x^{k}-z_i^{k+1}\|^2\Big] + \Psi^{k+1}\\
&\quad\text{(by the fact that only } i\in S_k \text{ perform updates)}\\
&= g(x^{k}) + \frac{1}{n}\sum_{i\in S_k} f_i(z_i^{k+1}) + \frac{1}{n}\sum_{i\in S_k}\langle\nabla f_i(z_i^{k+1}),\, z_i^{k}-z_i^{k+1}\rangle + \frac{1}{n}\sum_{i\in S_k}\langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k}\rangle\\
&\quad + \frac{1}{2n\gamma}\sum_{i\in S_k}\|x^{k}-z_i^{k+1}\|^2 + \frac{1}{n}\sum_{i\notin S_k} f_i(z_i^{k}) + \frac{1}{n}\sum_{i\notin S_k}\langle\nabla f_i(z_i^{k}),\, x^{k}-z_i^{k}\rangle + \frac{1}{2n\gamma}\sum_{i\notin S_k}\|x^{k}-z_i^{k}\|^2 + \Psi^{k+1}\\
&\quad\text{(by the } L\text{-smoothness of } f_i)\\
&\le g(x^{k}) + \frac{1}{n}\sum_{i\in S_k} f_i(z_i^{k}) + \frac{L}{2n}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \frac{1}{n}\sum_{i\in S_k}\langle\nabla f_i(z_i^{k+1}),\, x^{k}-z_i^{k}\rangle + \frac{1}{2n\gamma}\sum_{i\in S_k}\|x^{k}-z_i^{k+1}\|^2\\
&\quad + \frac{1}{n}\sum_{i\notin S_k} f_i(z_i^{k}) + \frac{1}{n}\sum_{i\notin S_k}\langle\nabla f_i(z_i^{k}),\, x^{k}-z_i^{k}\rangle + \frac{1}{2n\gamma}\sum_{i\notin S_k}\|x^{k}-z_i^{k}\|^2 + \Psi^{k+1}\\
&= g(x^{k}) + \frac{1}{n}\sum_{i=1}^{n} f_i(z_i^{k}) + \frac{1}{n}\sum_{i=1}^{n}\langle\nabla f_i(z_i^{k}),\, x^{k}-z_i^{k}\rangle + \frac{L}{2n}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \frac{1}{2n\gamma}\sum_{i\in S_k}\|x^{k}-z_i^{k+1}\|^2\\
&\quad + \frac{1}{n}\sum_{i\in S_k}\langle\nabla f_i(z_i^{k+1})-\nabla f_i(z_i^{k}),\, x^{k}-z_i^{k}\rangle + \frac{1}{2n\gamma}\sum_{i\notin S_k}\|x^{k}-z_i^{k}\|^2 + \Psi^{k+1}. \qquad (22)
\end{aligned}$$

Next, applying the square-norm expansion

$$\|x^{k} - z_i^{k+1}\|^2 = \|x^{k} - z_i^{k}\|^2 + 2\langle x^{k} - z_i^{k},\, z_i^{k} - z_i^{k+1}\rangle + \|z_i^{k} - z_i^{k+1}\|^2.$$

For non-updated clients i ∉ S_k, the local variable remains unchanged, i.e., z_i^{k+1} = z_i^k. Substituting these relations into the original expression gives

$$\frac{1}{2n\gamma}\sum_{i\in S_k}\|x^{k}-z_i^{k+1}\|^2 + \frac{1}{2n\gamma}\sum_{i\notin S_k}\|x^{k}-z_i^{k}\|^2 = \frac{1}{2n\gamma}\sum_{i=1}^{n}\|x^{k}-z_i^{k}\|^2 + \frac{1}{2n\gamma}\sum_{i\in S_k}\big[2\langle x^{k}-z_i^{k},\, z_i^{k}-z_i^{k+1}\rangle + \|z_i^{k}-z_i^{k+1}\|^2\big].$$

Inserting the reorganized expression into the expansion of Vk+1(xk+1) and collecting common terms gives

$$\begin{aligned}
V_{k+1}(x^{k+1}) &\le V_k(x^{k}) + \frac{1}{n}\sum_{i\in S_k}\langle\nabla f_i(z_i^{k+1})-\nabla f_i(z_i^{k}),\, x^{k}-z_i^{k}\rangle + \frac{1}{n\gamma}\sum_{i\in S_k}\langle z_i^{k+1}-z_i^{k},\, z_i^{k}-x^{k}\rangle\\
&\quad + \frac{1+L\gamma}{2n\gamma}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \Psi^{k+1}. \qquad (23)
\end{aligned}$$

Then, from the update rule of yik+1 in Algorithm 1 together with Equations 10 and 11, we derive an expression for zik-xk:

$$\begin{aligned}
z_i^{k}-x^{k} &= \frac{1}{\lambda}\big(y_i^{k}-y_i^{k+1}\big)\\
&= \frac{1}{\lambda}\big(w_i^{k}-w_i^{k+1}\big) + \frac{\gamma}{\lambda}\big(\nabla f_i(w_i^{k})-\nabla f_i(w_i^{k+1})\big)\\
&= \frac{1}{\lambda}\big(z_i^{k}-z_i^{k+1}\big) + \frac{\gamma}{\lambda}\big(\nabla f_i(z_i^{k})-\nabla f_i(z_i^{k+1})\big)\\
&\quad + \frac{1}{\lambda}\Big[\big(m_i^{k+1}+\gamma(\nabla f_i(z_i^{k+1})-\nabla f_i(w_i^{k+1}))\big) - \big(m_i^{k}+\gamma(\nabla f_i(z_i^{k})-\nabla f_i(w_i^{k}))\big)\Big]\\
&= \frac{1}{\lambda}\big(z_i^{k}-z_i^{k+1}\big) + \frac{\gamma}{\lambda}\big(\nabla f_i(z_i^{k})-\nabla f_i(z_i^{k+1})\big) + n_i^{k}, \qquad (24)
\end{aligned}$$

where n_i^k is a composite error term involving the approximation errors m_i^k, m_i^{k+1} and gradient differences; the subsequent analysis controls the impact of n_i^k via its norm bound. It is defined as

$$n_i^{k} = \frac{1}{\lambda}\Big[\big(m_i^{k+1}+\gamma(\nabla f_i(z_i^{k+1})-\nabla f_i(w_i^{k+1}))\big) - \big(m_i^{k}+\gamma(\nabla f_i(z_i^{k})-\nabla f_i(w_i^{k}))\big)\Big].$$

Its squared norm satisfies

$$\|n_i^{k}\|^2 = \frac{1}{\lambda^2}\big\|m_i^{k+1}-m_i^{k}+\gamma\big(\nabla f_i(z_i^{k+1})-\nabla f_i(w_i^{k+1})\big) + \gamma\big(\nabla f_i(w_i^{k})-\nabla f_i(z_i^{k})\big)\big\|^2 \le \frac{2(1+\gamma L)^2}{\lambda^2}\big[\|m_i^{k}\|^2+\|m_i^{k+1}\|^2\big].$$

By applying the L-smoothness of f_i, Young's inequality, and Equation 24, we obtain for any β_4 > 0 that

$$\begin{aligned}
V_{k+1}(x^{k+1}) &\le V_k(x^{k}) + \frac{\lambda(1+L\gamma)-2}{2\lambda\gamma n}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \frac{\gamma}{\lambda n}\sum_{i\in S_k}\|\nabla f_i(z_i^{k+1})-\nabla f_i(z_i^{k})\|^2\\
&\quad + \frac{1}{\gamma n}\sum_{i\in S_k}\big\langle n_i^{k},\, (z_i^{k+1}-z_i^{k})+\gamma\big(\nabla f_i(z_i^{k})-\nabla f_i(z_i^{k+1})\big)\big\rangle + \Psi^{k+1}\\
&\quad\text{(by the } L\text{-smoothness of } f_i)\\
&\le V_k(x^{k}) + \frac{\gamma L^2}{\lambda n}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \frac{\lambda(1+L\gamma)-2}{2\lambda\gamma n}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \Psi^{k+1}\\
&\quad + \frac{1}{\gamma n}\sum_{i\in S_k}\Big[\frac{1}{\beta_4}\|n_i^{k}\|^2 + 2\beta_4\|z_i^{k}-z_i^{k+1}\|^2 + 2\beta_4\gamma^2\|\nabla f_i(z_i^{k})-\nabla f_i(z_i^{k+1})\|^2\Big]\\
&\le V_k(x^{k}) - \frac{2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)}{2\lambda\gamma n}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \frac{1}{\gamma\beta_4 n}\sum_{i\in S_k}\|n_i^{k}\|^2 + \Psi^{k+1}\\
&\le V_k(x^{k}) - \frac{2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)}{2\lambda\gamma n}\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 + \frac{2(1+\gamma L)^2}{\gamma\beta_4\lambda^2 n}\sum_{i\in S_k}\big[\|m_i^{k}\|^2+\|m_i^{k+1}\|^2\big] + \Psi^{k+1}. \qquad (25)
\end{aligned}$$

Next, leveraging the L-smoothness of f_i and assuming γ ≤ 1/L, we demonstrate the boundedness of V_k(x^k):

$$\begin{aligned}
V_k(x^{k}) &= g(x^{k}) + \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(z_i^{k}) + \langle\nabla f_i(z_i^{k}),\, x^{k} - z_i^{k}\rangle + \frac{1}{2\gamma}\|x^{k} - z_i^{k}\|^2\Big]\\
&\ge g(x^{k}) + \frac{1}{n}\sum_{i=1}^{n}\Big[f_i(x^{k}) - \frac{L}{2}\|x^{k} - z_i^{k}\|^2 + \frac{1}{2\gamma}\|x^{k} - z_i^{k}\|^2\Big]\\
&= F(x^{k}) + \Big(\frac{1}{2\gamma}-\frac{L}{2}\Big)\frac{1}{n}\sum_{i=1}^{n}\|x^{k} - z_i^{k}\|^2\\
&\ge F^{\star}.
\end{aligned}$$

From Lemma 1, we have

$$\frac{\lambda^2}{2(1+\beta_1)(\gamma^2L^2+1)}\sum_{i\in S_k}\|x^{k} - z_i^{k}\|^2 \le \sum_{i\in S_k}\Big[\|z_i^{k+1}-z_i^{k}\|^2 + \frac{2}{\beta_1}\big(\|m_i^{k+1}\|^2+\|m_i^{k}\|^2\big)\Big]. \qquad (26)$$

According to the sampling scheme, we consider the expectation of Σ_{i∈S_k} ‖z_i^{k+1} − z_i^k‖² with respect to S_k conditioned on A_{k−1}. Combined with (26), this yields

$$\mathbb{E}\Big[\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2 \,\Big|\, \mathcal{A}_{k-1}\Big] = \sum_{S}\mathbb{P}(S_k=S)\sum_{i\in S}\|z_i^{k+1}-z_i^{k}\|^2 = \sum_{i=1}^{n} p_i\|z_i^{k+1}-z_i^{k}\|^2 \ge \frac{p\lambda^2}{2(1+\beta_1)(\gamma^2L^2+1)}\sum_{i=1}^{n}\|x^{k} - z_i^{k}\|^2 - \frac{2p}{\beta_1}\sum_{i=1}^{n}\big(\|m_i^{k+1}\|^2+\|m_i^{k}\|^2\big), \qquad (27)$$

where p = min_{i∈[n]} p_i ∈ (0, 1]. By taking the conditional expectation of Equation 25 with respect to S_k conditioned on A_{k−1}, and combining it with Equations 10, 21, 27 under the setting β_3 = 1, we derive the following

$$\begin{aligned}
\mathbb{E}\big[V_{k+1}(x^{k+1})\,\big|\,\mathcal{A}_{k-1}\big] &\overset{(21)}{\le} V_k(x^{k}) + \frac{2(1+\gamma L)^2}{\gamma\beta_4\lambda^2 n}\sum_{i=1}^{n} p_i\big[\|m_i^{k}\|^2+\|m_i^{k+1}\|^2\big]\\
&\quad - \frac{2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)}{2\lambda\gamma n}\,\mathbb{E}\Big[\sum_{i\in S_k}\|z_i^{k+1}-z_i^{k}\|^2\,\Big|\,\mathcal{A}_{k-1}\Big]\\
&\quad + \frac{2\epsilon_1}{n\gamma}\,\mathbb{E}\Big[\sum_{i=1}^{n}\|e_i^{k}\|^2+\sum_{i=1}^{n}\|e_i^{k+1}\|^2\Big] + \frac{1+\gamma^2L^2}{n\gamma}\sum_{i=1}^{n}\big[(1-p_i)\|m_i^{k}\|^2 + p_i\|m_i^{k+1}\|^2\big]\\
&\quad\text{(by the definition of the absolute compressor)}\\
&\overset{(27)}{\le} V_k(x^{k}) + \frac{2(1+\gamma L)^2}{\gamma\beta_4\lambda^2 n}\sum_{i=1}^{n} p_i\big[\|m_i^{k}\|^2+\|m_i^{k+1}\|^2\big]\\
&\quad - \frac{p\lambda\big[2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)\big]}{4\gamma n(1+\beta_1)(\gamma^2L^2+1)}\sum_{i=1}^{n}\|x^{k}-z_i^{k}\|^2\\
&\quad + \frac{p\big[2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)\big]}{\lambda\gamma\beta_1 n}\sum_{i=1}^{n}\big(\|m_i^{k+1}\|^2+\|m_i^{k}\|^2\big)\\
&\quad + \frac{4\epsilon_1}{\gamma}\nu^2 + \frac{1+\gamma^2L^2}{n\gamma}\sum_{i=1}^{n}\big[(1-p_i)\|m_i^{k}\|^2 + p_i\|m_i^{k+1}\|^2\big]\\
&\overset{(10)}{\le} V_k(x^{k}) - \frac{\pi}{2n}\sum_{i=1}^{n}\|x^{k}-z_i^{k}\|^2 + \frac{4\epsilon_1}{\gamma}\nu^2 + \frac{1}{n}\sum_{i=1}^{n}\big(\delta_1(\varepsilon_i^{k})^2+\delta_2(\varepsilon_i^{k+1})^2\big).
\end{aligned}$$

To guarantee the descent property, let

$$\pi = \frac{p\lambda\big[2-\lambda(1+L\gamma)-2L^2\gamma^2-4\lambda\beta_4(1+L^2\gamma^2)\big]}{2\gamma(1+\beta_1)(\gamma^2L^2+1)} > 0.$$

Then, we have

$$0 < \lambda < \frac{\min\big\{\sqrt{4\beta_4+\tfrac{17}{16}}-\tfrac{1}{4},\ 2\big\}}{4\beta_4+1} \quad\text{and}\quad 0 < \gamma < \frac{\sqrt{\big(1-\tfrac{\lambda}{4}\big)^2 - \lambda^2\beta_4(4\beta_4+1)} - \tfrac{\lambda}{4}}{L(2\lambda\beta_4+1)}.$$

Theorem 1. Let {(y_i^k, z_i^k, x_i^k, e_i^k, x^k)} be generated by Algorithm 1. Suppose that Assumptions 1 and 2 hold. Then, for

$$0 < \gamma < \frac{\sqrt{\big(1-\tfrac{\lambda}{4}\big)^2 - \lambda^2\beta_4(4\beta_4+1)} - \tfrac{\lambda}{4}}{L(2\lambda\beta_4+1)} \quad\text{and}\quad 0 < \lambda < \frac{\min\big\{\sqrt{4\beta_4+\tfrac{17}{16}}-\tfrac{1}{4},\ 2\big\}}{4\beta_4+1},$$

we have

$$\frac{1}{K}\sum_{k=0}^{K-1}\mathbb{E}\big[\|G_\gamma(x^{k})\|^2\big] \le \frac{M_1}{K}\big(F(x^{0})-F^{\star}\big) + \frac{1}{nK}\sum_{k=0}^{K-1}\sum_{i=1}^{n}\big[M_2(\varepsilon_i^{k})^2 + M_3(\varepsilon_i^{k+1})^2\big] + \frac{M_4}{K}\nu^2, \qquad (28)$$

where

$$M_1 = \frac{4(1+\beta_2)(1+\gamma L)^2}{\pi\gamma^2}, \quad M_2 = \frac{2\delta_1\beta_2+\pi}{\beta_2}M_1, \quad M_3 = \delta_2 M_1, \quad M_4 = \frac{4\epsilon_1 K}{\gamma}M_1 + \frac{4K}{n\gamma^2},$$

with ε1, β2>0, and π, δ1, δ2 defined in Lemma 3.

Proof. First, it follows from Lemma 3 that

$$\sum_{i=1}^{n}\|x^{k} - z_i^{k}\|^2 \le \frac{2n}{\pi}\Big[V_k(x^{k}) - \mathbb{E}\big[V_{k+1}(x^{k+1})\,\big|\,\mathcal{A}_{k-1}\big] + \frac{4\epsilon_1}{\gamma}\nu^2 + \frac{1}{n}\sum_{i=1}^{n}\big(\delta_1(\varepsilon_i^{k})^2 + \delta_2(\varepsilon_i^{k+1})^2\big)\Big].$$

Combining the derived estimates and Lemma 2, we obtain

$$\begin{aligned}
\|G_\gamma(x^{k})\|^2 &\le \frac{2(1+\gamma L)^2}{n\gamma^2}\sum_{i=1}^{n}\Big[(1+\beta_2)\|z_i^{k}-x^{k}\|^2 + \Big(1+\frac{1}{\beta_2}\Big)\|m_i^{k}\|^2\Big] + \frac{2}{n\gamma^2}\sum_{i=1}^{n}\|e_i^{k-1}-e_i^{k}\|^2\\
&\le \frac{4(1+\beta_2)(1+\gamma L)^2}{\pi\gamma^2}\Big[V_k(x^{k}) - \mathbb{E}\big[V_{k+1}(x^{k+1})\,\big|\,\mathcal{A}_{k-1}\big]\Big] + \frac{4(1+\beta_2)(1+\gamma L)^2}{n\pi\gamma^2}\sum_{i=1}^{n}\big(\delta_1(\varepsilon_i^{k})^2 + \delta_2(\varepsilon_i^{k+1})^2\big)\\
&\quad + \frac{2(1+\beta_2)(1+\gamma L)^2}{n\gamma^2\beta_2}\sum_{i=1}^{n}(\varepsilon_i^{k})^2 + \frac{2}{n\gamma^2}\sum_{i=1}^{n}\|e_i^{k-1}-e_i^{k}\|^2 + \frac{16(1+\beta_2)(1+\gamma L)^2\epsilon_1}{\pi\gamma^3}\nu^2. \qquad (29)
\end{aligned}$$

Taking the total expectation of ||Gγ(xk)||2 with respect to Ak, and by using the update of eik and the definition of the absolute compressor, we obtain the following result

$$\mathbb{E}\big[\|G_\gamma(x^{k})\|^2\big] \le M_1\big(\mathbb{E}[V_k(x^{k})] - \mathbb{E}[V_{k+1}(x^{k+1})]\big) + \frac{M_2}{n}\sum_{i=1}^{n}(\varepsilon_i^{k})^2 + \frac{M_3}{n}\sum_{i=1}^{n}(\varepsilon_i^{k+1})^2 + \frac{M_4}{K}\nu^2,$$

where

$$M_1 = \frac{4(1+\beta_2)(1+\gamma L)^2}{\pi\gamma^2}, \quad M_2 = \frac{2(1+\beta_2)(1+\gamma L)^2(4\delta_1\beta_2+2\pi)}{\gamma^2\beta_2\pi}, \quad M_3 = \frac{4(1+\beta_2)(1+\gamma L)^2\delta_2}{\pi\gamma^2}, \quad M_4 = \frac{16(1+\beta_2)(1+\gamma L)^2\epsilon_1 K}{\pi\gamma^3} + \frac{4K}{n\gamma^2}$$

are four constants. Summing the inequality over k from 0 to K−1 and then dividing the resulting sum by K, we derive

$$\frac{1}{K}\sum_{k=0}^{K-1}\mathbb{E}\big[\|G_\gamma(x^{k})\|^2\big] \le \frac{M_1}{K}\big(\mathbb{E}[V_0(x^{0})] - \mathbb{E}[V_K(x^{K})]\big) + \frac{1}{K}\sum_{k=0}^{K-1}\Big[\frac{M_2}{n}\sum_{i=1}^{n}(\varepsilon_i^{k})^2 + \frac{M_3}{n}\sum_{i=1}^{n}(\varepsilon_i^{k+1})^2 + \frac{M_4}{K}\nu^2\Big]. \qquad (30)$$

With the initial condition z_i^0 = x^0, we obtain V_0(x^0) = g(x^0) + (1/n)Σ_{i=1}^n f_i(z_i^0) = F(x^0). Together with the lower bound 𝔼[V_{k+1}(x^{k+1})] ≥ F*, this implies that Equation 30 simplifies to

$$\frac{1}{K}\sum_{k=0}^{K-1}\mathbb{E}\big[\|G_\gamma(x^{k})\|^2\big] \le \frac{M_1}{K}\big(F(x^{0})-F^{\star}\big) + \frac{1}{nK}\sum_{k=0}^{K-1}\sum_{i=1}^{n}\big[M_2(\varepsilon_i^{k})^2 + M_3(\varepsilon_i^{k+1})^2\big] + \frac{M_4}{K}\nu^2, \qquad (31)$$

which proves Equation 28.

Corollary 1. Suppose that Assumptions 1 and 2 hold. Then EF-Feddr (Algorithm 1) finds an ε-stationary point x^k, i.e., 𝔼‖G_γ(x^k)‖ ≤ ε, within the following number of iterations:

$$K \ge \frac{M_1\big[F(x^{0})-F^{\star}\big] + (M_2+M_3)M + M_4\nu^2}{\varepsilon^2},$$

where M > 0 is a constant, and M_1, M_2, M_3, M_4 are defined in Theorem 1. Consequently, the communication complexity is K = O(1/ε²).

Proof. As described in Tran-Dinh et al. (2021), the choice of accuracies ε_i^k is constrained such that, for a given constant M > 0, (1/n)Σ_{k=0}^{K−1}Σ_{i=1}^n (ε_i^k)² ≤ M. Therefore,

$$\frac{1}{K}\sum_{k=0}^{K-1}\mathbb{E}\big[\|G_\gamma(x^{k})\|^2\big] \le \frac{M_1\big(F(x^{0})-F^{\star}\big) + (M_2+M_3)M + M_4\nu^2}{K}. \qquad (32)$$

Consequently, to guarantee 𝔼‖G_γ(x^k)‖ ≤ ε, we have

$$K \ge \frac{M_1\big[F(x^{0})-F^{\star}\big] + (M_2+M_3)M + M_4\nu^2}{\varepsilon^2}.$$

Therefore, we can take K = (M_1[F(x^0) − F*] + (M_2 + M_3)M + M_4ν²)/ε² = O(1/ε²) as its lower bound.

5 Experiments

In the experiments, we evaluate EF-Feddr against Eco-FedSplit (Khirirat et al., 2022), Eco-FedProx (Khirirat et al., 2022), and FedDR (Tran-Dinh et al., 2021). In all compression-based baselines, the compression operator C denotes Top-k sparsification. For a fair comparison, we implement Eco-FedSplit, Eco-FedProx, and EF-Feddr on top of the FedDR framework. All experiments are conducted in TensorFlow (Abadi et al., 2016) on a cluster equipped with NVIDIA Tesla P100 (16 GB) GPUs. We next describe the datasets and models used in our study.

5.1 Non-IID datasets

We evaluate on both synthetic and real-world datasets: synthetic-(l, s), FEMNIST, and Shakespeare. Following prior studies (Caldas et al., 2018; Tran-Dinh et al., 2021), we generate synthetic-(l, s) with (l, s) = {(0, 0), (1, 1)}, where l controls how much the local models differ and s controls the degree of local data heterogeneity; larger l and s imply stronger non-IID heterogeneity. FEMNIST extends MNIST to 62 classes with over 800k samples; we use an 80%/20% train/test split and partition by writer, so each client's dataset comprises samples from a subset of writers, which naturally induces client-level variability in handwriting styles and features. Shakespeare is a character-level language modeling corpus; we partition by user/play, so each client is allocated a distinct subset of the texts, which may include a varying number of plays and scenes. This yields a non-uniform distribution of text across clients: certain clients predominantly receive data from specific plays, whereas others obtain a more diverse range of content. In both real-world datasets, the degree of non-IID-ness within a client's dataset can be quantified by the number of classes present. The datasets and model configurations used in our experiments are summarized in Table 2, which outlines their key statistical characteristics.

Table 2. Dataset and model characteristics for federated training.
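For reproducibility, the sketch below generates data in the spirit of the synthetic-(l, s) construction popularized by Li et al. (2020): l spreads the client-specific models and s spreads the client-specific feature distributions. The covariance decay, the unbalanced sample counts, and the exact scaling are assumptions for illustration rather than the exact generator used in our experiments.

```python
import numpy as np

def synthetic_ls(l, s, n_clients=30, dim=60, n_classes=10, seed=0):
    """Sketch of a synthetic-(l, s) generator in the spirit of Li et al. (2020)."""
    rng = np.random.default_rng(seed)
    cov = np.diag([(j + 1) ** (-1.2) for j in range(dim)])    # decaying feature covariance (assumed)
    datasets = []
    for _ in range(n_clients):
        u = rng.normal(0.0, np.sqrt(l))                       # l spreads the client-specific models
        B = rng.normal(0.0, np.sqrt(s))                       # s spreads the client-specific features
        W = rng.normal(u, 1.0, size=(n_classes, dim))
        b = rng.normal(u, 1.0, size=n_classes)
        v = rng.normal(B, 1.0, size=dim)
        m = int(rng.lognormal(4.0, 1.0)) + 50                 # unbalanced sample counts (assumed)
        X = rng.multivariate_normal(v, cov, size=m)
        y = np.argmax(X @ W.T + b, axis=1)                    # labels from the client's own model
        datasets.append((X, y))
    return datasets

data = synthetic_ls(1, 1)
print("samples on the first five clients:", [len(y) for _, y in data[:5]])
```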

5.2 Models and hyper-parameters selection

We use a fully connected network with a 60-32-10 architecture and train it for 200 communication rounds with a learning rate of 0.01 on all synthetic datasets. At each round, 10 out of 30 clients are sampled. To evaluate the algorithm's performance with an increased number of clients, we further extended the synthetic-(1, 1) setup from the original 30 clients to 90 clients while preserving the statistical characteristics defined by the (l, s) parameters. The data generation process maintained the same non-IID partition pattern and per-client data distribution profile as the original setup, and the client sampling ratio was kept constant at 1/3 (that is, 30 out of 90 clients are selected per round). Eco-FedSplit applies error-compensated compression to FedSplit, and Eco-FedProx does so to FedProx. To study an image classification problem on FEMNIST, we employ an artificial neural network (ANN) consisting of two fully connected layers: the first layer has 128 neurons followed by a ReLU activation function, and the second layer has 62 neurons followed by a softmax activation function for classification. In this experiment, we sample 50 clients out of 200 to perform updates at each communication round for all the above-mentioned algorithms. The FEMNIST model is trained for 200 communication rounds in total with an optimal learning rate of 0.003. Consistent with prior research (Li et al., 2020), our approach to character-level prediction on the Shakespeare dataset utilizes a recurrent neural network (RNN) architecture. Specifically, we deploy a two-layer stacked LSTM classifier, each layer comprising 256 hidden units. Each input sequence contains 80 characters, which are first embedded into an eight-dimensional space prior to LSTM processing. The model then generates a 62-class softmax distribution over the character vocabulary for each training instance. The training regimen involves a total of 50 communication rounds, and an optimal learning rate of 0.08 is determined for the four operator-splitting-based federated learning algorithms employed in this study. Parameters for each algorithm, such as α∈(0, 2) and η∈[1, 1000] for FedDR, μ∈[0.001, 1] for Eco-FedProx, and λ∈(0, 2) and γ∈[1, 1000] for EF-Feddr, are tuned over a large range of values. For each dataset, we pick the most suitable parameters for each algorithm.
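For concreteness, the two architectures described above can be assembled in a few lines of TensorFlow/Keras; the 28×28 input size for FEMNIST and the optimizer wiring are assumptions, since only the layer sizes, activations, and learning rates are specified in the text.

```python
import tensorflow as tf

# Sketch of the FEMNIST classifier described above: two dense layers,
# 128 ReLU units followed by 62 softmax units (input size assumed to be 28*28).
def build_femnist_model(lr=0.003):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28 * 28,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(62, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# Sketch of the Shakespeare character model: 8-dimensional embedding,
# two stacked 256-unit LSTMs, and a 62-way softmax over the character vocabulary.
def build_shakespeare_model(vocab_size=62, seq_len=80):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len,)),
        tf.keras.layers.Embedding(vocab_size, 8),
        tf.keras.layers.LSTM(256, return_sequences=True),
        tf.keras.layers.LSTM(256),
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])

print("FEMNIST model parameters:", build_femnist_model().count_params())
```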

5.3 Comparison of methods

Figures 1–3 report training loss/accuracy and test accuracy vs. communication rounds and communication cost on the synthetic datasets; Figure 4 shows the same on FEMNIST. A key observation is that expanding the total number of clients does not substantially degrade the performance of EF-Feddr. Experimental results under the scaled setting (Figure 3) confirm this: the algorithm maintains nearly identical convergence speed and final accuracy compared to the original 30-client scenario (Figure 2). Across heterogeneous settings, EF-Feddr consistently outperforms the baselines. On FEMNIST, EF-Feddr reaches 80.5% test accuracy at round 50, whereas Eco-FedSplit attains 74.5% only at round 200. Within 200 rounds, EF-Feddr improves accuracy by 12.97% and 7.93% over Eco-FedSplit and Eco-FedProx, respectively. On synthetic-(0, 0), EF-Feddr exceeds the two baselines by 3.88% and 8.40%; on synthetic-(1, 1), by 7.20% and 3.29%. On Shakespeare, Figure 5 shows that EF-Feddr also surpasses two Douglas–Rachford splitting-based FL algorithms, Eco-FedSplit and FedDR. As shown in Table 3, EF-Feddr requires 18.64%–85.41% less runtime and 48.03%–93.18% less communication than baseline methods to achieve the same target test accuracy of 60% on the synthetic data and 70% on FEMNIST. Specifically, on FEMNIST, it meets this target in only 17 communication rounds (8.29 min), significantly outperforming competitors such as Eco-FedSplit. These substantial reductions in overhead are consistently observed across the synthetic datasets. Additionally, EF-Feddr achieves a substantial reduction in communication costs without compromising performance relative to the uncompressed FedDR.

Figure 1. Convergence performance of different methods on the synthetic-(0, 0) dataset with Top-k and participation rate p = 0.3.

Figure 2. Convergence performance of different methods on the synthetic-(1, 1) dataset with Top-k, participation rate p = 0.3, and N = 30 total clients.

Figure 3. Convergence performance of different methods on the synthetic-(1, 1) dataset with Top-k, participation rate p = 0.3, and N = 90 total clients.

Figure 4. Convergence performance of different methods on the FEMNIST dataset with Top-k and participation rate p = 0.3.

Figure 5. Convergence performance of different methods on the Shakespeare dataset with Top-k and participation rate p = 0.3.

Table 3. Efficiency comparison on the synthetic-(1, 1) and FEMNIST datasets.

5.4 Effect of the relaxation parameter

Figure 6 examines the effect of the relaxation parameter λ over 200 iterations. Empirically, the best convergence is observed at λ = 0.3. Consistent with prior findings on FL adaptations of Douglas–Rachford splitting, choosing 0 < λ < 1 often leads to faster convergence than the classical (unrelaxed) variant.

Figure 6. EF-Feddr on FEMNIST with relaxation parameter λ analysis.

6 Discussion

This study presents EF-Feddr, a communication-efficient federated learning algorithm that combines error-compensated compression with Douglas–Rachford splitting. The method's robustness is demonstrated across controlled synthetic and real-world benchmarks, yet we recognize that extreme heterogeneity, such as single-class clients, remains a challenging frontier. Furthermore, while our experiments simulate realistic constraints (partial participation, compression), fully asynchronous updates and dynamic network conditions warrant further study in real deployments.

Recent advances in behavior-based threat hunting (Bhardwaj et al., 2022), IoT firmware security assessment (Bhardwaj et al., 2023), and energy-efficient proactive fault tolerance in cloud environments (Talwar et al., 2021) provide complementary perspectives for building reliable and secure federated systems. While this study focuses on optimization efficiency under non-IID and communication constraints, these studies collectively point toward an integrated “Optimization + System + Security” paradigm for future research. Specifically, they motivate investigations into client behavior profiling for attack detection, trusted execution at the edge, and proactive fault-tolerant scheduling, all of which are essential for deploying robust and efficient federated learning in real-world, dynamic environments. Furthermore, to strengthen the generalizability of our findings, future studies will also include evaluations on a wider variety of datasets, encompassing diverse domains, scales, and heterogeneity patterns, thereby providing a more comprehensive assessment of the algorithm's practical applicability.

7 Conclusion

In this study, we introduced EF-Feddr, a communication-efficient algorithm for non-convex federated learning that leverages the Douglas–Rachford splitting method, error feedback compression, and a relaxation strategy. EF-Feddr improves communication efficiency while preserving solution accuracy. Both theoretical analysis and empirical experiments demonstrated that EF-Feddr substantially reduces the number of bits transmitted from clients to the server compared with uncompressed FedDR. In terms of solution accuracy, EF-Feddr performs comparably to the uncompressed FedDR. Building on the Douglas–Rachford envelope, we established convergence guarantees and analyzed the communication complexity of EF-Feddr under mild assumptions. Extensive experiments further confirmed that our method significantly outperforms existing state-of-the-art approaches in non-IID settings.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: arXiv preprint arXiv:1812.01097.

Author contributions

JX: Validation, Conceptualization, Methodology, Formal analysis, Data curation, Writing – original draft, Software. CW: Visualization, Investigation, Supervision, Resources, Funding acquisition, Project administration, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). "TensorFlow: a system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (Savannah, GA), 265–283.

Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017). “QSGD: communication-efficient SGD via gradient quantization and encoding,” in Advances in Neural Information Processing Systems 30.

Atenas, F. (2025). Understanding the Douglas-Rachford splitting method through the lenses of moreau-type envelopes. Comput. Optim. Appl. 90, 881–910. doi: 10.1007/s10589-024-00646-9

Bao, H., Chen, P., Sun, Y., and Li, Z. (2025). EFSKIP: a new error feedback with linear speedup for compressed federated learning with arbitrary data heterogeneity. Proc. AAAI Conf. Artif. Intell. 39, 15489–15497. doi: 10.1609/aaai.v39i15.33700

Bernstein, J., Wang, Y.-X., Azizzadenesheli, K., and Anandkumar, A. (2018). “SIGNSGD: compressed optimisation for non-convex problems,” in International Conference on Machine Learning (Stockholm: PMLR), 560–569.

Bhardwaj, A., Kaushik, K., Alomari, A., Alsirhani, A., Alshahrani, M. M., Bharany, S., et al. (2022). BTH: behavior-based structured threat hunting framework to analyze and detect advanced adversaries. Electronics 11:2992. doi: 10.3390/electronics11192992

Bhardwaj, A., Kaushik, K., Bharany, S., and Kim, S. (2023). Forensic analysis and security assessment of IOT camera firmware for smart homes. Egypt. Inf. J. 24:100409. doi: 10.1016/j.eij.2023.100409

Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečnỳ, J., McMahan, H. B., et al. (2018). Leaf: a benchmark for federated settings. arXiv [preprint]. arXiv:1812.01097. doi: 10.4885/arXiv.1812.01097

Ezequiel, C. E. J., Gjoreski, M., and Langheinrich, M. (2022). Federated learning for privacy-aware human mobility modeling. Front. Artif. Intell. 5:867046. doi: 10.3389/frai.2022.867046

Godavarthi, D., Jaswanth, V., Mohanty, S., Dinesh, P., Venkata Charan Sathvik, R., Moreira, F., et al. (2025). Federated quantum-inspired anomaly detection using collaborative neural clients. Front. Artif Intell. 8:1648609. doi: 10.3389/frai.2025.1648609

Goel, C., Anita, X., and Anbarasi, J. L. (2025). Federated knee injury diagnosis using few shot learning. Front. Artif. Intell. 8:1589358. doi: 10.3389/frai.2025.1589358

He, S., Dong, Q.-L., Tian, H., and Li, X.-H. (2021). On the optimal relaxation parameters of Krasnosel'ski-Mann iteration. Optimization 70, 1959–1986. doi: 10.1080/02331934.2020.1767101

Islam, F., Mahmood, A., Mukhtiar, N., Wijethilake, K. E., and Sheng, Q. Z. (2024). “Fairequityfl-a fair and equitable client selection in federated learning for heterogeneous IOV networks,” in International Conference on Advanced Data Mining and Applications (Cham: Springer), 254–269. doi: 10.1007/978-981-96-0814-0_17

Jhunjhunwala, D., Sharma, P., Nagarkatti, A., and Joshi, G. (2022). “Fedvarp: tackling the variance due to partial client participation in federated learning,” in Uncertainty in Artificial Intelligence (Eindhoven: PMLR), 906–916.

Kant, S., da Silva, J. M. B., Fodor, G., Göransson, B., Bengtsson, M., and Fischione, C. (2022). Federated learning using three-operator ADMM. IEEE J. Sel. Topics Signal Processing 17, 205–221. doi: 10.1109/JSTSP.2022.3221681

Karimireddy, S. P., Rebjock, Q., Stich, S., and Jaggi, M. (2019). “Error feedback fixes signsgd and other gradient compression schemes,” in International Conference on Machine Learning (Long Beach, CA: PMLR), 3252–3261.

Khirirat, S., Johansson, M., and Alistarh, D. (2018). “Gradient compression for communication-limited convex optimization,” in 2018 IEEE Conference on Decision and Control (CDC) (Miami, FL: IEEE), 166–171. doi: 10.1109/CDC.2018.8619625

Khirirat, S., Magnússon, S., and Johansson, M. (2022). “Eco-fedsplit: federated learning with error-compensated compression,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Singapore: IEEE), 5952–5956. doi: 10.1109/ICASSP43922.2022.9747809

Konecný, J., McMahan, H. B., Yu, F. X., Richtárik, P., Suresh, A. T., and Bacon, D. (2016). Federated learning: strategies for improving communication efficiency. arXiv [preprint]. arXiv:1610.05492. doi: 10.48550/arXiv.1610.05492

Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V., et al. (2020). Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450. doi: 10.48550/arXiv.1812.06127

Li, X., and Li, P. (2023). “Analysis of error feedback in federated non-convex optimization with biased compression: fast convergence and partial participation,” in International Conference on Machine Learning (Honolulu, HI: PMLR), 19638–19688.

Liu, J., Xu, L., Shen, S., and Ling, Q. (2019). An accelerated variance reducing stochastic method with Douglas-Rachford splitting. Mach. Learn. 108, 859–878. doi: 10.1007/s10994-019-05785-3

Liu, Y., Zhou, Y., and Lin, R. (2024). The proximal operator of the piece-wise exponential function. IEEE Signal Process. Lett. 31, 894–898. doi: 10.1109/LSP.2024.3370493

Long, Z., Chen, Y., Dou, H., Zhang, Y., and Chen, Y. (2024). Fedsq: sparse-quantized federated learning for communication efficiency. IEEE Trans. Consum. Electron. 70, 4050–4061. doi: 10.1109/TCE.2024.3352432

Malekmohammadi, S., Shaloudegi, K., Hu, Z., and Yu, Y. (2021). An operator splitting view of federated learning. arXiv [preprint]. arXiv:2108.05974. doi: 10.48550/arXiv.2108.05974

McMahan, B., Moore, E., Ramage, D., Hampson, S., and Arcas, B. A. (2017). “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics (Fort Lauderdale, FL: PMLR), 1273–1282.

Mishchenko, K., Khaled, A., and Richtárik, P. (2022). “Proximal and federated random reshuffling,” in International Conference on Machine Learning (Baltimore, MA: PMLR), 15718–15749.

Parikh, N., and Boyd, S. (2014). Proximal algorithms. Found. Trends Optim. 1, 127–239. doi: 10.1561/2400000003

Pathak, R., and Wainwright, M. J. (2020). Fedsplit: an algorithmic framework for fast federated optimization. Adv. Neural Inf. Process. Syst. 33, 7057–7066. doi: 10.48550/arXiv.2005.05238

Reisizadeh, A., Mokhtari, A., Hassani, H., Jadbabaie, A., and Pedarsani, R. (2020). “FEDPAQ: a communication-efficient federated learning method with periodic averaging and quantization,” in International Conference on Artificial Intelligence and Statistics (PMLR), 2021–2031.

Richtárik, P., Sokolov, I., and Fatkhullin, I. (2021). Ef21: a new, simpler, theoretically better, and practically faster error feedback. Adv. Neural Inf. Process. Syst. 34, 4384–4396. doi: 10.48550/arXiv.2106.05203

Sahu, A., Dutta, A., Abdelmoniem, M., Banerjee, A., Canini, T., Kalnis, M., et al. (2021). Rethinking gradient sparsification as total error minimization. Adv. Neural Inf. Process. Syst. 34, 8133–8146. doi: 10.48550/arXiv.2108.00951

Saifullah, S., Mercier, D., Lucieri, A., Dengel, A., and Ahmed, S. (2024). The privacy-explainability trade-off: unraveling the impacts of differential privacy and federated learning on attribution methods. Front. Artif. Intell. 7:1236947. doi: 10.3389/frai.2024.1236947

Seide, F., Fu, H., Droppo, J., Li, G., and Yu, D. (2014). “1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs,” in Interspeech, Vol. 2014 (Singapore), 1058–1062. doi: 10.21437/Interspeech.2014-274

Sun, W., Wang, A., Gao, Z., and Zhou, Y. (2024). “A communication-concerned federated learning framework based on clustering selection,” in International Conference on Advanced Data Mining and Applications (Cham: Springer), 285–300. doi: 10.1007/978-981-96-0814-0_19

Talwar, B., Arora, A., and Bharany, S. (2021). “An energy efficient agent aware proactive fault tolerance for preventing deterioration of virtual machines within cloud environment,” in 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (Noida: IEEE), 1–7. doi: 10.1109/ICRITO51393.2021.9596453

Tang, Z., Wang, Y., and Chang, T.-H. (2024). z-signfedavg: a unified stochastic sign-based compression for federated learning. Proc. AAAI Conf. Artif. Intell. 38, 15301–15309. doi: 10.1609/aaai.v38i14.29454

Tran-Dinh, Q., Pham, N. H., Phan, D. T., and Nguyen, L. M. (2021). Feddr-randomized Douglas-Rachford splitting algorithms for nonconvex federated composite optimization. Adv. Neural Inf. Process. Syst. 34, 30326–30338. doi: 10.48550/arXiv.2103.0345

Valdeira, P., Xavier, J., Soares, C., and Chi, Y. (2025). Communication-efficient vertical federated learning via compressed error feedback. IEEE Trans. Signal Process. 73, 1065–1080. doi: 10.1109/TSP.2025.3540655

Wang, H., Marella, S., and Anderson, J. (2022). “FEDADMM: a federated primal-dual algorithm allowing partial participation,” in 2022 IEEE 61st Conference on Decision and Control (CDC) (Cancún: IEEE), 287–294. doi: 10.1109/CDC51059.2022.9992745

Zhou, X., Chang, L., and Cao, J. (2023). Communication-efficient nonconvex federated learning with error feedback for uplink and downlink. IEEE Trans. Neural Netw. Learn. Syst. 36, 1003–1014. doi: 10.1109/TNNLS.2023.3333804

Keywords: communication efficiency, composite optimization, data heterogeneity, error feedback, federated learning, operator splitting

Citation: Xue J and Wang C (2026) EF-Feddr: communication-efficient federated learning with Douglas–Rachford splitting and error feedback. Front. Artif. Intell. 9:1699896. doi: 10.3389/frai.2026.1699896

Received: 05 September 2025; Accepted: 05 January 2026;
Published: 28 January 2026.

Edited by:

Haifeng Chen, NEC Laboratories America Inc, United States

Reviewed by:

Mengmeng Ren, Xidian University, China
Salil Bharany, Chitkara University, India

Copyright © 2026 Xue and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chundong Wang, michael3769@163.com
