ORIGINAL RESEARCH article

Front. Commun. Netw., 30 March 2022

Sec. Smart Grid Communications

Volume 3 - 2022 | https://doi.org/10.3389/frcmn.2022.827105

Incentive-Based Delay Minimization for 6G-Enabled Wireless Federated Learning

  • Department of Electrical and Computing Engineering, Wireless Communication and Information Processing Group (WCIP), Aristotle University of Thessaloniki, Thessaloniki, Greece

Article metrics

View details

3

Citations

2,2k

Views

893

Downloads

Abstract

Federated Learning (FL) is a promising decentralized machine learning technique, which can be efficiently used to reduce the latency and deal with the data privacy in the next 6th generation (6G) of wireless networks. However, the finite computation and communication resources of the wireless devices, is a limiting factor for their very low latency requirements, while users need incentives for spending their constrained resources. In this direction, we propose an incentive mechanism for Wireless FL (WFL), which motivates users to utilize their available radio and computation resources, in order to achieve a fast global convergence of the WFL process. More specifically, we model the interaction among users and the server as a Stackelberg game, where users (followers) aim to maximize their utility/pay-off, while the server (leader) focuses on minimizing the global convergence time of the FL task. We analytically solve the Stackelberg game and derive the optimal strategies for both the server and the user set, corresponding to the Stackelberg equilibrium. Following that, we consider the presence of malicious users, who may attempt to mislead the server with false information throughout the game, aiming to further increase their utility. To alleviate this burden, we propose a deep learning-aided secure mechanism at the servers’ side, which detects malicious users and prevents them from participating into the WFL process. Simulations verify the effectiveness of the proposed method, which result in increased users’ utility and reduced global convergence time, compared with various baseline schemes. Finally, the proposed mechanism for detecting the users’ behavior seems to be very promising in increasing the security of WFL-based networks.

1 Introduction

The 6th generation (6G) of wireless networks, is envisioned to support ubiquitous artificial intelligence services and be the evolution of wireless networks from “connected things” to “intelligent things” Letaief et al. (2019). Conventional machine learning approaches are usually conducted in a centralized manner, where a central entity collects the generated data, and performs the training. However, the increasing computing capabilities of wireless devices as well as the sensitive data-privacy concerns have paved the way for a promising decentralized solution, the Federated Learning (FL) Konečnỳ et al. (2016), McMahan et al. (2017). The salient feature of FL lies in the retention of the locally generated data at the device, thus, each learner individually trains the model locally without uploading any raw data to the server. Hence, the learners collaboratively build a shared model with the aid of a server, whose role is to update and redistribute the global training parameters to the learners. In this manner, the user data privacy is preserved, while the communication traffic is reduced leading to low latency, due to the absence of raw and big volume data transfer Li et al. (2020). In accordance with the key requirements of 6G networks, it is evident that 6G could be empowered by FL for ensuring low-latency and privacy preserving intelligent services Bouzinis et al. (2022a), Bouzinis et al. (2022b).

1.1 Motivation and State-of-the-Art

In the context of wireless networks, several studies have investigated the improvement of FL, in terms of model accuracy, energy efficiency, and reduced latency. For instance, the authors in Chen et al. (2020b), Shi et al. (2020b), minimized the training loss under latency and energy requirements, by jointly optimizing the computation and radio resources, as well as the user scheduling. Moreover, in another wireless FL (WFL) setup, the users’ total energy consumption and/or latency minimization, have been investigated in Tran et al. (2019), Chen et al. (2020a), Yang et al. (2021).

Although, the aforementioned research works have contributed for the efficient deployment of FL in wireless networks, there is still an open challenge regarding users’ willingness to participate into the WFL process. In particular, the users should be motivated, in order to contribute in this process with their limited energy resources. Thus, incentive designs are requisite, in order to attract the clients to be involved in this resource-consuming procedure. The considered incentives, could be expressed as a reward provided by the task publisher. For example, in Kang et al. (2019) an incentive mechanism for reliable FL was proposed, based on the contract theory, while a reputation metric was introduced, in order to measure the data-wise reliability and trustworthiness of the clients. In similar direction, authors in Lim et al. (2020) proposed an hierarchical incentive mechanism for FL. In the first level, a contract theory approach was proposed to incentivize workers to provide high quality and quantity data, while in the second level, a coalitional game approach was adopted among the model owners. In Zhan et al. (2020), a deep-reinforcement learning incentive mechanism for FL has been constructed. More specifically, a Stackelberg game was formulated, in order to obtain the optimal pricing strategy of the task publisher and the optimal training strategies of the edge nodes, which constituted the client set. Moreover, in Sarikaya and Ercetin (2019), authors formulated a Stackelberg game among workers and the task publisher, aiming to minimize the latency of a FL communication round, while in Khan et al. (2020), a Stackelberg game was formulated for FL in edge networks. Finally, in the context of a crowdsourcing framework, authors in Pandey et al. (2020) proposed a Stackelberg game for motivating the FL users to generate high accuracy models, while the server focused on providing high global model accuracy side. However, none of the above works designed an incentive mechanism, by taking into account the minimization of the FL global convergence time, which can finally result in achieving decreased delay. It should be also highlighted, that the number of scheduled devices affects the convergence speed of the global model, and its effects have not been investigated. More specifically, the trade-off between the duration of a global WFL round and the number of total rounds until convergence, has not been well-studied in the context of incentive criteria. Finally, it should be noted that when examining clients’ strategies, all previous works did not consider the joint optimization of the communication and computation resources, which can further enhance the performance and head towards meeting the strict latency requirements of 6G networks.

Also, the modeling of the interaction among the server and the users becomes more complicated when some of the users are not legitimate, degrading the overall quality of experience. However, this issue has not been considered by the existing literature, despite the progress in the development of techniques that have the potential to mitigate similar threats, such as deep learning. Recently, the application of deep learning into wireless communications has sparked widespread interest Zappone et al. (2019), while it is expected to realize the vision of 6G, which will heavily rely on AI services. For instance, deep learning has been used for simplifying the physical layer operations, such as data detection, decoding, channel estimation, as well as for resource allocation tasks and efficient optimization Sun et al. (2017). Owing to its encouraging results, deep learning may be appropriate for ensuring an unimpeachable interaction among the users and the server, as implied by the incentive mechanism during the WFL procedure, by detecting abnormal or malicious users’ behaviors.

1.2 Contribution

Driven by the aforementioned considerations, we propose a novel incentive mechanism for WFL, by modeling the interaction among users and the server/task publisher during an WFL task, by using tools from game theory. In particular, users’ objective goal is to maximize their utility, which is subject to their individual completion time of the WFL task and the energy consumption for local training and parameter transmission, given a reward for the timely task completion and an energy cost, respectively. On the other hand, the server aims to minimize the global convergence time of the WFL process. The aforementioned interaction between the server/task publisher and the users corresponds to a Stackelberg game, where the server acts as the leader of the game and announces the delay tolerance, and the users/clients constitute the set of followers and receive their decisions based on the reward given by the server and the announced delay tolerance. The convergence time is a metric of paramount importance for providing low-latency intelligent services in 6G networks, while its minimization has not been examined in the aforementioned works. Therefore, one of the main goals of the proposed incentive mechanism is to accelerate the convergence of the WFL procedure. Moreover, the convergence time is highly related with the number of communication rounds between the users and the task publisher, which is a function of the number of participating users. Hence, during the Stackelberg game, we show that the task publisher should urge a certain number of users for participation, in order to achieve the minimum convergence speed. This fact highlights the significance of user scheduling during the game, which aims to mitigate the straggler effect, i.e., excluding clients who are responsible for the occurrence of long delays. To this end, we consider the scenario where malicious users are involved into the game, which may strive to misinform the server regarding their consumed resources and possibly benefit from this action. In order to avert such behaviors, we propose the use of a Deep Neural Network (DNN), which aims to classify the users’ identity, as honest or malicious, based on resource-related observations. Through this approach, we aim to guarantee an irreproachable interaction among the users and the server, and exclude clients with malicious actions from the FL process.

The contributions of this work can be summarized as follows:

  • 1) We construct an incentive mechanism, for motivating users to utilize their resources during a FL task, through the improvement of their utility, while the task publisher aims to minimize the global convergence time of the WFL process.

  • 2) We formulate and solve a Stackelberg game for the considered user-server interaction. In particular, we obtain the optimal strategy of the users for maximizing their utility, i.e., the optimal adjustment of both computation and communication resources. Furthermore, we obtain the task publisher’s optimal strategies for minimizing the global convergence time. This translates to the selection of the optimal delay tolerance during a WFL communication round, based on the number of scheduled users.

  • 3) During the user-server interaction via the Stackelberg game, we assume that malicious users may announce false information regarding their utilized resources, aiming to mislead the server and increase their pay-off. To tackle with this issue, we construct a security mechanism at the servers’ side, whose role is to recognize false announcements and subsequently identify malicious users. Specifically, we construct and train a DNN to accurately detect malicious users. The training of the DNN is supervised and it is based on observations regarding the users’ consumed resources.

  • 4) Simulations were conducted to evaluate the performance of the proposed approaches. The results verify the effectiveness of the solutions to the game, in comparison with various baseline schemes. Moreover, insights for the Stackelberg game and its effects to the convergence time of the FL task are provided. Furthermore, the joint optimization of radio and computation resources is shown to result in increased user utility. Finally, the considered DNN for detecting malicious users, presents a quite satisfactory classification accuracy, corroborating the effectiveness of this security mechanism.

2 System Model

2.1 WFL Model

We consider a WFL system, consisting of K clients/users indexed as and a task publisher/server. Each user k has a local dataset , where are the data samples, xn,k is the n-th input data vector of user k, while yn,k is the corresponding labeled output on the respective input sample. The total size of all training data of the users is denoted as .

The local loss function on the whole data set Dk is defined aswhere f (wk, xn,k, yn,k) is the loss function on the input-output pair {xn,k, yn,k} and captures the error of the model parameter wk for the considered input-output pair. Therefore, each user is interested in obtaining the wk which minimizes its loss function. For different FL tasks, the loss function also differs. For example, for a linear regression task the loss function is . Following that, the aim of the FL training process is to find the global model parameter w, which minimizes the loss function on the whole data set across all users, which can be written asi.e., to find .

The training process consists of

N

rounds, denoted by

i

. Thus, the

i

-th round is described below

  • 1) Firstly, the BS broadcasts the global parameter wi to all participating users during the considered round. We highlight that it is not mandatory that all users are participating into the process. Let , be a binary variable which indicates whether user k is participating, i.e., ak = 1. Furthermore, we define the set , which consists of all the scheduled users. Moreover, the cardinality of is given by .

  • 2) After receiving the global model parameter, each user , updates its local model by applying one step of the gradient descent method, towards minimizing its loss function on its whole dataset, i.e., , where η is the learning rate, and then uploads the local parameter to the server.

  • 3) After receiving all the local parameters, the server aggregates them, in order to update the global model parameter, by applying , where and represents the total size of all training data among the participating users.

The whole procedure is repeated for N rounds, until a required accuracy is achieved. During the first round, the global parameter w0 is initialized by the server. The WFL model is depicted in Figure 1. Also, Table 1 summarizes the list of notations used in this article.

FIGURE 1

TABLE 1

NotationDescriptionNotationDescription
Number of users, set of userstkTransmission time of user k
Local dataset of user kpkPower of user k
DkLocal dataset’s size of user kskSize of wk
{xn,k, yn,k}n-th input-output pair of user kEnergy consumption of user k
wkModel parameter of user kτkTask completion time of user k
fk (wk)Loss function of user kDelay tolerance of a FL round
wGlobal model parameterUkUtility function of user k
J(w)Global loss functionNNumber of FL rounds
akDecision variable of user kq1Reward for timely task completion
Set of participating usersq2Energy cost
fkCPU cycle frequency of user kLkThreshold of user k
ckCPU cycles’ number of user kQTask publishers’ total budget
Computation time of user kTconvConvergence time of the FL task

List of notations.

2.2 Computation Model

The computation resources for local model training, i.e., CPU cycle frequency, of the k-th user is denoted as fk. The number of CPU cycles for a user k to perform one sample of data in local model training is denoted by ck. Hence, the computation time dedicated for a local iteration, i.e., a step of the gradient descent method, is given aswhere Dk is the data size of the dataset . Accordingly, the energy consumption for a local iteration, can be expressed as followswhere ζ is a constant parameter related to the hardware architecture of device k.

2.3 Communication Model

By using orthogonal frequency domain multiple access (OFDMA) to transmit a model update to the server/BS, the achievable transmission rate (bit/s) of user k can be written aswhere B is the available bandwidth, N0 is the spectral power density and pk, gk denote the transmit power, and channel gain of user k, respectively. The channel gain is modeled as , where the complex random variable is the small scale fading and dk is the distance between user k and the BS. Let tk be the transmission time of k-th user, dedicated for transmitting the local training parameters to the server. To upload the training parameters wk within the time duration tk, the following condition should be satisfiedwhere sk denotes the data size of the training parameters wk. Moreover, the consumed energy for the considered transmission, is given by

Following that, the total time that a user should dedicate for both computation and communication purposes, is given as

Since the transmit power of the BS is much higher than that of the users’, we ignore the delay of the server for broadcasting the global parameter to the users. Finally, the total energy consumption of the k-th user during a communication round, for executing local computations, and uploading the local training parameters, can be written as

3 User Utility and Convergence Time of the Global Model

In the considered system, the server should wait for all users to terminate the local parameter transmission and afterwards update the global model. Thus, users who present large τk are considered as stragglers, since they will be responsible for the occurrence of large delay during a communication round. More specifically, the total delay of a round is given by . Taking this into account, the case where all users are participating in the FL process may lead to increased delay, owing this to the poor wireless conditions that a user may suffer from. Moreover, it is of paramount importance that users have incentives for being involved into the considered procedure, since their participation comes at the expense of energy consumption, while devices are energy-constrained. In the continue, we proceed to the definition of users utility and the task publisher’s objective goal, i.e., global convergence time minimization.

3.1 User Utility Function

The utility function aims to quantify the incentive of a user for being involved into the FL process, in terms of a money-based reward, and it is a tool for facilitating the economic interaction among the users and the task publisher. In order to ensure low latency, we assume that the maximum delay tolerance that server requires during a communication round is . As a result, only the users who are able to meet this delay demand should participate in the process. Driven by this consideration, we define the utility function of k-th user aswhere q1 > 0 is a constant reward given by the server to the user, for a timely task completion. Thus, q1 is the price for a unit of time, received by the users. It is obvious that smaller task completion time τk, leads to higher earned reward. Moreover, q2 > 0, denotes the cost of energy consumption and ak is a binary variable, which indicates whether user k will participate in the process or not. We assume that user k will participate, i.e., ak = 1, only if the condition Uk > 0 is satisfied, otherwise user k will decide not to be involved. In essence, the utility function consists of a term which reflects the reward for the timely task completion, i.e., , meeting the delay requirement imposed by the server, and an energy cost which is related with the resources consumption. Furthermore, it is obvious that when the condition holds, i.e., user k cannot satisfy the delay tolerance condition, the utility function will always be negative, while in this case the user will not indulge in participating and will set ak = 0. Moreover, since smaller τk leads to higher earned reward, users are motivated to compute and send the local training parameters as fast as possible. However, this will lead to higher energy consumption. It should be highlighted, that the utility function can also be negative even if the case holds, owing this to an increased energy consumption. This fact implies that user k can satisfy the delay requirement of the server, but utilizes a great amount of resources for achieving low delay. As a result, an interesting tradeoff appears between task completion latency and energy consumption, since users are interested in maximizing their utility function.

3.2 Global Convergence Time

The objective goal of the task publisher is to minimize the convergence time of the FL process, in order to extract the global training model. The convergence time can be expressed aswhere Tmax is given byand represents the total delay in a communication round, since it is determined by the slowest scheduled device, while N denotes the number of total communication rounds. As shown in Li et al. (2019), the total number of communication rounds to achieve a certain global accuracy, is on the order of , where G, Γ are parameters related with data distribution and the FL settings. Hence, according to Shi et al. (2020a), the required number of total rounds in order to achieve the convergence of the global model, can be approximated as followswhere the parameters θ and β can be determined through experiments to reflect data distribution characteristics. Moreover, this model can adopt both i.i.d. and non-i.i.d. data distributions among users Li et al. (2019). From (13) it is observed that by increasing the number of participating users, i.e., increasing , the number of total communication rounds decreases. However, increased number of scheduled users may lead to increased Tmax, since it is more likely that users who present large τk will also participate. Therefore, the global convergence time in (11) is dependent on the number of scheduled users, in a non-trivial manner.

4 Stackelberg Game Formulation and Solution

4.1 Two-Stage Game Formulation

As discussed previously, each user is interested in maximizing its own utility function by optimally adjusting the available resources i.e., CPU clock speed fk, transmission time tk, transmit power pk, and finally decide whether he will participate in the training process or not, by adjusting ak. Moreover, the value of the utility function is being affected by the delay demand that the task publisher requires. On the other hand, the task publisher is willing to minimize the total convergence time of the FL process, by adjusting the maximum delay tolerance of a communication round, which influences users decision regarding participation into the procedure. Therefore, a two-stage Stackelberg game can be applied to model the interaction among users and the server, where the users are the followers while the server is the leader. Initially, an arbitrary value of is being set and users decide their optimal strategies. Given the response of the users, the server, i.e., the leader, announces the delay tolerance , being aware of the users’ decisions. It is noted that correctly classifying a game is of paramount importance, since its classification specifies the solution concept and, thus, the best actions of the involved players. In the case of Stackelberg game, this set of actions is termed as Stackelberg Equilibrium (SE). From the practical point of view, an equilibrium is an optimal decision for a player, given that the strategy of the other player is also optimized. Thus, if the users optimize their utility functions ignoring the type and the structure of the game and deviate from the SE, then they will achieve worse pay-off. In the continue we proceed to the problem formulation regarding both the followers and the leader.

Clients’ goal is to maximize their utility function, given a value of . The problem can be formally given aswhere

C1 represents the data transmission constraint, while , are the maximum values of transmit power and CPU frequency of user k, respectively.

On the other hand, the optimization problem at the side of the task publisher for minimizing the global convergence time, can be formulated aswherewhile the left-hand-side of C1 represents the overall fee that the task publisher is paying to all participating clients and Q denotes the total budget that the task publisher posses. The server is responsible for optimally adjusting , in order to enforce a certain number of users to be scheduled for participation. It can be observed that problems and are coupled, since the delay tolerance influences the number of participating users and users’ decisions, which also impact the global convergence time in turn. The manner that affects the number of participating users, will be discussed in the subsequent subsection.

To what follows, the definitions regarding the users’ and server’s optimal actions, as well as the SE, are provided Başar and Olsder (1998), Pawlick et al. (2019). The Stackelberg games are solved backwards in time, since the followers move after observing the leader’s action. The optimal actions for each follower, i.e., the optimal values of fk, pk, tk, ak that are denoted by , respectively, to respond to the leader’s action, i.e., , are the ones that satisfy the following inequality

Based on the anticipated followers’ response, the leader chooses its optimal action T*, which satisfies

Then, by using the aforementioned definitions of the optimal actions for each player, the point is the Stackelberg equilibrium, if the following set of inequalities are satisfied

4.2 Proposed Solution of the Stackelberg Game

4.2.1 Users’ Utility Function Maximization

As stated previously, users are eager to participate only if their utility function is positive, otherwise they set ak = 0 and do not spend any of their available communication and computation resources. The optimization problem , for maximizing the utility function Uk of each user, given that ak = 1, can be formulated as follows

The optimization problem in (21) is non-convex due to the coupling of tk and pk. However it can be easily proved that , which indicates that Uk is strictly concave with respect to fk. Hence, by taking , it is straightforward to show that the optimal fk is given bywhere

It is obvious that does not depend on tk and pk. After obtaining the optimal the optimization problem can be transformed as follows

The problem is non-convex in its current formulation. However, it is easy to verify that the constraint C1 always hold with equality, since the selection of larger tk or pk will lead to the decrease of the objective function. Following that, it holdsand by substituting tk in (24), the optimization problem is equivalent to the following formulation

Although problem (26) is non-convex, it can be transformed and solved efficiently with the aid of fractional programming. By observing the objective function in (26), it can be expressed as , where Φ(pk) = skq1 + skq2pk and . Furthermore, Φ(pk) is an affine function of pk, while Ψ(pk) is a concave function of pk and it also holds that Ψ(pk) > 0, ∀pk. The case of pk = 0 is trivial and is being excluded, since it would implied that user k is not willing to participate. Thus, the considered problem can be solved via the Dinkelbach’s algorithm Dinkelbach (1967). According to Dinkelbach (1967), solving (26) is equivalent to finding the unique root of

For a given λ, the function Z (pk) ≜Φ(pk) − λΨ(pk) is convex with respect to pk, since Φ(pk) is affine and Ψ(pk) is concave. Therefore, the optimal solution that minimizes Z (pk), can be easily derived by taking , yielding

Hence, the optimal of problem (26) can be found iteratively, by updating and λ(n) in each step n, according to Algorithm 1.

Algorithm 1

After obtaining the optimal , the optimal can also be calculated by using (25). Following that, the resolution of the optimization problem in (21) has been completed, since the optimal , which maximize the utility function , have been obtained. Also, we highlight that each user independently decides the optimal variables for maximizing its utility, since this decision is not subject to the residual users decisions. Moreover, the execution of is not computationally intensive for the devices, since the Dinkelbach’s algorithm is quite efficient, converging at a superlinear rate . It is further observed that the optimal solutions are independent of the value . However, can affect the sign of the utility function and subsequently the selection of . It should be noted that the optimal solutions do not guarantee that . Thus, the maximization of the utility function, subject to , , , does not ensures that the -th user is eager to participate in the federating learning task. The condition that should be satisfied, to ensure that the -th user has positive utility function, is given as follows by manipulating (10)where and . If the condition in (29) is satisfied, i.e., , then user sets , otherwise sets . As it can be observed, is the threshold value of , which determines if user is eager to participate into the FL process. In the continue, without loss of generality we assume that . Following that, let , which yieldssince the condition will hold ∈ { + 1, … , − 1, }. As a matter of fact, acts as an adjusting factor that determines which users will be scheduled for participation.

4.2.2 Global Convergence Time Minimization

Next, we proceed to the minimization of the convergence time, which is executed by the task publisher. As discussed previously, the server is able to dynamically adjust the value of among the set of levels , in order to determine the number of participating users. Given the optimal and the server can derive , from (29). Following that, the problem can be written aswhere the constraint C1 has occur by manipulating the C1 in (16). In order to solve the problem in (31), the server will execute a search among the K possible values of from the set and finally select the value that minimizes the objective function, while the selected value should also satisfy the constraint C1. It should be again highlighted that is irrelevant of the selection, since only affects the number of scheduled users. Recall that during the selection of , the server has knowledge of users’ upcoming decision, i.e., in (30), and can exploit this information to solve (31). For example, if the optimal equals to , by using (30) we conclude towhich implies that the number of scheduled users will be Kl, while the specific user participation is described by (30).

4.2.3 Stackeleberg Equilibrium

The whole procedure of the Stackelberg’s game is summarized in Algorithm 2. More specifically, in step 1, the server initializes the delay tolerance with an arbitrary value, which is announced to the users. Following that, the users solve for the given , derive the optimal and afterwards upload the values of and to the server. In step 3 and 4, the server calculates , obtains the optimal which solves problem and announces to the users. After the announcement of , in step 5, users make a decision regarding participating or not in the training process, while the server is a-priori-aware of these impeding decisions, since is being selected with the knowledge of users’ incentive, which is the sign of their utility function. With the termination of the Stackelberg game, the server has managed to obtain the optimal delay tolerance of the FL round which minimizes the convergence time, while the users have adjusted their resources, so as to maximize their utility function.

Algorithm 2

  • 1: Server initializes with an arbitrary value.

  • 2: Each user derive by solving (21) and uploads the values of to the server.

  • 3: Server calculates , by using (29).

  • 4: Server selects that minimizes the objective in (31) and announces to the users.

  • 5: Users decide if they will participate, i.e., if , set .

Stackelberg equilibrium

5 Detecting and Preventing Malicious Users From the FL Process

5.1 Malicious Users

During the considered interaction among users and the task publisher, a question that may arise is the following: What if users were malicious and announced false values of and to the server, aiming to further improve their pay-off? Firstly, we assume that users could falsely announce their latency time, as , instead of . However, they are not really willing to deviate from the optimal strategy , since this policy would reduce their utility function. Hence, in this case, the task publisher would notice this time divergence, since it is expected that user k finishes the transmission of the local parameters within duration, while he actually finishes within . As a result, the task publisher can immediately detect an abnormal behavior and exclude those users from future participation. However, users could falsely announce instead of , since the server has no knowledge of users’ computing capabilities and subsequently their consumed energy. As a result, users can influence Lk only by accordingly adjusting , in a effort to mislead the task publisher and benefit from this action. Next, any possible benefits of the aforementioned users’ action will be discussed.

Firstly, we assume that only one user is dishonest, e.g., the m-th user. In order to improve his utility, user m may select to announce , aiming to influence the delay tolerance threshold that the server imposes, i.e., increasing the delay tolerance and subsequently increasing its pay-off. Following that, we consider the case where and , meaning that the optimal strategy of the server is to enforce Kl users to participate. In this case, if the m-th user decides to announce , will not have any impact in the outcome, since it holds , and user m will not be scheduled for participation. Next, we consider the case and . Now, if user m selects to announce and it holds , again user m can not benefit from such behavior since . This means that the m-th user will participate, as he would if he had announced the true value of , while his utility function will stay unchanged. Only if the condition is satisfied, then it yields and the selected delay tolerance of the server changes. However in this case, user m will not intend to participate, since his utility will be less than or equal to zero. As a result, in none of the aforementioned cases, could user m benefit from announcing a false value of Lm, or equivalently . However, in the later case it holds , which implies that the server will select a larger value for the delay tolerance and subsequently all the participating users will benefit from this selection, since users’ utility function is monotonically increasing with respect to . Therefore, the task publisher will end up paying additional rewards. Although, an independently acting user posses no motives for adopting the considered behavior, in a scenario where multiple dishonest users are present who probably act as a cooperating coalition, it is really hard to predict whether its secure for the task publisher to tolerate such behaviors. The reason for this is that the considered malicious users may exchange information regarding their available resources and also exchange pay-offs, e.g., splitting their pay-offs.

5.2 Deep Learning-Aided Malicious Users Detection

Driven by the aforementioned scenario, a secure mechanism which detects malicious/abnormal users’ behavior should be constructed, in order to ensure an irreproachable clients-task publisher interaction. Therefore, the servers’ aim is to recognize whether the users’ transmitted tuple , is reliable or not. Considering the underlying correlation between the task completion time τk and the consumed energy , we invoke the use of a deep neural network to identify if the declared , could be realistic. We foresee this process, as a supervised-learning task which trains the neural network, given that the labels of the training tuple , belong to the set of classes {True, False}, specifying whether the considered tuple is reliable or not, respectively. Note that we slightly abuse the notation by dropping the subscript index k for simplicity. After the completion of the training process, the neural network is expected to successfully classify the forthcoming unknown , that each user will announce to the server. Hopefully, the server will be capable of recognizing dishonest users and prevent them from future participation.

5.2.1 Deep Neural Network Structure

We consider a feed-forward DNN consisting of an input layer, an output layer and M − 1 fully connected hidden layers. All layers are indexed from 0 to M. We denote the number of nodes in the m-th layer as lm. For the hidden layers, the output of the i-th node in the m-th layer, is calculated as followswhere represents the weight that connects the j-th node of the (m − 1)-th, with the i-th node of the m-th layer, is the output of the j-th node in the (m − 1)-th layer and is the bias term of the i-th node in the m-th layer. Furthermore, ReLU(⋅) = max (⋅, 0), denotes the Rectified Linear Unit function, which is a widely used activation function for neural networks. The output of are by default the inputs of the neural network, i.e., and . Finally, for the output layer, we use the softmax activation function, which extracts the probabilities pT, pF of the input vector , to belong to the class True or False, respectively, while pT + pF = 1. Therefore, it holds and . The DNN’s structure is illustrated in Figure 2. In the continue, the data generation, training and testing stages are discussed.

FIGURE 2

5.2.1.1 Data Generation

The data are generated in the following manner. The τ* and are derived according to the proposed method, for many channel realizations, in order to generate multiple samples. In the case where the user is honest, the label of is set as y = True. In the continue, the data set for the malicious users is generated. As mentioned previously, malicious users falsely declare the consumed energy as . To model this behavior, we consider that their declared tuple is , with , where Δ is a random variable, uniformly distributed in [δ1, δ2], with δ2 > δ1 > 0 being constants. In this case, the label is set as y = False. Thus, the entire data set, , consists of samples corresponding to both honest and dishonest users. More specifically, the whole data set is described by the tuple , where y(t) ∈ {True, False} and the superscript (t) is used to denote the t-th sample. Similarly, a validation data set is constructed.

5.2.1.2 Training Stage

The entire training data set is used to optimize the weights and biases of the neural network. The loss function that we use in order to capture the error between the true label y(t) and the predicted label , is the categorical cross entropy. Moreover, the validation set is used for evaluating the neural network’s performance and accordingly adjusting the hyperparameters of the training process such as the learning rate (LR), total number of training epochs, batch size, etc.

5.2.1.3 Testing Stage

In the testing stage, a test set is generated, in a similar manner with the training stage. Following that, the test set is passed through the trained neural network and the predictions are collected. The predicted label of a test sample is given by

Finally, the classification accuracy is evaluated, i.e., the ability of the neural network to detect in which user category, honest or dishonest, do the testing samples correspond.

6 Numerical Results and Performance Evaluation

We consider K = 20 uniformly distributed users in a circular cell with radius R = 500 m, while the BS is located at the center of the cell. The wireless bandwidth is B = 150 KHz, the noise power spectral density is N0 = −174 dBm/Hz, the maximum transmit power of the users is , the maximum CPU clock speed is and the effective capacitance is ζ = 10−27. Finally we set q2 = 15. All variables retain their respective values, unless specified otherwise.

Next, we set β = 27.773 and θ = 0.941 2 Shi et al. (2020a). These values correspond to a specific data distribution when the MNIST data set is used, which is a well-known handwritten digit dataset. In particular, according to Shi et al. (2020a), the considered values of β, θ reflect an i.i.d. data distribution among users. Furthermore, we also adopt the classification of the handwritten digits as the FL task. Following that, the user training data has been set as Dk = 1.6 Mbit, the size of the training parameters is sk = 1 Mbit and ck is uniformly distributed in the interval [10, 40] cycles/bit. It is noted that in the following figures, all results are averaged over 10,000 trials, by means of Monte Carlo.

In Figure 3 the impact of the number of scheduled users on the average global convergence time is demonstrated, for various values of the task publisher’s reward q1. Furthermore, we consider that users employ the optimal proposed strategy, in order to maximize their utility. As it can be observed, neither enforcing all users nor a small portion of them to participate, will lead to the average convergence time minimization. This phenomenon can be explained as follows. By scheduling a large amount of users to participate it is more likely that the latency during a communication round will be large, owing this to the straggler effect, while the convergence time will be negatively affected. On the contrary, by urging a small number of users to be involved, the required number of communication rounds to achieve global convergence will highly increase, deteriorating the convergence time. Moreover, it can be observed that higher reward q1 leads to smaller convergence time. This is reasonable, since when the reward is higher, users are motivated to spend their available resources for a fast local training and parameter transmission to the server, leading to a decreased delay during each communication round.

FIGURE 3

Following that, in Figure 4, the average convergence time versus the reward q1 is depicted. We compare the proposed method with the following cases. Firstly, the server randomly selects the number of participating users and secondly, the server always schedules 10 users for participation. It can be seen that the proposed method outperforms the considered cases. Therefore, the importance of wisely selecting the delay tolerance , in order to enforce a certain number of users to be involved in the training process, is corroborated. In addition to this, the significance of the user scheduling, which is performed by the server through the incentive design for minimizing the convergence time, is revealed. Thus, user scheduling should be a driving factor for the incentive design during the FL process.

FIGURE 4

In the continue, Figure 5 exhibits the average utility of the users given that . We compare our proposed solution for users’ utility function maximization with the following baseline schemes. Firstly, in Scheme 1 (S1), users select the optimal CPU frequency from (22), but they set pk = pmax, i.e., select to transmit with the maximum available power. Secondly, in Scheme 2 (S2), users transmit with the optimal power as extracted by Algorithm 1, but use the whole CPU frequency for local training, i.e., . It is clearly seen that our proposed method outperforms the baseline ones, which verifies the significance of the proposed optimization regarding users’ utility function maximization. Also, this example highlights the performance gains that the joint optimization of the wireless and computation resources could offer. In this manner, users are motivated to participate into the FL process, given that the resource allocation is tactfully conducted, which leads to increased utility. Furthermore, we highlight that when the utility is equal to zero, users do not intend to participate, while to do so, they would require higher reward q1 for a timely task completion.

FIGURE 5

6.1 DNN Setup and Performance

The neural network consists of three hidden layers. The first hidden layer contains 200 neurons, while the rest two layers contain 80 neurons. The optimization algorithm we use for minimizing the loss function of the neural network is the RMSprop, which is an efficient implementation of the mini-batch stochastic gradient descent method. The decay rate has been set as 0.9. In order to select suitable values for the learning rate and the batch size, the validation set was used for the evaluation of the loss function. More specifically, in Figure 6 and Figure 7, the evolution of the loss function throughout the training epochs is demonstrated. It can been observed that when the learning rate is equal to 0.001, the smallest loss is achieved. Moreover, the batch size which achieves the smallest loss is 100, which is finally adopted. In addition, the number of training epochs has been set as 200, since at this point the loss function is relatively close to a steady level. For the training of the neural network 36,000 samples were generated, while the validation set consists of 4,000 samples. Also, we used 5,000 testing samples, in order to evaluate the accuracy of the neural network. We highlight that throughout the generation of the training, validation and testing set, the ratio of honest to dishonest users was 1:1. Moreover, throughout the generation of the training and validation set, we fix δ1 = 0.2 and δ2 = 1, which means that and subsequently . Finally, the parameter selection of the neural network is summarized in Table 2.

FIGURE 6

FIGURE 7

TABLE 2

ParameterValue
Number of neurons in the 1st layer200
Number of neurons in the 2nd layer80
Number of neurons in the 3rd layer80
Decay rate0.9
Learning rate0.001
Batch size100
Epochs200
Number of training samples36,000
Number of validation samples4,000
Number of testing samples5,000

DNN’s parameters.

In Figure 8, the testing accuracy is evaluated, after passing the testing set through the DNN. Various values of δ1 are considered, while we set δ2 = 1. This implies that the malicious users’ false declared energy consumption , is uniformly distributed in . It can been seen that as δ1 increases the accuracy also increases. An interpretation of this result comes as follow. For smaller δ1, it is more likely that the honest users’ tuple and the malicious users’ tuple will be quite similar, which can be justified by observing the distribution of . Therefore, in such case, it is harder for the DNN to correctly classify the users’ identity. However, it is evident that the neural network’s ability to identify the label of the participating users, as honest or dishonest, is quite satisfactory. Therefore, through this mechanism the task publisher can exclude users who are likely to be malicious and subsequently increase the levels of security throughout the Stackelberg game. Moreover, we compare the performance of the DNN with a Support Vector Machine (SVM) classifier. It is obvious that the DNN outperforms the SVM, in terms of classification accuracy.

FIGURE 8

7 Conclusion

In this paper, we propose a secure incentive mechanism for WFL in 6G networks. Specifically, we formulate a Stackelberg game between the clients and the server, where the clients aim to maximize their utility, while the server is focusing on minimizing the global convergence time of the FL task. The optimal solution to the game is obtained while the efficiency of the proposed solution is verified, leading to reduced latency, owing to the decreased global convergence time and increased user-utility. Moreover, we consider the presence of malicious users throughout the game, who may attempt to misinform the server regarding their utilized resources, aiming to further increase their profit. To prevent such behavior, we construct a deep neural network at the server’s side, which focuses on classifying the users’ identity, as malicious or honest. Simulation results validate the effectiveness of the proposed mechanism, as a promising solution for detecting malicious users. To this end, in order to further increase the incentive design efficiency, additional FL features may be taken into account. In this direction, an interesting future topic could be the consideration of the clients’ data quality and quantity throughout the construction of the incentive mechanism, as well as the investigation of their impact on the total convergence time of WFL.

Statements

Data availability statement

The datasets presented in this article are not readily available because they need to be converted in a presentable format. Requests to access the datasets should be directed to .

Author contributions

Conceptualization and Methodology, PB, PD, and GK. Formal Analysis, PB. Validations, PB and PD. Simulations and Visualization, PB. Writing-Review and Editing, PB, PD, and GK. Supervision, GK.

Funding

Part of the research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406. Also, part of this research has been co-financed by the European Regional Development Fund by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call: Special Actions: Aquaculture - Industrial Materials - Open Innovation in Culture (project code: T6YBP-00134).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1

    BaşarT.OlsderG. J. (1998). Dynamic Noncooperative Game Theory. SIAM.

  • 2

    BouzinisP. S.DiamantoulakisP. D.KaragiannidisG. K. (2022a). Wireless Federated Learning (WFL) for 6G Networks⁴Part I: Research Challenges and Future Trends. IEEE Commun. Lett.26, 37. 10.1109/lcomm.2021.3121071

  • 3

    BouzinisP. S.DiamantoulakisP. D.KaragiannidisG. K. (2022b). Wireless Federated Learning (WFL) for 6G Networks-Part II: The Compute-Then-Transmit NOMA Paradigm. IEEE Commun. Lett.26, 812. 10.1109/lcomm.2021.3121067

  • 4

    ChenM.PoorH. V.SaadW.CuiS. (2020a). “Convergence Time Minimization of Federated Learning over Wireless Networks,” in ICC 2020-2020 IEEE International Conference on Communications (ICC), Dublin, June 2020 (IEEE), 16. 10.1109/icc40277.2020.9148815

  • 5

    ChenM.YangZ.SaadW.YinC.PoorH. V.CuiS. (2020b). A Joint Learning and Communications Framework for Federated Learning over Wireless Networks. IEEE Trans. Wireless Commun.20 (1), 263283. 10.1109/TWC.2020.3024629

  • 6

    DinkelbachW. (1967). On Nonlinear Fractional Programming. Manage. Sci.13, 492498. 10.1287/mnsc.13.7.492

  • 7

    KangJ.XiongZ.NiyatoD.XieS.ZhangJ. (2019). Incentive Mechanism for Reliable Federated Learning: A Joint Optimization Approach to Combining Reputation and Contract Theory. IEEE Internet Things J.6, 1070010714. 10.1109/jiot.2019.2940820

  • 8

    KhanL. U.PandeyS. R.TranN. H.SaadW.HanZ.NguyenM. N. H.et al (2020). Federated Learning for Edge Networks: Resource Optimization and Incentive Mechanism. IEEE Commun. Mag.58, 8893. 10.1109/mcom.001.1900649

  • 9

    KonečnỳJ.McMahanH. B.YuF. X.RichtárikP.SureshA. T.BaconD. (2016). Federated Learning: Strategies for Improving Communication Efficiency. arXiv. [Preprint].

  • 10

    LetaiefK. B.ChenW.ShiY.ZhangJ.ZhangY.-J. A. (2019). The Roadmap to 6g: Ai Empowered Wireless Networks. IEEE Commun. Mag.57, 8490. 10.1109/mcom.2019.1900271

  • 11

    LiX.HuangK.YangW.WangS.ZhangZ. (2019). On the Convergence of Fedavg on Non-iid Data. arXiv. [Preprint].

  • 12

    LiT.SahuA. K.TalwalkarA.SmithV. (2020). Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal. Process. Mag.37, 5060. 10.1109/msp.2020.2975749

  • 13

    LimW. Y. B.XiongZ.MiaoC.NiyatoD.YangQ.LeungC.et al (2020). Hierarchical Incentive Mechanism Design for Federated Machine Learning in mobile Networks. IEEE Internet Things J.7, 95759588. 10.1109/jiot.2020.2985694

  • 14

    McMahanB.MooreE.RamageD.HampsonS.y ArcasB. A. (2017). “Communication-efficient Learning of Deep Networks from Decentralized Data,” in Artificial Intelligence and Statistics (PMLR), 12731282.

  • 15

    PandeyS. R.TranN. H.BennisM.TunY. K.ManzoorA.HongC. S. (2020). A Crowdsourcing Framework for On-Device Federated Learning. IEEE Trans. Wireless Commun.19, 32413256. 10.1109/twc.2020.2971981

  • 16

    PawlickJ.ColbertE.ZhuQ. (2019). A Game-Theoretic Taxonomy and Survey of Defensive Deception for Cybersecurity and Privacy. ACM Comput. Surv. (Csur)52, 128. 10.1145/3337772

  • 17

    SarikayaY.ErcetinO. (2019). Motivating Workers in Federated Learning: A Stackelberg Game Perspective. IEEE Networking Lett.2, 2327. 10.1109/lnet.2019.2947144

  • 18

    SchaibleS. (1976). Fractional Programming. II, on Dinkelbach's Algorithm. Manage. Sci.22, 868873. 10.1287/mnsc.22.8.868

  • 19

    ShiW.ZhouS.NiuZ. (2020a). “Device Scheduling with Fast Convergence for Wireless Federated Learning,” in ICC 2020-2020 IEEE International Conference on Communications (ICC), Dublin, June 2020 (IEEE), 16. 10.1109/icc40277.2020.9149138

  • 20

    ShiW.ZhouS.NiuZ.JiangM.GengL. (2020b). Joint Device Scheduling and Resource Allocation for Latency Constrained Wireless Federated Learning. IEEE Trans. Wireless Commun.20, 453467. 10.1109/twc.2020.3025446

  • 21

    SunH.ChenX.ShiQ.HongM.FuX.SidiropoulosN. D. (2017). “Learning to Optimize: Training Deep Neural Networks for Wireless Resource Management,” in 2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Sapporo, July 2017 (IEEE), 16. 10.1109/spawc.2017.8227766

  • 22

    TranN. H.BaoW.ZomayaA.NguyenM. N.HongC. S. (2019). “Federated Learning over Wireless Networks: Optimization Model Design and Analysis,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Paris, April–May 2019 (IEEE), 13871395. 10.1109/infocom.2019.8737464

  • 23

    YangZ.ChenM.SaadW.HongC. S.Shikh-BahaeiM. (2021). Energy Efficient Federated Learning over Wireless Communication Networks. IEEE Trans. Wireless Commun.20, 19351949. 10.1109/twc.2020.3037554

  • 24

    ZapponeA.Di RenzoM.DebbahM. (2019). Wireless Networks Design in the Era of Deep Learning: Model-Based, Ai-Based, or Both?IEEE Trans. Commun.67, 73317376. 10.1109/tcomm.2019.2924010

  • 25

    ZhanY.LiP.QuZ.ZengD.GuoS. (2020). A Learning-Based Incentive Mechanism for Federated Learning. IEEE Internet Things J.7, 63606368. 10.1109/jiot.2020.2967772

Summary

Keywords

wireles federated learning, incentive mechanism, stackelberg game, convergence time minimization, deep learning

Citation

Bouzinis PS, Diamantoulakis PD and Karagiannidis GK (2022) Incentive-Based Delay Minimization for 6G-Enabled Wireless Federated Learning. Front. Comms. Net 3:827105. doi: 10.3389/frcmn.2022.827105

Received

01 December 2021

Accepted

18 February 2022

Published

30 March 2022

Volume

3 - 2022

Edited by

Adlen Ksentini, EURECOM, France

Reviewed by

Zehui Xiong, Nanyang Technological University, Singapore

Jianbo Du, Xi’an University of Posts and Telecommunications, China

Updates

Copyright

*Correspondence: George K. Karagiannidis,

This article was submitted to Smart Grid Communications, a section of the journal Frontiers in Communications and Networks

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics