A Novel Pricing Algorithm Based on Reward-Punishment Mechanism for Supply and Demand Balancing

Demand response (DR) is a powerful tool for maintaining the stability of the power system and maximizing the profit of the electricity market, in which customers engage in the pricing scheme and proactively adjust their electricity demand based on the price. Most existing DR works assume that the prediction of customers' electricity demand is always accurate and trustworthy, which leads to high cost and fluctuation in the electricity market once the prediction is violated. In this paper, we relax this assumption: we design a reward and punishment mechanism to constrain customers' dishonest behaviors and propose a novel pricing algorithm based on it, which guarantees that the total electricity demand of all customers stays within a secure range and obtains the maximum profit for the supplier. Meanwhile, we obtain the optimal demand and provide an upper and a lower bound of the proposed price for the electricity market. In addition to a single type of customer, we also consider multiple types of customers, each of which responds differently to prices. Extensive simulations are conducted to demonstrate the effectiveness of the proposed algorithm compared with other pricing algorithms. The results also show that the average electricity consumption of a whole community is mostly affected by the residents' electricity consumption, and that the balance between the supply and all types of customers is achieved under the proposed pricing algorithm.


INTRODUCTION
Demand-side resource management (DSM) is a means to explore and solve the balance between the supply and demand of electricity at the customer end (Gellings, 1985). Demand response (DR), as one of the important and powerful solutions to DSM, can achieve win-win results for both the supplier and the customers: the customers are stimulated by the price to participate proactively in maintaining the electricity balance and to adjust their electricity consumption (Albadi and El-Saadany, 2007; Deng et al., 2017; Paterakis et al., 2017; Wang et al., 2020). Therefore, it is meaningful and promising to consider how to reasonably motivate customers to participate in balancing supply and demand in the power market and to avoid power waste.
Numerous efforts have been devoted to designing price-based demand response, which can be divided into three categories: time-of-use pricing (TOU) (Celebi and Fuller, 2012; Wang and Li, 2015; He et al., 2018; Cui and Yang, 2019), real-time pricing (RTP) (Mohsenian-Rad and Leon-Garcia, 2010; Roozbehani et al., 2012; Tsui and Chan, 2012; Finck et al., 2020), and critical peak pricing (Herter, 2007; Herter and Wayland, 2010; Jang et al., 2015; Li et al., 2018). The TOU pricing scheme, in which the day is split into multiple time slots with a price set for each, is more widely used because it reduces the inefficiency of the single pricing scheme while being more practical for customers than real-time pricing. In DR, the control platform needs to design reasonable and effective pricing schemes to reach certain objectives, such as profit maximization and user utility maximization. For example, under the TOU pricing scheme, Cheng et al. (2019) solved an energy cost optimization problem in a two-machine Bernoulli serial line, and Zhou and Li (2015) studied the optimization problem of residential load scheduling and minimized both the electricity cost of the customers and the peak-valley difference of the supplier. To minimize the mean price paid by the customers, Hung and Michailidis (2018) discussed a general stochastic modeling framework for customers' power demand, based on which TOU contract characteristics can be selected.
Most existing pricing schemes are designed based on the consideration of two sides (Li et al., 2011; Paschalidis et al., 2012; Kii et al., 2014; Wang and Paranjape, 2017). One is the supply side, where the fluctuation of the total electricity demand is small and the control platform designs the pricing scheme to maximize the profit based on the demand prediction. The other is the demand side, where the customers flexibly adjust their electricity demand to minimize their cost under the given pricing. Connecting the two sides, there is a tradeoff between achieving profit maximization and keeping the supply and the demand in balance between the supplier and the customers. Meanwhile, it is worth noting that an accurate prediction of the electricity demand is critical to the design of the above pricing schemes. Once the prediction is wrong or the customers deviate from their pre-committed electricity demand, the cost will be higher if the pricing scheme remains fixed, and a large amount of power will be wasted.
To address this problem, we design a reward and punishment mechanism (RPM) to constrain the behavior of the customers such that all customers rigorously meet their pre-set electricity demand. Then, a dynamic pricing scheme is proposed to achieve win-win results for both the supplier and the customers. The main contributions of this paper are summarized as follows.
• We consider the problem of supply and demand balancing from the perspective of the customer demand side rather than the supply side and construct a novel and practical balancing framework where the customers can adjust their electricity demand proactively.
• We design a heuristic reward and punishment mechanism (RPM) to stimulate the customers to participate in the demand response scheduling spontaneously where the phenomenon of breaking promises is avoided. Then, the pricing algorithm is proposed such that the power tension is relieved and the win-win result for both the supplier and the customers is achieved.
• Extensive simulations are conducted to demonstrate the effectiveness of the proposed algorithm. They show that RPM decreases the cost of both the supplier and the customers compared with the case without the involvement of customers.
The rest of the paper is organized as follows. Section 2 introduces the system model and customer demand model in smart grid. In Section 3, the heuristic pricing algorithm based on RPM is proposed. Section 4 shows the simulation results. Finally, we summarize our work in Section 5.

System Modeling
Considering the potential customers with peak-clipping demand response in the community, we study the load reduction mode of such customers participating in demand response scheduling. There are three roles in the smart grid: the electricity supplier (supply side), the customers (demand side), and a control platform, as shown in Figure 1. The supplier generates electricity for the customers and earns revenue. Each customer purchases electricity from the supplier to meet its electricity requirement for production. The control platform is an organization that designs reasonable pricing and relevant measures to balance the supply and demand (Roozbehani et al., 2010).
Assume that both the supplier and the customers can interact with the control platform to exchange the price and the demand information. In this paper, we divide each day into $N$ periods, where $N = 24$ for hourly based pricing. To achieve the balance of the supply and the demand, the control platform introduces the customers' demand bidding behavior, i.e., the customers send their pre-committed electricity demand to the control platform and constrain their electricity consumption to attain an extra reward. At the same time, the reward and punishment mechanism is designed to normalize the behaviors of the customers such that the supply and demand balance can be reached and the profit can be maximized.

Customer Demand Modeling
Due to different load properties and production requirements, the electricity demand differs across typical customers. We consider five typical types of customers on the demand side of a residential power system: residents, charging piles, storage batteries, illumination, and elevators. For each type of customer, we model the random demand of a single type of customer $i$ in the next period as $d_i$. Since the demand is affected by the price, the actual load $d_i \in \mathbb{R}_+$ is a function of the price $p_k$, which satisfies

$$d_i(p_k) = z_i(p_k) + e_i(p_k), \quad i = 1, 2, \ldots, n, \tag{1}$$

where $p_k$ is the unit price of electricity in time period $k$ with $k = 1, 2, \ldots, N$, $z_i(p_k)$ is the minimum electricity demand that guarantees the normal production process, and $e_i(p_k)$ is the flexible electricity demand, including transferable load, interruptible load, and adjustable load. The flexible electricity demand is sensitive to the electricity price, i.e., it fluctuates readily as the price varies.
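The decomposition of the actual load into a minimum part and a price-sensitive flexible part can be illustrated with a minimal sketch. The class name, the price-independent minimum demand, the exponential response of the flexible demand, and all numerical values below are assumptions made for illustration only; the paper does not specify the functional forms of $z_i$ and $e_i$.

```python
import math
from dataclasses import dataclass


@dataclass
class CustomerDemand:
    """Illustrative model of d_i(p_k) = z_i(p_k) + e_i(p_k).

    All fields are hypothetical; the source does not give functional forms.
    """
    base_load: float      # z_i in kWh, assumed price-independent here
    flexible_ref: float   # flexible demand e_i in kWh at the reference price
    ref_price: float      # reference unit price
    elasticity: float     # assumed sensitivity of flexible demand to price

    def flexible(self, price: float) -> float:
        # Assumed form: flexible demand shrinks exponentially as price rises.
        relative_change = (price - self.ref_price) / self.ref_price
        return self.flexible_ref * math.exp(-self.elasticity * relative_change)

    def total(self, price: float) -> float:
        # Actual load is the minimum demand plus the flexible demand.
        return self.base_load + self.flexible(price)


resident = CustomerDemand(base_load=8.0, flexible_ref=5.0,
                          ref_price=0.110, elasticity=1.5)
for price in (0.080, 0.110, 0.150):
    print(f"price={price:.3f}  load={resident.total(price):.2f} kWh")
```

Under these assumptions the load never drops below the minimum demand, and only the flexible part reacts to the price, which mirrors the qualitative behavior described above.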
In the electricity market, the customers engage in the electricity demand response scheduling by clipping the electricity peak to keep the supply and demand in balance, for example by changing electricity usage patterns and controlling electrical equipment. Assume that customer $i$ sends its pre-committed electricity demand $d_i(k)$ to the control platform at time period $k$. Let $J_i^c$ be the profit function of customer $i$ (Eq. 2), in which the convex function $g(\cdot)$ is a reward and punishment function denoting the reward or cost incurred by customers under their pre-set promise; its functional form will be designed later. Herein, we use the dollar value to evaluate the consumption of the practical electricity demand $d_i(p_k)$.
Without the involvement of customers in the adjustment of electricity demand, $d_i(p_k)$ is fixed and the profit function decreases as the price $p_k$ increases. Once a flat price $p_0^*$ is set by the control platform from the view of the supply side, customers have less incentive to participate in the demand response.
On the supply side, the supplier can generate a certain amount of electricity $s(k)$ in each time period, which is the maximum demand that the supplier can provide. Through a two-way interaction between the supplier and the customers in the electricity market, the supplier provides the agreed electricity demand $d_s$ in practice, subject to the supply limit $s(k)$. The profit model of the supplier, $J_s : \mathbb{R}_+ \to \mathbb{R}_+$, is given by Eq. 3, where $p$ is the average electricity price in a day, $d(p)$ is the actual total load of the customers, and $h : \mathbb{R}_+ \to \mathbb{R}_+$ is a function denoting the cost caused by the deviation between the planned electricity demand $d_s$ and the actual load $d(p)$ in the community. The profit of the supplier increases with the payment of the customers and decreases with the electricity waste. There are many approaches to modeling the profit of the supplier, e.g., considering the power generation cost, and the modeling form will not influence the basic design of the pricing algorithm based on the reward and punishment mechanism in this paper.
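The qualitative behavior of the supplier's profit (rising with customer payments, falling with the gap between planned and actual load) can be sketched as follows. This is not the paper's Eq. 3: the revenue-minus-cost form, the quadratic deviation cost, the function names, and the weight value are all assumptions chosen for illustration, and unlike the paper's $J_s$ this toy profit is not constrained to be non-negative.

```python
def deviation_cost(deviation: float, weight: float = 0.05) -> float:
    """Hypothetical convex cost h of the gap between planned and actual load."""
    return weight * deviation ** 2


def supplier_profit(avg_price: float, actual_load: float,
                    planned_load: float, weight: float = 0.05) -> float:
    """Assumed profit: customer payment minus the cost of the load deviation."""
    revenue = avg_price * actual_load
    return revenue - deviation_cost(planned_load - actual_load, weight)


# Example with a planned load of 13.1 kWh and a price of 0.110 per kWh.
for load in (11.0, 13.1, 15.0):
    print(load, round(supplier_profit(0.110, load, 13.1), 4))
```

With this assumed form, any deviation from the planned load reduces the profit relative to the pure revenue, whichever direction the deviation takes.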

Problem Formulation
The control platform, as a non-profit entity that balances the electricity market, determines how to design the reward and punishment mechanism such that the electricity demand of the customers is close to the pre-committed value and win-win results for the supplier and the customers are achieved, which are also our objectives in this paper. We set the profit functions of both the supplier and the customers as the objective function. Let $E(\cdot) = \int_{-\infty}^{+\infty} f_{(\cdot)}(\tau)\,\tau\,d\tau$ denote the expectation of a random variable, where $f_{(\cdot)}$ is the probability density function (PDF) of the random variable $(\cdot)$. The optimization problem in this paper is formulated as follows.

Note that the profit of the supplier should include the payment of all types of customers, because there are five typical types of customers in the community, i.e., the residential power system: residents, charging piles, storage batteries, illumination, and elevators. Furthermore, since the pricing depends on the total guaranteed demand $d_s$ of all customers, the pricing is affected by the individual customers. In fact, not all customers will keep to their promised power consumption, which leads to electricity waste and power fluctuations. Therefore, relevant policies are urgently needed to balance the supply and the demand even when some customers break their promises. Designing a reward and punishment mechanism is a good method to constrain the behavior of the customers, and it allows the objective of keeping the supply and demand in balance while maximizing the profit of the supplier and the customers to be achieved. Some important notations in this paper are shown in Table 1.
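As a small numerical illustration of the expectation operator defined above, the integral of $\tau f(\tau)$ can be approximated with the trapezoidal rule. The normal density and its standard deviation are assumptions made purely for this sketch (the paper obtains the density from historical market data); the mean of 13.1 kWh is borrowed from the average pre-committed demand reported in the simulations.

```python
import math


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution (assumed demand distribution)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def expectation(pdf, lower: float, upper: float, steps: int = 10_000) -> float:
    """Trapezoidal approximation of the integral of tau * pdf(tau)."""
    width = (upper - lower) / steps
    total = 0.5 * (lower * pdf(lower) + upper * pdf(upper))
    for j in range(1, steps):
        tau = lower + j * width
        total += tau * pdf(tau)
    return total * width


# Integrate over mean +/- 8 standard deviations, where the tails are negligible.
mean_demand = expectation(lambda t: normal_pdf(t, 13.1, 1.0), 5.1, 21.1)
print(round(mean_demand, 6))
```

In practice the density would be replaced by an empirical estimate, and the integration range by the feasible demand interval.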
When the profit of the customers is maximized, the optimization problem in this paper can also be simply formulated as follows

THE DESIGN OF HEURISTIC PRICING ALGORITHM
In this section, we first analyze the weakness of the lack of customer involvement. Then, the reward and punishment mechanism is designed to encourage the customers to adjust their flexible demands to sustain the balance of the supply and demand. Finally, we propose the pricing algorithm to maximize the profit of the supplier and the customers.

High Cost Without Involvement of Customers
Without the involvement of the customers, the profit function (Eq. 2) is rewritten without the reward and punishment term. Then, the objective function without customers' involvement, denoted here by $\tilde{J}(p)$, is obtained, where $z = \sum_{i=1}^{n} \sum_{k=1}^{N} z_i(p_k)$ is the minimum total electricity demand and $f_p^c(\tau)$ can be obtained from the historical data of the electricity market. On the demand side, once the pricing is fixed, $f_p^c(\tau)$ basically remains unchanged.
Compared with the case with customers' involvement, two problems deserve attention from the perspective of supply and demand balancing. One is that the profit function $J_i^c(p)$ is not sensitive to the price due to the large range of demand intervals, which leads to the inaction of customers in terms of cutting electricity consumption. The other is that the supplier cannot predict the total actual demand accurately owing to the randomness of the flexible demand, so that power is in short supply at peak times and wasted at valley times. From the view of both the customers and the supplier, this is not conducive to the balance of supply and demand and produces high costs. Let $J(p) - \tilde{J}(p)$ be the profit gain with customers' involvement ($J$) compared to that without customers' involvement ($\tilde{J}$). Note that this profit gain partly depends on the convex function $g(\cdot)$, i.e., the reward and punishment mechanism. If all customers keep their promises, the profit gain is maximized due to the reward. Otherwise, the more customers break their promises, the lower the profit. Hence, to overcome these problems, the reward and punishment mechanism is designed in this paper to incentivize the customers such that the flexible total demand is constrained by the supplier, i.e., the power demand is maintained within the range of the power supply provided by the supplier and the balance of the supply and demand is reached.

The Design of RPM With Involvement of Customers
In this part, we design the dynamic reward and punishment mechanism (RPM) by introducing convex function g(·) where the flexible demand of the customers is affected by the setting of g(·) and constrained in the allowable range provided by the supplier.
TABLE 1 | Important notations.
Symbol | Description
$d_i(k)$ | The pre-committed demand of customer $i$
$\mathbb{R}_+$ | The set of positive real numbers
$d_s$ | The optimal demand that the supplier wishes to serve
$s$ | The maximum demand that the supplier can support
$J_i^c$ | The profit function of customer $i$
$J_s$ | The profit function of the supplier
$g$ | The reward and punishment function for customers
$h$ | The cost function for the supplier
$E$ | The expectation of random variables

June 2021 | Volume 9 | Article 682300

The basic idea of RPM is that the control platform gives certain rewards to those customers who keep their promise, i.e., whose practical demand meets the pre-committed demand. Otherwise, a punishment is imposed on them so that the behavior of the customers is normalized within a controllable range. As described earlier, the customers send the pre-committed demand $d_i(k)$ to the control platform. Then, the total flexible electricity demand $\delta_i(p_k)$ for customer $i$ is the deviation of the actual electricity demand from the guaranteed electricity demand, where $d_i(p_k) \in [z_i(p_k), s(k)]$, i.e., the actual load of customer $i$ lies between the minimum electricity demand needed for production and the maximum one that the supplier can provide in each time slot. According to the reward and punishment rule, the convex function $g(\cdot)$ must meet a set of conditions, which leads to the structure of RPM established in Eq. 8, where $\mu \in \mathbb{R}_+$ is the reward and punishment weight parameter and $p_0^*$ is the optimal price set at the beginning. Taking the first-order derivative of Eq. 8, it follows that if $\delta_i(p_k) \in [0, p_0^*/2\mu]$, we have $g[\delta_i(p_k)] < 0$, and customer $i$ receives the maximum reward when $\delta_i(p_k) = p_0^*/2\mu$. Taking the first-order derivatives of Eq. 2 with respect to $\{d_i(p_k)\}_{k=1}^{N}$ yields Eq. 10. Setting Eq. 10 equal to zero gives the optimal demand in Eq. 11. The second-order derivative of $J_i^c$ is then taken with respect to $d_i(p_k)$ and $d_i(p_{k'})$, where $k'$ is also a time-slot index.
Since $\mu \in \mathbb{R}_+$, the diagonal elements of the Hessian matrix are all negative, and the off-diagonal elements are all zero. The Hessian matrix is therefore negative definite, meaning that $\{d_i(p_k)^*\}_{k=1}^{N}$ is the optimal electricity demand for customer $i$.
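The stated properties of the reward and punishment function can be checked numerically with a simple stand-in. The quadratic form $g(\delta) = \mu\delta^2 - p_0^*\delta$ used below is an assumption: it is one convex function whose minimum (largest reward) lies at $\delta = p_0^*/2\mu$ and which is negative strictly inside $(0, p_0^*/2\mu]$, but it is not claimed to be the paper's Eq. 8. The value $p_0^* = 0.110$ is taken from the simulation section, while $\mu = 0.5$ is hypothetical.

```python
def rpm(delta: float, mu: float, p0: float) -> float:
    """Assumed quadratic reward/punishment g(delta) = mu*delta^2 - p0*delta.

    Negative values are read as rewards, positive values as penalties.
    """
    return mu * delta ** 2 - p0 * delta


def best_deviation(mu: float, p0: float) -> float:
    """Stationary point of the assumed g: 2*mu*delta - p0 = 0."""
    return p0 / (2 * mu)


mu, p0 = 0.5, 0.110          # mu is hypothetical; p0 follows the simulations
delta_star = best_deviation(mu, p0)

# Scan a few deviations to see where the customer is rewarded or punished.
for delta in (0.0, 0.5 * delta_star, delta_star, 2 * delta_star, 3 * delta_star):
    print(f"delta={delta:.3f}  g={rpm(delta, mu, p0):+.5f}")
```

Because the second derivative of this assumed $g$ is $2\mu > 0$, subtracting it from a customer's profit contributes $-2\mu$ to each diagonal entry of the Hessian, which is consistent with the negative-definiteness argument above.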

The Pricing Algorithm With RPM
Making full use of knowledge including the elasticity matrix of electricity price, customer psychology, and principles of statistics, we can obtain an optimal pricing structure for a given index. Based on the previous investigation and pertinent literature, the existing pricing structure in He et al. (2018) uses three price levels: $p_l$ is a lower price set for the demand that the customers have committed to use (i.e., $d_s$); $p_m$ is an intermediate price set for the flexible electricity usage, which lies in a given flexible interval whose flexible ratio is defined as $\rho$ ($\rho \geq 0$); and $p_h$ is a much higher price set for the electricity usage exceeding the flexible interval. Specifically, the price for customers is always $p_l$ if their electricity demand is lower than the guaranteed demand. In order to decrease the price so that the actual load increases when the customers' electricity demand falls short of the planned value, we design a heuristic pricing algorithm based on the reward and punishment mechanism and the above pricing structure. The specific idea is as follows.
With Eq. 11, the profit of the supplier (Eq. 3) is transformed into a function of the price. Given the optimal customer electricity demand as a function of the electricity price, we obtain different prices with respect to the actual load. The optimal price can be attained by solving the resulting optimization problem. The constraints on prices result from the optimal electricity demand: from Eq. 11, the price can also be expressed in terms of the optimal demand. Furthermore, constraints on the customer electricity demand and on the price follow, which give the upper and lower bounds of the proposed price.
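The three-level pricing structure described above ($p_l$, $p_m$, $p_h$ with flexible ratio $\rho$) can be sketched as a block tariff. The function name, the reading of the flexible interval as $(d_s, (1+\rho)d_s]$, the per-block billing rule, and the numerical prices are assumptions for illustration; the exact formula from He et al. (2018) is not reproduced here.

```python
def tiered_bill(demand: float, committed: float, rho: float,
                p_low: float, p_mid: float, p_high: float) -> float:
    """Bill for one period under an assumed three-block tariff.

    committed : pre-committed demand, charged at p_low
    rho       : flexible ratio (>= 0); the flexible block is rho * committed
    """
    if rho < 0:
        raise ValueError("rho must be non-negative")
    flexible_cap = rho * committed
    base = min(demand, committed)                                  # committed block
    flexible = min(max(demand - committed, 0.0), flexible_cap)     # flexible block
    excess = max(demand - committed - flexible_cap, 0.0)           # beyond the interval
    return p_low * base + p_mid * flexible + p_high * excess


# Hypothetical prices and a committed demand of 10 kWh with rho = 0.1.
for demand in (8.0, 10.5, 12.0):
    print(demand, round(tiered_bill(demand, 10.0, 0.1, 0.08, 0.11, 0.20), 4))
```

A customer whose demand stays at or below the committed value pays only the low price, which matches the statement that the price is always $p_l$ in that case.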

SIMULATION RESULTS
In this section, we conduct extensive simulations in MATLAB to evaluate the performance of the proposed algorithm, and compare it with the flat pricing algorithm, TOU block pricing algorithm and real-time pricing algorithm (Samadi et al., 2010).

Example With a Single Type of Customer
In this part, we consider a single type of customer, i.e., residents, where the characteristic of electricity demand for all residents is almost identical. The electricity consumption of residents is sensitive to the electricity price since they are more willing to adjust their electricity demands to attain the maximum profit and decrease the cost. We adopt hourly based pricing and divide a day into four time blocks. The time block division for TOU pricing is shown in Table 2 (Yang et al., 2013). The electricity demand differs under different pricing algorithms. The comparison of price between the flat price and the TOU block price is shown in Figure 2, and the pre-committed customers' electricity demand $\{d_i(k)\}_{k=1}^{N}$ and the actual electricity demand with the flat price and the TOU block price are shown in Figure 3. The parameter of the cost function of the supplier is set to a fixed value; in real applications, the cost function is carefully estimated based on historical data. The optimal price is set as $p_0^* = 0.110$, assuming all residents keep to the pre-committed electricity demand at the beginning. The actual price is then adjusted dynamically by optimizing Eq. 15 and is influenced by the deviation of the actual electricity demand from the pre-committed electricity demand. Through our proposed pricing algorithm, the actual electricity demand of residents is close to the planned electricity demand, and the difference between the actual load and the pre-committed load is smaller than that with real-time pricing, as shown in Figure 4. Compared with the load under the flat price, the TOU block price, and the real-time price, the proposed pricing algorithm is effective in stimulating the residents to engage in demand response and comply with the pre-committed electricity demand. The pre-committed electricity demand of all customers is 13.1 kWh on average.
Furthermore, we compute the total load and the profit of the supplier per household in a day, shown in Table 3. We find that under the proposed pricing algorithm the actual load is close to the pre-committed electricity demand while the profit of the supplier is maximized, whereas under the TOU price the deviation of the load is bigger and the profit is lower. Furthermore, the profit is higher under the proposed algorithm although the total load is smaller than that with the real-time price.

Frontiers in Energy Research | www.frontiersin.org

Example With Multiple Types of Customers
Herein, we take the five types of customers mentioned earlier into account. Besides residents, there are mainly charging piles, storage batteries, illumination, and elevators on the demand side in a residential power system. The above four types of customers belong to the common electricity scope. However, they have different characteristics of electricity demand, which is worth further considering.

Charging Pile
For each charging pile, it takes 10 h to charge electric vehicles and consumes a maximum of 1.5 kWh. The actual electricity demand of the charging piles is sensitive to the number of rechargeable electric vehicles. Moreover, the number of electric vehicles being charged influences the electricity consumption of the charging pile much more than a change in electricity price does. Figure 5 shows that a change in electricity price does not affect the electricity consumption of the charging pile. Since charging is a long and necessary process for those residents who have electric vehicles, the flexibility of electricity demand is relatively low, as they have limited ability to reduce or increase their total use of electricity. However, they can reschedule their use of electricity to reduce their electricity costs.

Storage Battery
The storage battery can be utilized when the electricity consumption of all types of customers is not consistent with the pre-committed electricity demand $d_i(k)$. Specifically, if the actual load is lower than the planned value, the storage battery stores the unused difference so that the planned electricity consumption is met. If the actual load is higher than the planned value, the storage battery releases a portion of its power to residents, in addition to the previously specified power consumption, to ensure the safety of residents' electricity consumption. As shown in Figure 6, a change in electricity price has a certain impact on the electricity consumption of the storage battery. It is worth noting that the ability to store and release electricity is limited.
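The charge and discharge rule described above can be written as a one-period update. The function name, the rate limit, and the capacity values are hypothetical; the sketch only encodes the stated logic that the battery absorbs a shortfall, covers an excess, and is limited in how much it can store and release.

```python
def battery_step(actual_load: float, committed_load: float, stored: float,
                 capacity: float, max_rate: float) -> tuple[float, float]:
    """Return (new_stored_energy, residual_deviation) after one period.

    A positive residual means load is still below the plan after charging;
    a negative residual means load is still above the plan after discharging.
    """
    gap = committed_load - actual_load
    if gap >= 0:
        # Load below plan: charge, limited by rate and remaining capacity.
        charge = min(gap, max_rate, capacity - stored)
        return stored + charge, gap - charge
    # Load above plan: discharge, limited by rate and stored energy.
    discharge = min(-gap, max_rate, stored)
    return stored - discharge, gap + discharge


# Hypothetical battery: 5 kWh capacity, at most 1 kWh moved per period.
print(battery_step(12.0, 13.1, 2.0, 5.0, 1.0))   # charges up to the rate limit
print(battery_step(14.0, 13.1, 0.5, 5.0, 1.0))   # discharges all stored energy
```

The residual deviation returned by the sketch is the part of the imbalance that the battery cannot absorb, which is why the limits on storing and releasing electricity matter for the community-level balance.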

Illumination
In public areas, the demand for electricity for lighting is indispensable in the day, especially in the evening. For most communities, the electricity consumption of lighting is less during the day and more at night. Based on the property, we can adjust the electricity demand properly by limiting the number of electric lights in use. Moreover, the electricity demand is

The Elevator
As for the elevator, it consumes 3 kWh of power on standby if no one uses it during the day. Objectively speaking, the electricity consumption depends on the number and density of the flow of residents. As long as residents need it, the elevator must operate, and the electricity consumption is relatively unaffected by electricity prices unless residents consciously reduce the load. Therefore, as we can see in Figure 8, the electricity demand of the elevator is also sensitive to the price when residents are willing to control their use of the elevator.
In summary, the electricity consumption behavior of different types of customers affects the power consumption of the whole community, and the types differ only in their degree of influence. Figure 9 shows the average electricity consumption per household in a neighborhood with a flat price and a simple TOU price. We can conclude that the average electricity consumption of the whole community is mostly affected by the residents' electricity consumption, and the electricity consumption of the elevator, which is governed by the residents' conscious behavior, ranks second in the total power consumption of the community. However, customers with more fixed electricity demand and little flexible power consumption, such as charging piles and lighting, have little influence on the composite power consumption characteristics of the whole community, and their influence can be neglected. As for the storage battery, whose electricity demand is influenced by the sum of the power consumption of all types of customers, its flexible electricity demand can be dynamically adjusted by charging and discharging and kept within a tolerable range (i.e., the difference between the total power consumption and the promised power consumption in a community).
From the above results for a community, we know that the customers, especially residents, will proactively adjust their actual demand such that the difference between the pre-committed demand and the actual demand is limited to a tolerable range under the proposed pricing algorithm. When we consider the power consumption of industrial users, the proposed algorithm is also applicable since industrial users are more sensitive to electricity prices. Thus, whether in the community or in industry, all customers will be normalized to keep their promises, and the stability of the power system will be achieved with small tolerable fluctuations, together with profit maximization for the supplier.

CONCLUSION
In this paper, we relax the assumption that all customers will meet their pre-committed electricity demands and design a reward and punishment mechanism to restrict the opportunistic behavior of customers. Based on the reward and punishment mechanism, a heuristic pricing algorithm is proposed, which guarantees that the total electricity demand is within a secure range and that the stability of the power system is achieved. With our proposed pricing algorithm, the price motivates customers to adjust their electricity demand so that the actual load is close to the pre-committed electricity demand and the profit of the supplier is maximized. Meanwhile, we calculate the optimal demand for the supplier to provide and find the upper and lower bounds of the proposed price so that the balance of the supply and demand is maintained. Furthermore, considering multiple types of customers, we find that a change in electricity prices has a large impact on the electricity demands of the residents, while the electricity demand of the other four types of customers, including charging piles, storage batteries, illumination, and elevators, is not sensitive to the prices. In the future, we will continue to explore how to adjust all types of customers dynamically to guarantee the pre-committed electricity demand so that the profit of the supplier is maximized. Moreover, the demand response model needs to be further optimized.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.