Research on Game Operation of Multi-Stakeholder in Distribution Network in Electricity Market

With the development of the electricity market, stakeholders such as batteries, multi-microgrid (MMG), and electric vehicle (EV) clusters can trade with the distribution network or with each other to meet their power balance needs and maximize their profits. This paper proposes a two-level game model to study the operation strategies of stakeholders in the distribution network. First, each stakeholder predicts its electricity demand profile. A Markov Decision Process (MDP) model with random variables is established to predict the charging and discharging power of the battery. Then, a two-level game is presented in which multiple stakeholders participate and different kinds of stakeholders have different game strategy limits. Additionally, suggestions for battery operation modes under different compensation coefficients are given for participation in the subsequent two-level game. An algorithm based on Nondominated Sorting Genetic Algorithm II (NSGA-II) is proposed that allows stakeholders to merge or split self-adaptively to optimize the operation mode. Finally, the proposed model is applied to the PG&E 69-bus distribution system and a practical 101-bus distribution system in China. The case studies show that different game strategy limits of the stakeholders affect the distribution of the Nash equilibrium (NE) solutions. The multi-stakeholder system can better absorb regional unbalanced power through electricity transactions and further increase the benefits of each stakeholder.


INTRODUCTION
With MMG (Wu et al., 2011; Li et al., 2013), batteries (Wang et al., 2019a), and EV clusters (Zhang et al., 2017; Li et al., 2021) connected to the distribution network, the structure and operation mode of the distribution network are becoming more complex. Meanwhile, with the development of electricity market transactions, microgrids (MGs) with distributed energy resources can trade with other stakeholders, such as EVs and batteries. Through diversified market competition and cooperation, the operational benefits of the various stakeholders (Rui et al., 2019) and the operation efficiency of the entire system can be improved. When the interaction among multiple stakeholders, including batteries, MMG, and EV clusters, is considered, achieving the optimal operation of the entire system under the electricity market becomes a challenging problem.
In market-oriented transactions on the load side, game theory is a common tool for solving the optimization problem of multiple decision-making entities (Lu et al., 2014). Game theory is also widely used in MG behavior strategies and pricing. When MMG are used as dispatched units, a model of electricity price bidding decisions is constructed through their participation in load-side market transactions and bidding games. A two-stage dynamic game (Jiang et al., 2014) is adopted to achieve optimal economic dispatch of the distribution network while the MGs obtain the best operation profits. However, the possibility of direct transactions between MMG is ignored, so the enthusiasm of MGs for participating in the electricity market cannot be fully utilized. In (Jalali et al., 2017), a one-leader multiple-follower model between the distribution network operator and the MMG system is constructed considering transactions between MMG. A cooperative coalition of photovoltaic MG groups is formulated in (Liu et al., 2018), but the impact on the external distribution network of the overall unbalanced power after the coalition is formed is not considered there; this impact is considered in (Lin et al., 2017). However, the participants in the game in the above studies are either MMG alone or MMG and distribution networks (Lin et al., 2017). When the distribution network contains multiple stakeholders such as batteries, MMG, and EV clusters, the situation in which each stakeholder adopts a different game strategy to participate in the transaction needs further study.
When multiple stakeholders participate in game transactions, they need to predict their power curves for the next day. In the existing literature, the integration of storage units helps to balance the system while dealing with various sources of uncertainty in the power grid (Moazzami et al., 2018). Considering the energy storage system and distributed energy resources in the user's house, (Paterakis et al., 2015) developed a detailed home energy management system structure and determined the demand response strategies that give the optimal day-ahead dispatch of home appliances. Energy storage is operated together with renewable resources to decrease the vulnerability of the system to fluctuations in generation or consumption and to reduce the generation cost at peak hours, which increases the robustness and resiliency of the grid (Shen et al., 2015; Bitaraf and Rahman, 2018). However, the above studies all use traditional methods to model the uncertainties; these methods are complex, converge poorly, and can even be difficult to solve because of the large state space. It is more effective to use a Markov Decision Process (MDP) to model the uncertainty of the system. In (Shi et al., 2020), the operation of the battery is modeled as an MDP, and a deep Q-network is embedded to approximate the optimal decision of the battery, so as to deal with the system voltage problems caused by the high intermittency of renewable energy sources and the fluctuation of load demand. An MDP model for the real-time operation of the MG is established under uncertain conditions, and the best output of the battery in the MG is then determined (Shuai et al., 2019; Zhu et al., 2019). The above studies all combine the operation of batteries with the variability and uncertainty of renewable energy sources and load in the MG or distribution network.
One main challenge of the battery participating in electricity transactions is establishing an MDP model based on the influence of the temperature of the battery and determining its optimal charging and discharging strategy to participate in the two-level game to obtain maximum benefits.
Moreover, EVs mainly participate in demand response to profit (Lin et al., 2020) or charge and discharge intelligently based on the time-of-use (TOU) electricity price (Wang et al., 2019b; Wang et al., 2019c). The battery is often used for peak shaving and valley filling as grid-side storage. EV clusters and batteries can be set as independent agents that trade energy in the market (Fang et al., 2020) and do not always trade with the distribution network. When EV clusters and batteries act as independent stakeholders after predicting their charging and discharging profiles, the main problem is to develop a new framework that enables a number of stakeholders to individually and strategically choose the partners they wish to trade with (Wang et al., 2014). Yet few papers have studied this issue in depth.
This paper proposes a two-level game model of multiple stakeholders in the distribution network. The upper-level game is a non-cooperative game of electricity price between the distribution network and the stakeholders, and the lower-level game is a cooperative game of transaction loss cost that finds the optimal coalition of stakeholders. An algorithm based on NSGA-II is proposed that allows stakeholders to merge or split self-adaptively to optimize the operation mode. The contributions of this paper are listed as follows:
(1) An MDP model of the battery is established to predict its charging and discharging power considering the influence of temperature randomness. In addition, the profitability of the battery under different compensation coefficients is analyzed, and clear guidance on the choice of battery operation mode is given.
(2) A two-level game is presented in which multiple stakeholders participate and different kinds of stakeholders have different game strategy limits. The convergence domain of the game system is extended from a common two-dimensional plane to a three-dimensional space. The proposed two-level game model can be used to solve higher-dimensional Nash equilibrium problems.
(3) MGs, batteries, and EV clusters are independent stakeholders in this paper. Case studies on the PG&E 69-bus distribution system and a real 101-bus system demonstrate the effectiveness of the two-level game model, with cost reduction for the stakeholders in the distribution system.

POWER PREDICTION OF BATTERY AND EV CLUSTER
Before participating in the two-level game, all stakeholders need to predict their power curves for the next day. The MGs contain renewable energy sources such as wind power and photovoltaic power generation. The power prediction technology of MGs is quite mature and is not introduced in detail here. In this paper, an MDP model is established to predict the battery power considering the influence of temperature randomness on the battery charging efficiency. Moreover, the charging and discharging power of the EV cluster is obtained from the time-of-use electricity price under the premise of meeting the travel demand of users.

Battery Model
Considering the effect of temperature on the charging efficiency during the charging process of the battery (Powell and Meisel, 2016), a random variable is introduced into the charging/discharging process of the battery. The state of charge (SOC) of the battery has Markov characteristics, in which $C(S_t, x_t)$ is the revenue of the battery at stage $t$ as follows:
where $S_t$ is the state of the battery; $a_{dep}$ and $c_{dep}$ are the depreciation factors; $x_t < 0$ is the charging power, $x_t > 0$ is the discharging power, and $x_t = 0$ is the float power; $\chi_t$ is the feasible region of $x_t$; and $T$ is the set of dispatch periods. The objective function is subject to the constraints as follows:
where $P_e$ is the maximum power of the battery; $SOC_{max}$ and $SOC_{min}$ are the maximum and minimum SOC of the battery, respectively. Constraint (5) describes the SOC transfer procedure of the battery, with $g(x) = \frac{(1+\eta) + (1-\eta)\,\mathrm{sgn}(x)}{2}\,x$, in which $\eta$ is the charging efficiency of the battery.
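The piecewise behaviour of $g(x)$ is easy to check numerically. The following minimal sketch evaluates the expression given above: the charging efficiency scales the power only when the battery charges ($x < 0$), while discharging power ($x > 0$) passes through unchanged. The function name and the sample value $\eta = 0.9$ are illustrative assumptions, not taken from the paper.

```python
def effective_power(x: float, eta: float) -> float:
    """Evaluate g(x) = ((1 + eta) + (1 - eta) * sgn(x)) / 2 * x.

    x < 0 is charging power, x > 0 is discharging power, x == 0 is float.
    eta is the charging efficiency (illustrative values only).
    """
    sign = (x > 0) - (x < 0)  # sgn(x) as -1, 0 or +1
    return ((1 + eta) + (1 - eta) * sign) / 2 * x


# Charging 10 kW at eta = 0.9 stores only about 9 kW worth of energy,
# whereas discharging 10 kW is not scaled.
charged = effective_power(-10.0, 0.9)
discharged = effective_power(10.0, 0.9)
```

The unit coefficient for $x > 0$ agrees with the case study, where the discharge efficiency is assumed to be 1.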

Markov Decision Process
An MDP is a sequential optimization problem whose goal is to find a policy that maximises expected profits or minimises expected costs (Zhu et al., 2019). Let $S_t$ and $x_t$ be the state variables and decision variables of the battery at stage $t$, respectively; let $W_t$ be the exogenous information that arrives during the stage interval from $t$ to $t + 1$; and let $V(S_t)$ be the value function of state $S_t$. The evolution of the battery can be described according to the state transition function:
The problem is to find the strategy from $t = 0$ to $t = T$ with the objective function:
According to the optimality principle proposed by Bellman in 1957, the Bellman equation can be used to reformulate the problem and solve it recursively as:
In Equation 8, the original multi-stage optimisation problem is decomposed into a series of single-stage sub-problems that are solved in turn. When solving the sub-problem in each stage, the latest exogenous information can also be considered to deal with the uncertainty of the system. The discount factor is set to 1. MDP theory is based on setting up sub-problems according to temporal decomposition. In the temporal decomposition framework, there are four classes of elements, namely state variables, decision variables, exogenous information, and the transition function. They are defined as follows:
i. State variable: The state variable reflects the relationships among sub-problems in the decision-making process. The state variables of the battery can be expressed as
ii. Decision variable: The decision variable is the charge/discharge power of the battery:
iii. Exogenous information: The exogenous information represents the stochastic factor in the decision process of the battery, which is given by
iv. Transition function: The transition function maps the current state to the next state according to the decision and the exogenous information. The transition function between period $t$ and period $t - 1$ can be described by the SOC transfer function of the battery in (5).
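The stage-by-stage decomposition described above can be illustrated with a toy backward recursion. The sketch below is not the paper's battery model: it assumes a discretised SOC grid, unit charge/discharge actions, a stage revenue equal to price times power, and a charging attempt that succeeds with probability `eta` (a simple stand-in for the random charging efficiency). All names and numbers are hypothetical; the discount factor is 1, as in the text.

```python
def solve_battery_mdp(prices, n_levels, eta):
    """Backward Bellman recursion on a toy battery.

    prices   : price per stage (revenue = price * x)
    n_levels : number of discrete SOC levels 0 .. n_levels - 1
    eta      : probability that a charging step raises the SOC by one level
    Returns (V, policy) with V[t][s] the optimal expected revenue-to-go.
    """
    horizon = len(prices)
    V = [[0.0] * n_levels for _ in range(horizon + 1)]  # V[horizon] = 0
    policy = [[0] * n_levels for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_levels):
            best_value, best_x = float("-inf"), 0
            for x in (-1, 0, 1):  # charge, float, discharge
                if x == 1 and s == 0:
                    continue  # cannot discharge an empty battery
                if x == -1 and s == n_levels - 1:
                    continue  # cannot charge a full battery
                reward = prices[t] * x
                if x == -1:
                    # Expectation over the exogenous charging outcome.
                    expected_next = (eta * V[t + 1][s + 1]
                                     + (1 - eta) * V[t + 1][s])
                else:
                    expected_next = V[t + 1][s - x]
                value = reward + expected_next
                if value > best_value:
                    best_value, best_x = value, x
            V[t][s] = best_value
            policy[t][s] = best_x
    return V, policy


# Two stages, cheap then expensive: an empty battery charges first, sells later.
V, policy = solve_battery_mdp([1.0, 3.0], n_levels=2, eta=1.0)
```

Each pass of the outer loop solves one single-stage sub-problem using the value function of the following stage, which is the structure the Bellman equation expresses.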

Solution of MDP Model
Because the MDP problem in this paper involves the calculation of an expected value, it is often difficult to solve and suffers from the curse of dimensionality. Approximate dynamic programming (ADP) iteratively updates an approximate value function to find a strategy that makes the value function approximately optimal. In the existing works, there are several methods to update the approximate value function. In this paper, the method proposed in (Zhu et al., 2019) is used to update the slopes of the value functions.

Power Prediction of EV Cluster
$SOC_1$ is the remaining energy when the EV returns to the residential area. $SOC_2$ is the critical remaining energy of the EV:
where $L$ denotes the daily mileage, $W_{100}$ is the power consumption per 100 km, $P_c$ is the charging power of the EV, $P_0$ is the rated capacity of residential EVs, and $T_{val}$ is the length of the valley period. The charging and discharging strategy of EVs is determined by the following rules according to the time when the EV finally returns to the residential area.

For EVs Who Return During Peak Period
i. If $SOC_1 > SOC_2$, the EVs first connect to the charging pile and discharge to the grid during the peak period. The EVs then charge when the valley period comes.
ii. If $SOC_1 \le SOC_2$, the EVs charge in part of the peak period and the whole valley period to ensure that they have enough remaining energy for the next trip.

For EVs Who Return During Flat Period
i. If $SOC_1 > SOC_2$ and the next period is the peak period, the EVs keep discharging during the peak period and switch to charging when the valley period comes.
ii. If $SOC_1 > SOC_2$ and the next period is the valley period, the EVs keep discharging during the flat period and switch to charging when the valley period comes.
iii. If $SOC_1 \le SOC_2$ and the next period is the peak period, the EVs first select part of the flat period and the whole valley period to charge. Then, if the remaining energy $SOC_3$ is less than $SOC_2$ at the end of the flat period, the EVs select part of the peak period and the whole valley period to charge.
iv. If $SOC_1 \le SOC_2$ and the next period is the valley period, the EVs select part of the flat period and the whole valley period to charge.

For EVs Who Return During Valley Period
EVs that return during the valley period charge immediately.
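The rules for the three return periods can be collected into a single decision function. The sketch below only encodes the ordering of actions stated above; it does not compute how long the partial charging lasts, and the function name, period labels, and return format are assumptions made for illustration.

```python
def ev_strategy(return_period, soc1, soc2, next_period=None, soc3=None):
    """Return the ordered list of (period, action) steps for one EV.

    return_period : "peak", "flat" or "valley"
    soc1, soc2    : remaining and critical remaining energy
    next_period   : period following the flat period ("peak" or "valley")
    soc3          : remaining energy at the end of the flat period, if known
    """
    if return_period == "valley":
        return [("valley", "charge")]
    surplus = soc1 > soc2
    if return_period == "peak":
        if surplus:
            return [("peak", "discharge"), ("valley", "charge")]
        return [("peak", "charge (part)"), ("valley", "charge")]
    if return_period == "flat":
        if next_period not in ("peak", "valley"):
            raise ValueError("next_period must be 'peak' or 'valley'")
        if surplus:
            discharge_in = "peak" if next_period == "peak" else "flat"
            return [(discharge_in, "discharge"), ("valley", "charge")]
        plan = [("flat", "charge (part)")]
        # Rule iii: top up during the peak if the flat period was not enough.
        if next_period == "peak" and soc3 is not None and soc3 < soc2:
            plan.append(("peak", "charge (part)"))
        plan.append(("valley", "charge"))
        return plan
    raise ValueError("unknown return period")
```

For example, `ev_strategy("flat", 0.3, 0.5, next_period="peak", soc3=0.4)` yields partial charging in the flat period, partial charging in the peak period, and charging in the valley period.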

Non-cooperative Model of the Upper Game
In this paper, a distribution system with stakeholders such as MMG, batteries, and EV clusters is considered. Batteries and EV clusters are energy storage systems (ESSs). According to the unbalanced power, an MG with a power surplus is called "the seller MG", while one with a power shortage is called "the buyer MG". An ESS is called "the seller ESS" when discharging and "the buyer ESS" when charging. An ESS that is neither charging nor discharging is called "the balanced ESS". An open electricity market is considered, and power is allowed to be transferred among the stakeholders as well as between a stakeholder and the distribution system ({DS}). In other words, a necessary and sufficient condition of the game is that:

The Participants
The participants in the game are the seller MGs, the seller ESSs, and the {DS}.

The Gains
The gains of the seller MGs, the seller ESSs, and the {DS} have different compositions, described as follows:
a. The gain of the seller MG includes transaction income, service charge, generation cost, and subsidy income for renewable energy power generation. The gain of the {DS} is composed of transaction income (part of the transaction with the MG), service income, and balance power income (part of the transaction with the main network) (Lin et al., 2017).
b. The gain of the seller battery is composed of transaction income, service charge, and depreciation cost.
1. Transaction income: $u^{b}_{tra}$ is the transaction income of the seller battery, consisting of income from the buyers and from the {DS},
where $P^{tra}_{bk}$ and $P^{rD}_{b}$ are the positive active power transferred from the seller battery to the buyer $k$ and the {DS}, respectively; $d_b$ is the purchase price for the {DS}.
2. Service charge: The service charge $u^{b}_{ser}$ paid by the seller battery is given as follows:
where $s_D$ is the unit price charged by the {DS}.
3. Depreciation cost: The depreciation loss during battery operation is considered as follows:
where $a_{dep}$ and $c_{dep}$ are depreciation coefficients and $P_{bat}$ is the discharging power of the battery.
In conclusion, the gain $u^{b}$ of the seller battery is formulated as
c. The gain of the seller EV cluster is also a combination of transaction income, service charge, and operation and maintenance cost.
1. Transaction income: $u^{ev}_{tra}$ is the transaction income of the seller EV cluster,
where $P^{tra}_{evk}$ and $P^{rD}_{ev}$ are the positive active power transferred from the seller EV cluster to the buyer $k$ and the {DS}, respectively.
2. Service charge: The service charge $u^{ev}_{ser}$ paid by the seller EV cluster is given as follows:
3. Operation and maintenance cost: The cost of operating and maintaining EV cluster equipment, such as charging piles, is considered as follows:
where $c_{ev}$ denotes the average unit operation and maintenance cost and $P_{ev}$ is the discharging power of the EV cluster.
In conclusion, the gain $u^{ev}$ of the seller EV cluster is formulated as
$$u^{ev} = u^{ev}_{tra} + u^{ev}_{ser} + u^{ev}_{OM}$$
The game structure is shown in Figure 1.

Electricity Trading Rules
According to the power curves predicted by all stakeholders, the transactions in the system are studied. The transaction rules are formulated as follows:
a. The seller MGs and the seller ESSs each fix their own electricity sale price. Additionally, the {DS} fixes the service charge.
b. With the aim of minimal cost, the buyer decides to trade with the seller MGs, the seller ESSs, or the {DS} in sequence until its demand is fulfilled.
c. In accordance with the principle of prioritizing the consumption of renewable energy sources, the buyers purchase electricity first from MGs, then from EV clusters, and finally from batteries. Buyers with the largest power shortage are satisfied first.
d. When several sellers offer the same electricity price, the buyers trade first with the MGs, then with the EV clusters, and finally with the batteries. The seller with the larger power surplus has priority to trade with buyers.
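One possible reading of rules a to d is a greedy matching procedure, sketched below. It is an illustration only: it assumes that buyers rank sellers by price first and use the MG, EV cluster, battery order and the surplus size as tie-breakers, that a buyer turns to the {DS} once the remaining sellers are more expensive than the {DS}, and that service charges are ignored. The class, function, and parameter names are hypothetical.

```python
from dataclasses import dataclass

# Tie-break order from rule d: MGs first, then EV clusters, then batteries.
TYPE_PRIORITY = {"MG": 0, "EV": 1, "battery": 2}


@dataclass
class Seller:
    name: str
    kind: str       # "MG", "EV" or "battery"
    price: float    # sale price fixed by the seller (rule a)
    surplus: float  # power available for sale


def match_trades(buyers, sellers, ds_price):
    """Greedy matching; buyers maps a buyer name to its power shortage.

    Returns a list of (seller_name, buyer_name, power) tuples, where the
    seller name "DS" marks power bought from the distribution system.
    """
    ranked = sorted(
        sellers,
        key=lambda s: (s.price, TYPE_PRIORITY[s.kind], -s.surplus),
    )
    remaining = {s.name: s.surplus for s in sellers}
    trades = []
    # Rule c: buyers with the largest shortage are satisfied first.
    for buyer, shortage in sorted(buyers.items(), key=lambda kv: -kv[1]):
        need = shortage
        for seller in ranked:
            if need <= 0 or seller.price > ds_price:
                break  # demand met, or the DS is now the cheaper option
            amount = min(need, remaining[seller.name])
            if amount > 0:
                trades.append((seller.name, buyer, amount))
                remaining[seller.name] -= amount
                need -= amount
        if need > 0:
            trades.append(("DS", buyer, need))
    return trades
```

With sellers MG1 (price 0.5, surplus 30), EV1 (0.5, 50), and B1 (0.4, 10), a buyer short of 60 is served by B1, then MG1, then EV1, following the price and type ordering.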

Lower-Level of the Game Model
In the upper game, most of the NE solutions have a corresponding operation mode. However, there is a situation in which participants obtain the same gain whether they operate in the coalition mode or in the non-cooperative mode. This is called "the selection mode". In this case, the decision passes to the lower level of the game, which is a cooperative game of transaction loss cost. The final operation mode is determined by comparing the transaction loss of the stakeholders in the different modes (Lin et al., 2017).

Stakeholders Operate in the Non-cooperative Mode
The non-cooperative mode is an operating situation in which all stakeholders trade with the {DS} but do not trade with each other.
In the non-cooperative mode, active power $P_i$ is transferred at a medium voltage $U_0$ between stakeholder $i$ and the {DS}. Stakeholder $i$ is a seller when $P_i > 0$ and a buyer when $P_i < 0$. The transfer of power incurs a power loss in the transmission lines and transformers, expressed as
where $R_{i0}$ is the resistance of the line between stakeholder $i$ and the {DS}; $\beta$ is the coefficient of power loss in transformers; and $f(P_i)$ is the power exchanged between stakeholder $i$ and the {DS}, which is defined as:
where $W^{p}_{i}$ is the power offered by the {DS}. To ensure that stakeholder $i$ acquires the necessary power $P^{required}_{i}$ $(= -P_i)$, $W^{p}_{i}$ is given by a solution to Eq. 23.
We define the power loss cost as the $U_i$ of the system model in the non-cooperative mode
where $d_x$ denotes the cost per unit of power loss: $d_x = d_b$ if stakeholder $i$ is a seller, and $d_x = d_s$ if stakeholder $i$ is a buyer.

Stakeholders Operate in the Coalition Mode
In the coalition mode, individual stakeholders do not trade directly with the {DS}. Instead, the stakeholders cooperate with each other to complete power transmission and form a coalition. The coalition conducts power transactions with the {DS} as a unit, and the stakeholders within the coalition can exchange energy. At the same time, power exchange between stakeholders avoids losses in the transformer.
To study further the possibility of forming a coalition and the assignment of the coalition's cost, we define the cooperative multi-stakeholder coalition game as a pair $(N, v)$, where $N$ denotes the set of participants and $v$ is a characteristic function defined on every coalition $S$. $v(S)$ denotes the cost of coalition $S$ achieved through the cooperation of the stakeholders in it, defined as follows:
where $S_r$ and $S_s$ are the sets of sellers and buyers in coalition $S$, respectively; $dist_{ij}$ is the distance between seller $i$ and buyer $j$; and $D$ is the distance threshold.
The constraints indicate that a coalition contains at least one seller and one buyer, and that two stakeholders can cooperate only if they are located closer to each other than the distance threshold. The system should also satisfy the line capacity constraints, voltage constraints, and power flow equation constraints. The power loss $P^{loss}_{ij}$ is formulated as
where $P_{ij}$ is the power transferred from seller $i$ to buyer $j$, and $R_{ij}$ is the resistance of the transmission line. The impedance loss of the transmission line and the transformer loss $P^{loss}_{i0}$ exist when stakeholder $i$ trades with the {DS}. It is formulated as follows,
where $Z_T$ is the impedance of the transformer, and $n$ is the transformer ratio.
Therefore, we define the objective function of the lower level of the model as follows:
The objective function represents the smallest cost of coalition $S$, namely the least power loss cost incurred in the power transfer. The cost of stakeholder $i$ in coalition $S$ is given by
where $\sum_{i \in S} \alpha_i = 1$ and $\alpha_i / \alpha_j = \psi(i) / \psi(j)$ in coalition $S$.
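The two conditions on the weights fix them completely: weights that sum to one and stand in the ratio $\psi(i)/\psi(j)$ must equal $\alpha_i = \psi(i) / \sum_{j \in S} \psi(j)$. The sketch below computes these weights and, as an assumption about the allocation rule, splits the coalition cost as $\alpha_i v(S)$. The function name and the sample numbers are hypothetical, and $\psi(i)$ is treated as a given positive quantity for each stakeholder.

```python
def allocate_coalition_cost(coalition_cost, psi):
    """Split a coalition cost in proportion to the weights psi.

    coalition_cost : v(S), the total power loss cost of the coalition
    psi            : dict mapping each stakeholder to its positive weight
    Returns a dict with the cost share alpha_i * v(S) of each stakeholder.
    """
    total = sum(psi.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: coalition_cost * weight / total
            for name, weight in psi.items()}


# Illustrative numbers only: a coalition cost of 90 split 2:1.
shares = allocate_coalition_cost(90, {"MG1": 100, "battery1": 50})
```

By construction the shares add up to the coalition cost, so the allocation neither creates nor loses cost within the coalition.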

Game Algorithm of Optimal Multi-Stakeholder Coalition Based on NSGA-II
After its power curve is predicted through the MDP, the battery participates in the game as an independent stakeholder. So that no NE solution is omitted, the upper-level game finds the NE points by traversing the strategy combinations, and the lower-level game model, based on NSGA-II, allows stakeholders to merge or split self-adaptively. The overall flowchart of the algorithm is shown in Figure 2.

Existence of the NE Solutions
The NE solutions are the key to the upper-level game. All the strategy combinations of the participants are enumerated so that no NE solution is omitted. Therefore, the strategies need to be discretised. The discrete steps are $\Delta c^{MG}_{i}$, $\Delta c^{ESS}_{j}$, and $\Delta s_D$, respectively. All the discrete steps are constant. The strategies are expressed as follows:
The upper and lower limits of the strategy set have been defined above. Therefore, the discrete strategy combinations are finite. Moreover, the number of participants is also finite in this game. According to Theorem 1, the existence of an NE is ensured.
Theorem 1 (Nash, 1950): In the $n$-player normal-form game $G = (S_1, S_2, \dots, S_n; U_1, U_2, \dots, U_n)$, if $n$ is finite and $S_i$ is finite for every $i$, then there exists at least one NE, possibly involving mixed strategies.
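The enumeration over discretised strategies can be sketched as a brute-force search for pure-strategy NE points. The code below is a generic illustration rather than the paper's solver: `strategy_grid` and `pure_nash_equilibria` are hypothetical names, and the payoff function is supplied by the caller.

```python
from itertools import product


def strategy_grid(lower, upper, step):
    """Discretise the interval [lower, upper] with a constant step."""
    count = int(round((upper - lower) / step))
    return [lower + k * step for k in range(count + 1)]


def pure_nash_equilibria(strategy_sets, payoff):
    """Enumerate all strategy profiles and keep the pure-strategy NE.

    strategy_sets : one list of strategies per player
    payoff        : function mapping a profile (tuple) to a tuple of gains
    A profile is kept when no player gains by deviating unilaterally.
    """
    equilibria = []
    for profile in product(*strategy_sets):
        gains = payoff(profile)
        stable = True
        for i, options in enumerate(strategy_sets):
            for alternative in options:
                deviated = profile[:i] + (alternative,) + profile[i + 1:]
                if payoff(deviated)[i] > gains[i]:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria


# Small check on a two-player game with a known pure NE at ("D", "D").
TABLE = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
         ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
found = pure_nash_equilibria([["C", "D"], ["C", "D"]], TABLE.__getitem__)
```

Theorem 1 guarantees an NE that may involve mixed strategies, so an enumeration of pure profiles such as this one can in general return an empty list; it finds every pure-strategy NE on the grid when one exists.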

Optimal Coalition Algorithm Based on NSGA-II
NSGA-II is a representative multi-objective optimization algorithm that is similar to the basic genetic algorithm. Its core idea is to sort the population by non-domination and then to compute the crowding distance of each individual within each level. On this basis, the operations of selection, crossover, mutation, and elite retention are completed. NSGA-II uses Pareto sorting to find the minimum electricity loss value of each stakeholder without increasing the power consumption of other stakeholders; such a coalition is called the optimal stakeholder coalition.
Assume that the $(N_r + N_s)$ stakeholders can be partitioned into two collections of coalitions, $C = \{C_1, \dots, C_c\}$ and $K = \{K_1, \dots, K_k\}$, in which the coalitions within each collection do not intersect each other. For the collection $C$, the $U_j$ of stakeholder $j$ in coalition $C_i$ is $\psi_j(C_i) = \psi_j(C)$, where $\psi_j(C)$ is the cost of stakeholder $j$, expressed as in (Lin et al., 2017). The collection $C$ is better than the collection $K$ when $C \triangleright K \Leftrightarrow \{\psi_j(C) \le \psi_j(K), \forall j \in C, K\}$ holds and at least one participant $j$ satisfies $\psi_j(C) < \psi_j(K)$. In other words, when moving from $K$ to $C$, at least one stakeholder obtains more gain without harming the gain of the other stakeholders. In the optimal coalition, no stakeholder can increase its own profits without harming other stakeholders.
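The comparison between two collections of coalitions is a Pareto test on the stakeholders' costs, which can be sketched as follows. The function name is hypothetical, and the per-stakeholder costs $\psi_j$ are assumed to have been computed beforehand for both collections.

```python
def pareto_preferred(costs_c, costs_k):
    """Return True when collection C is preferred to collection K.

    costs_c, costs_k : dicts mapping each stakeholder j to psi_j(C) and
                       psi_j(K); both must cover the same stakeholders.
    C is preferred when no stakeholder pays more under C than under K and
    at least one stakeholder pays strictly less.
    """
    if costs_c.keys() != costs_k.keys():
        raise ValueError("both collections must cover the same stakeholders")
    no_worse = all(costs_c[j] <= costs_k[j] for j in costs_c)
    strictly_better = any(costs_c[j] < costs_k[j] for j in costs_c)
    return no_worse and strictly_better
```

A merge or split step would be accepted only when this test holds for the new collection against the current one, so every accepted step leaves no stakeholder worse off.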
Merge rule: Merge any collections of

CASE STUDIES
The MDP model for the power prediction of the battery is a non-linear programming (NLP) problem, which is solved by the BARON solver in GAMS. Several case studies of the two-level game model for multi-stakeholder transactions are carried out using MATLAB.

Simulation Parameters
In this section, simulations are performed to validate the proposed model. We test the proposed model on the PG&E 69-bus distribution system, as illustrated in Figure 3. There are 6 MGs connected randomly to nodes 30, 41, 15, 21, 53, and 69, respectively. There is an EV cluster (EV1) at node 65 and a battery (battery1) at node 58. EV1 has 100 EVs, each with a capacity of 30 kWh and a rated power of 5 kW. Battery1 has a capacity of 1,000 kWh and a rated power of 200 kW. All stakeholders are interconnected through an Interconnect Static Switch (ISS). The ISS is disconnected when two stakeholders are in the non-cooperative mode and closed when they operate in the coalition mode. The TOU electricity price of the {DS} is shown in Table 1. The values of the other parameters are shown in Table 2. The predicted power curves of the MGs and EV1 are shown in Figure 4.

Analysis of the Profits of the Battery in Different Modes
The battery is connected to the {DS} as an independent stakeholder. Figure 5A shows two operation strategies of the battery. One is the "grid-side energy storage mode", in which the battery operates to minimize the standard deviation of the load profile in the {DS}. The other is the "arbitrage mode", in which the battery operates to maximize its own profits according to the TOU electricity price. The battery predicts the power curves of the two operation modes to participate in the subsequent two-level game. Ultimately, the operation mode of the battery depends on the corresponding benefit. To explore the influence of the compensation coefficient on the benefit of the operation modes, the benefits of the grid-side energy storage mode and the arbitrage mode are compared in Figure 5B under different compensation coefficients. As the compensation coefficient $c_{sub}$ increases, the benefit of the battery operated in the grid-side energy storage mode changes from negative to positive and continues to increase, where the maximum value of $c_{sub}$ is $\max(d)/2 = 0.36$. When $c_{sub} \in [0, 0.21]$, the battery operates in the arbitrage mode, as the benefit of the grid-side energy storage mode is less than that of the arbitrage mode. When $c_{sub} \in [0.22, 0.36]$, the battery operates in the grid-side energy storage mode, as the benefit of the grid-side energy storage mode is more than that of the arbitrage mode. If the peak-to-valley difference exceeds the threshold, the {DS} wants to smooth the load curve, so it offers compensation and the battery operates in the grid-side energy storage mode. Otherwise, the battery always operates in the arbitrage mode.

Results of Battery Predicted Power
The problem of maximizing the revenue of the battery is an NLP problem. The charging efficiency is a random variable affected by temperature and distributed in [0.9,0.95], and the discharge efficiency is assumed to be 1.
In the deterministic case, as shown in Figure 6A, the battery's benefit and power curve obtained by the ADP algorithm are exactly the same as those obtained by the centralized algorithm (Wang et al., 2019b). The benefits are both 213.495 CNY. The results show that it is feasible to solve the MDP model in this paper by using the ADP algorithm.
Frontiers in Energy Research | www.frontiersin.org | September 2021 | Volume 9 | Article 744391
In the random case, the influence of temperature on charging efficiency is considered. The proposed algorithm is used to solve random problems in multiple scenarios. The power prediction curve of the battery is shown in Figure 6B. The result shows that the MDP model and the ADP algorithm can effectively predict the charging and discharging power of the battery as an independent stakeholder in the random scenario. The result will provide strong guidance for the battery to participate in subsequent game electricity transactions.

Results Analysis of Upper-Level Game
In this paper, the two-level game works when the unbalanced power of each stakeholder is different in each time period. The subsequent game results are analyzed based on the results at $t = 21$, as the analysis of the other periods is the same.
The three-dimensional distribution of NE solutions is shown in Figure 7. Most NE solutions correspond to the non-cooperative mode. There is no transaction between stakeholders; that is, the sellers sell power to the {DS} while the buyers buy power from the {DS}. The game strategies and gains are not influenced. However, the NE point $M_0$ corresponds to the selection mode.
The service charge $s_D$ is the price that needs to be paid for transactions between stakeholders. Figure 8 shows the three-dimensional distribution of the NE solutions of the multi-stakeholder game when the upper limit of the {DS} service charge, $s^{max}_{D}$, continues to increase while the other conditions remain unchanged. It can be seen that when $s^{max}_{D} > (d_s - d_b)/2$, all NE points except $M_0$, which corresponds to the selection mode, correspond to the situation in which all stakeholders trade directly with the {DS}; when $s^{max}_{D} \le (d_s - d_b)/2$, the only NE point is $M_0$, which corresponds to the selection mode. In other words, when the service charge is high, the stakeholders tend to trade directly with the {DS}; on the contrary, when the service charge is low, the stakeholders tend to form coalitions to obtain more benefits.
The impact of $c_{ev}$ on the system operation mode is analyzed. As shown in Figure 9, the three-dimensional distribution of NE solutions does not change as $c_{ev}$ varies. All the projections of the NE solutions for which the operation mode is uncertain remain at $M_0$. A change of $c_{ev}$ affects only the gain of the seller EV clusters and thus does not influence the operation mode of the system. Furthermore, the operation mode of the multi-stakeholder system does not change when the subsidy unit price $g_{sub}$, the average wind/solar power generation cost of the MG $g_{MG}$, the unit price paid by the {DS} to the main grid $g_D$, the battery depreciation coefficients $a_{dep}$ and $c_{dep}$, or other parameters change. The reasons are the same, so they are not repeated here.

Results Analysis of Lower-Level Game
In case 1, there are only MGs in the {DS}. As shown in Figure 10A, the simulation results show that joining a coalition reduces the power loss cost of the MGs involved. MG1 and MG6 form a coalition, while the other MGs trade with the {DS} in the non-cooperative mode.
Based on case 1, the independent stakeholders EV1 and battery1 connect to the {DS} and participate in power transactions in case 2. The costs of power loss of the stakeholders are shown in Figure 10B. In this case, MG2 and MG5 both trade with the {DS}; MG3 and MG6 form coalition 1 ($S_1 = \{MG3, MG6\}$); MG4 and EV1 form coalition 2 ($S_2 = \{MG4, EV1\}$); MG1 and battery1 form coalition 3 ($S_3 = \{MG1, battery1\}$). In both case 1 and case 2, the power loss cost of stakeholders that form a coalition is always less than that in the non-cooperative mode. Therefore, the stakeholders choose adaptively to cooperate with others or trade directly with the {DS} when the electricity price and the service charge
The comparison of the total daily operation gains of each stakeholder in different modes is shown in Table 3. In case 1, the results show that the MGs operating in the coalition mode have a greatly reduced power loss cost compared with the non-cooperative mode. Thus, the total daily operation costs of the buyers are reduced while the total daily operation gains of the sellers are increased. In case 2, after EV1 and battery1 connect to the {DS}, the total daily operation gains of EV1 and battery1 in the non-cooperative mode are 460.1946 CNY and 208.9388 CNY, respectively. In the coalition mode, the total daily operation gains of EV1 and battery1 are 465.0688 CNY and 210.3351 CNY, respectively. The results indicate that when the EV cluster and the battery participate in power transactions as independent stakeholders and choose their own trading partners adaptively, their gains increase. At the same time, because of the increase in the types and number of stakeholders, the composition of the coalitions has also changed. Meanwhile, the daily operation gains of the MGs have also increased in case 2 compared with case 1.
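The size of the improvement for EV1 and battery1 follows directly from the four gains reported above. The short calculation below expresses each increase as a percentage of the non-cooperative gain; the function name is an assumption, and only the four CNY figures come from the text.

```python
def relative_gain_increase(non_cooperative, coalition):
    """Percentage increase of the coalition gain over the non-cooperative gain."""
    return (coalition - non_cooperative) / non_cooperative * 100.0


# Daily operation gains in CNY, as reported for case 2.
ev1_increase = relative_gain_increase(460.1946, 465.0688)
battery1_increase = relative_gain_increase(208.9388, 210.3351)
print(f"EV1: {ev1_increase:.2f}%  battery1: {battery1_increase:.2f}%")
```

The increases are about 1.06% for EV1 and about 0.67% for battery1, so the coalition mode raises both gains by roughly one percent or less of the daily total.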
To investigate the efficiency of the proposed two-level game in a real system, a real 101-bus system located in southeast China is employed. The distribution of the stakeholders in the real 101-bus system is shown in Figure 11. The cost comparison of the stakeholders between the non-cooperative mode and the coalition mode in the real 101-bus system is shown in Figure 12.
The resulting coalitions and the cost reduction percentages are shown in Table 4. Coalition 1 is composed of MG4 and MG7. Coalition 2 is composed of MG5 and EV4. Coalition 3 contains only EV3, which trades directly with the {DS} in the non-cooperative mode. Coalition 4 is composed of MG3, MG6, battery2, and battery4. Coalition 5 is composed of MG2, MG9, MG10, and battery3. Coalition 6 is composed of MG1, MG8, and battery1. Coalition 7 is composed of EV1 and EV2. Except for EV3, which trades directly with the distribution network, all stakeholders form coalitions to trade with each other, which greatly reduces the cost of transaction loss. This indicates that the proposed two-level game model is suitable for multi-stakeholder transactions in a real system and can effectively reduce transaction loss costs.
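The coalition structure listed above can be written down as a small data structure and checked for consistency. The sketch below encodes only the memberships stated in the text and verifies that no stakeholder appears in two coalitions. The variable names are illustrative, and the stakeholder count is simply the number of names listed above.

```python
# Coalition memberships for the real 101-bus system, as listed in the text.
coalitions = {
    1: {"MG4", "MG7"},
    2: {"MG5", "EV4"},
    3: {"EV3"},
    4: {"MG3", "MG6", "battery2", "battery4"},
    5: {"MG2", "MG9", "MG10", "battery3"},
    6: {"MG1", "MG8", "battery1"},
    7: {"EV1", "EV2"},
}

# Flatten the memberships and check that the coalitions are pairwise disjoint.
members = [m for group in coalitions.values() for m in group]
is_partition = len(members) == len(set(members))

# Singleton coalitions correspond to stakeholders trading directly with the DS.
singletons = sorted(k for k, group in coalitions.items() if len(group) == 1)

# Reverse lookup: which coalition does a given stakeholder belong to?
coalition_of = {m: k for k, group in coalitions.items() for m in group}

print(len(members), is_partition, singletons, coalition_of["battery3"])
```

With the memberships above, the 18 listed stakeholders (10 MGs, 4 EV clusters, and 4 batteries) each fall into exactly one coalition, and coalition 3 is the only singleton.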

CONCLUSION
In this paper, a two-level game operation model of a multi-stakeholder distribution system including batteries, EV clusters, and MMGs is established. To solve the game model and optimize the operation mode, an algorithm based on NSGA-II is proposed that allows stakeholders to merge or split self-adaptively. The algorithm determines the transaction partners of each stakeholder and, at the same time, achieves the best operation of the entire system. Simulation results on the PG&E 69-bus distribution system and a real system verify the accuracy of the proposed model. This paper provides a useful reference for stakeholders participating in the electricity market.
(1) An MDP model that considers the influence of temperature on the charging efficiency can effectively predict the power curve of the battery for its participation in the two-level game.
(2) The gain of the battery depends strongly on the choice of operation mode. The profitability of the battery under different compensation coefficients is analyzed, and clear guidance on the choice of battery operation mode is given.
(3) Multiple stakeholders are considered simultaneously in the game relationships, so the distribution of NE points of the upper-level game is extended from the two-dimensional plane to three-dimensional space. Meanwhile, the lower-level game can find the optimal operation mode of the stakeholders. The proposed two-level game model can therefore solve the multi-dimensional Nash equilibrium problem.
(4) The two-level game is verified to be effective in the PG&E 69-bus distribution system and a real 101-bus system. When batteries and EV clusters participate in the two-level game as independent stakeholders, all stakeholders operate in their optimal modes, which reduces the costs of transaction loss.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.