Real-Time Dispatching Performance Improvement of Multiple Multi-Energy Supply Microgrids Using Neural Network Based Approximate Dynamic Programming

In a multi-energy supply microgrid, different types of energy can be scheduled from a “global” view, which improves energy utilization efficiency. In addition, a hydrogen storage system is considered as long-term storage, which allows more renewable energy to be installed on the local consumer side. However, when large numbers of grid-connected multi-energy microgrids are present, scheduling them in real time is difficult, because different types of devices, three types of energy, and three types of utility grid networks must be considered. In this paper, a two-stage coordinated algorithm is adopted to operate the microgrids: day-ahead scheduling and real-time dispatching. To reduce the time taken to solve the scheduling problem and to improve the scheduling performance, approximate dynamic programming (ADP) is used in real-time operation. Different types of value function approximation (VFA), i.e., linear function, nonlinear function, and neural network, are compared to study the influence of the VFA on the decision results. Offline and online processes are developed to study the impact of historical data on the regression of the VFA. The results show that the neural network based ADP one-step decision algorithm has almost the same performance as the global optimization algorithm, and the highest performance among all the other local optimization algorithms. The relative error of the total operation cost is less than 3%, while the running time is only 31% of that of the global algorithm. In the neural network based ADP, the key techniques are continuously updating the training dataset online and adopting an appropriate neural network structure, which together improve the scheduling performance.


INTRODUCTION
Hydrogen storage based multi-energy supply microgrids are expected to play an important role in future smart cities (Mancarella, 2014; Li et al., 2017b). In a multi-energy supply microgrid, several load demands are covered, such as electricity, heat, and gas. At the same time, a hydrogen storage system can be used to alleviate the intermittency of renewable energy. When renewable energy is in surplus, the surplus is converted to hydrogen (H2) through an electrolyzer; when energy is insufficient, a fuel cell generates power from the stored hydrogen. The structure of the multi-energy supply microgrid used in this work is shown in Figure 1. Based on this hybrid microgrid, different types of energy can be utilized from a "global" view, which can improve the energy utilization efficiency (Li et al., 2018b).
On the other hand, multi-energy supply microgrids can also interconnect with different utility grids (electricity/heat/gas) (Li et al., 2018b). The structure of the utility grids is shown in Figure 2. The left network represents the electricity supply system, the middle network is the gas supply system, and the right network is the heat supply system. With these integrated utility grid networks, local loads can better withstand natural disasters (Wang et al., 2016). For example, if the electric utility grid is damaged by a natural disaster, the gas utility grid can supply gas to a fuel cell to produce electricity, so the local loads can still operate.
However, operating these multi-energy supply grid-connected microgrids in real time is still difficult, because different types of devices, three types of energy, and three types of utility grid networks must be considered.
The microgrid operation problem is often formulated as a model predictive control (MPC) problem, because MPC is widely accepted in a variety of industrial scenarios and deals effectively with optimization problems subject to large numbers of constraints (Shang and You, 2019). Several methods can be adopted to solve the MPC problem.
The first category is heuristic algorithms, such as GA (Li et al., 2017a) and PSO (Mohammadi-Ivatloo et al., 2013), which are widely employed to solve the microgrid operation problem owing to their flexibility and their ability to handle complex constraints. However, heuristic algorithms do not guarantee an optimal result, because the solution is updated by stochastic search.
The second category is mixed integer programming (MIP), which is popular owing to the availability of efficient commercial solvers such as CPLEX and Gurobi (Gurobi, 2018). For example, in (Li and Xu, 2019), the authors study the operation of a multi-energy microgrid under diverse uncertainties. The problem is represented as a two-stage operation problem and is finally converted to a mixed-integer linear programming (MILP) problem. In (Li et al., 2021), the authors study the optimal deployment of energy storage in a residential multi-energy microgrid. Based on a linearisation method, the model is converted to a MILP problem. However, in the MIP problem, the number of optimization windows is an important parameter. When the number of windows is large, the solving time is long, because many decision variables are involved. When the number of windows is small, there are few decision variables and the solving time is short, but the results are far from the global optimum, because future impacts beyond the window are not considered. So, the trade-off between the number of windows and the solving time should be considered.
The third category is dynamic programming (Xie et al., 2017), which transfers the long time horizon MPC problem into a series of smaller problems that can be easily solved. But dynamic programming suffers from the "curse-of-dimensionality" (Shi et al., 2017), which makes it difficult to use in real-time operation of large systems.
A method is therefore required that solves the optimization problem quickly and efficiently in real time, with results that are not far from the global optimum.
The approximate dynamic programming (ADP) method can address this need. ADP is a one-step decision model in which the future influence is included in the current decision through a value function approximation (VFA). If a good VFA can be found, the future influence is quantified well, which leads to a reasonable decision at the current time. Since ADP decides only one step at a time, its solving time is shorter than that of the multiple-window MPC method. Therefore, in this paper, we adopt the ADP method to control the optimal operation of grid-connected microgrids. We focus on the performance of the ADP method and compare different factors, such as the regression method and the offline/online process.

Scheduling Problem Based on Approximate Dynamic Programming
For the ADP method, the key element is the value function approximation. In general, there are three ways to describe it (Salas and Powell, 2013; Li and Jayaweera, 2015): lookup table, parametric approximation, and nonparametric approximation. For example, in (Das and Ni, 2018), the authors study the operation of battery storage systems in an islanded microgrid considering battery lifetime characteristics, and the approximate value function is formulated based on the lookup table idea. In (Li and Jayaweera, 2015), the authors use the Q-learning method to define the approximate value function. In (Keerthisinghe et al., 2018), a piecewise linear function is used to build the approximate value function. In , deep recurrent neural network learning is adopted to describe the approximate value function. These papers report that ADP achieves better performance with a lower computational burden.
Using the ADP method to optimally control the operation of microgrids has also attracted considerable attention.

Lookup Table and Parametric Approximate Value Function
In (Keerthisinghe et al., 2018), the authors present an ADP-based smart home energy management system. Lookup tables and piecewise linear functions are used to define the approximate value function, and the results show that the ADP-based algorithm reduces the daily electricity cost without increasing the computational burden. In (Salas and Powell, 2013), the authors present an ADP method to control the operation of energy storage systems to achieve an economic goal. A piecewise linear function is adopted to define the approximate value function. In (Jiang et al., 2014), the authors compare different ADP methods for the energy storage control problem, including approximate policy iteration and approximate value iteration. In (Anderson et al., 2011), the authors apply ADP to the smart grid dispatching problem. The long-time horizon scheduling problem is transferred into a series of smaller problems, which are easier to solve.
In (Strelec and Berka, 2013), the authors present an ADP method to solve multi-energy supply microgrid economic dispatching problems; lookup table and regression methods are used to approximate the cost function. In (Shuai et al., 2018b), the authors propose a lookup table based ADP algorithm for the real-time energy management of a microgrid under uncertainties. The dispatching problem is formulated as a long-time horizon mixed integer nonlinear programming model and is then decomposed into several single-period nonlinear programming sub-problems based on the ADP method. Similarly, in (Shuai et al., 2018a), a piecewise linear function based ADP algorithm is adopted to solve the stochastic microgrid economic dispatching problem. In (Darivianakis et al., 2017), the authors transfer the MPC optimization problem into a VFA-based multistage optimization problem, where a piecewise linear function is adopted to approximate the value function. In (Bhattacharya et al., 2018), the authors present a two-stage dual dynamic programming method to manage energy storage in a microgrid, where a piecewise linear function is also adopted to approximate the cost-to-go functions.

Nonparametric Approximate Value Function
In (Ji et al., 2018), the authors study the real-time economical operation of a grid-connected microgrid using the ADP method. A multilayer perceptron feedforward neural network is adopted to approximate the value function. In , the authors study the economical operation of a microgrid in real time. ADP and deep recurrent neural network (RNN) learning are adopted to solve the problem, and a deep RNN architecture is used to estimate the value function. Furthermore, the authors in (Liu et al., 2015) present an approximate dynamic programming algorithm for solving undiscounted optimal control problems. Two multilayer feedforward neural networks are used to approximate both the control policy and the value function. In order to enhance the resource utilization rate and reduce the computation cost, the authors in  present an event-based iterative adaptive critic algorithm, in which three neural networks with different roles are constructed: the model network is employed for prediction, the critic network is built for evaluation, and the action network is used for control. In order to tackle dynamic uncertainties, the authors in (Wang, 2019) study robust policy learning control for nonlinear plants. A neural network based actor-critic structure is designed to implement the robust control.
The authors in (Zhu et al., 2019) study the optimal management of multiple batteries over a long time horizon in order to prolong battery lifetime. Approximate dynamic programming is adopted to solve the problem, and fuzzy systems are used to approximate the value functions. Compared to neural networks, the fuzzy approximation only requires computing target values.
Based on the above papers, the ADP method is effective for solving the dispatching problem, and it can be divided into the following steps: 1) build the dispatching optimization model; 2) transfer the multi-step decision problem into a series of one-step decision problems; 3) find the relationship between the states and the future costs, using lookup table, regression, or neural network methods to describe it, namely, build the approximate value function; 4) integrate the approximate value function into the one-step decision model; 5) solve the one-step decision problem based on the approximate value function.

Electricity/Heat/Gas Utility Grids Operation
The above section reviews the related work about scheduling algorithms for microgrid. In addition, when microgrid interconnects with the electricity/heat/gas utility grids, the operation of the electricity/heat/gas utility grids should also be considered.
For the coupled multi-energy networks operation, centralized optimization algorithm is often used to solve the optimal power flow. For example, authors in (Qin et al., 2020) study the operation of integrated energy systems consisting of electricity and natural gas utility networks, a multi-objective optimization method is used to solve the coordinated operation of the coupling network. In (Sun et al., 2020), authors study the day-ahead scheduling of gas-electric integrated energy system considering the bidirectional energy flow. The goal is to minimize the operation cost, and a second-order cone programming method is utilized to solve the problem. In , authors study the operation of the integrated gas and electrical power system considering the different response times of the gas and power systems. The problem is transformed into a single-stage linear programming. In (Chen et al., 2017), authors study the optimal operation of electricity-gas integrated energy system. The goal is to minimize the operation costs for both electrical and natural gas systems while satisfying steady-state operational constraints.
To model the electricity/heat/gas utility grid networks, the steady-state operational equations are often built as constraints and added to the optimization problem. For example, in , authors present a sequential reliability assessment method considering multi-energy flow and thermal inertia. Hydraulic circulation and heat exchange equations are used to model the thermal network, and conventional power flow equations are adopted to describe the distribution network. In (Martínez Ceseña et al., 2020), the electricity network is represented by conventional power flow equations, together with thermal and voltage limits. The gas network is represented by steady-state equations. The conventional steady-state equations and a thermal module are utilized to model the heat network. In , authors present a planning strategy for a district energy sector considering the coupling of power, gas, and heat systems. An optimal multi-energy flow model is developed, and the objective is to minimize operational costs. DistFlow equations are used to describe the power distribution system, steady gas flow equations are adopted to model the gas distribution system, and a steady-state model is deployed to describe the heat distribution system. In (Martínez Ceseña and Mancarella, 2019), the authors present a robust optimization framework for smart districts with multi-energy devices and electricity/heat/gas energy networks. The electricity network is modelled with typical power flow equations. The heat network is described based on nodal balance and cumulative head loss equations. The gas network is represented based on nodal balance, pressure drop, and head loss equations.
Based on the above review, optimization methods are often used to calculate the power flow of the electricity/heat/gas energy networks. The electricity network is modelled with typical power flow equations. The heat network is modelled with nodal balance and heat loss equations. The gas network is represented with nodal balance and pressure drop equations.

Contributions
The above review shows that the operation problem of multi-energy supply microgrids and the operation problem of coupled electricity/heat/gas energy networks have drawn a lot of attention. However, using the ADP algorithm to solve the dispatching problem of hydrogen-based multi-energy supply microgrids while considering the electricity/heat/gas energy networks has received little attention. The complexity of the whole model, especially the large number of constraints, increases the difficulty of the control. Motivated by the aforementioned references, we present an ADP-based computationally efficient algorithm for the real-time operation of multi-energy supply grid-connected microgrids. A similar study is our previous work (Li et al., 2018a), in which only the MPC algorithm is used and no other algorithms are compared.
Compared to previous works, the contributions of this paper can be summarized as follows:
• First, we build an ADP-based one-step decision model for the optimal operation of multi-energy supply grid-connected microgrids. In the one-step decision model, we consider large numbers of logical and physical constraints, and formulate the problem as a mixed-integer programming model;
• Second, in the ADP model, we study different factors. Linear, nonlinear, and neural network regression are compared to study the influence of the approximate value function on the decision results. Offline and online processes are developed to study the impact of the historical data on the regression of the approximate value function;
• Last, we compare the performance of the sliding window MPC, the one-step decision ADP, and the global optimization algorithms from different perspectives, including the running time, the real-time operation cost, the total operation cost, and the energy exchanged with the utility grid networks. The results show that the neural network based ADP method has the best performance, with a relative error in total operation cost of less than 3% and a running time of only 31% of that of the global algorithm.
The remainder of this paper is organized as follows. Section 2 describes the microgrid scheduling problem. Section 3 describes the electricity/heat/gas utility grids model. Section 4 presents the sequential operation of the whole system, and Section 5 gives the system setup. Section 6 presents the simulation results. Finally, Section 7 concludes the paper.
To operate the electricity/heat/gas integrated microgrid system, three aspects should be considered: 1) the scheduling of the grid-connected microgrids; 2) the operation of the utility grids; 3) the operation of the whole system.

MICROGRID SCHEDULING PROBLEM FORMULATION
To schedule the grid-connected microgrids, a coordinated strategy is often adopted, namely, day-ahead scheduling and real-time dispatching. In day-ahead scheduling, the expected energy exchanged with the utility grids is calculated. Based on the exchanged energy, we can decide the role of each microgrid, namely, whether it operates as a generator or as a load. In real-time dispatching, the ADP-based one-step decision problem is solved. It takes the future operation cost into consideration, which makes the current dispatching more reasonable and at the same time reduces the solving time.
We introduce the problem from three aspects: 1) day-ahead scheduling; 2) real-time dispatching based on MPC; 3) real-time dispatching based on ADP.

Microgrid Day-Ahead Scheduling
To keep the presentation readable, we describe the problem with a simplified model; the detailed model is attached in Supplementary Material. The scheduling problem can be described as follows: where $x_i$ are the continuous variables and $x_j$ are the integer/logical variables; $A$, $B$, $C$, $D$, $b$, $c$, $d$, $e$ are the constraint matrices; $f(\cdot)$ is the operation cost function; and $T$ is the time horizon. By solving the above mixed integer programming problem, we obtain the scheduling results. However, due to the uncertainty of the load demand and the output of renewable energy, some parameters in the constraints are not deterministic. The above problem is then transferred to the following problem: where c are the uncertain parameters. For example, in the power balance constraints, the generated power must equal the load demand, but the predicted load demand is uncertain.
The common method to solve the above uncertain problem is stochastic optimization. The above problem can be transferred as follows: In the above stochastic problem, we use a scenario-based method to represent the uncertain parameters c by $N_s$ typical scenarios, where the probability of scenario $s$ is $p_s$. By solving the above problem, we obtain the scheduling results in each scenario.
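Assume that the variables representing the energy exchanged with the utility grids are $x_{ex} \in x_i$, and let $x_{ex,s}$ denote their value in scenario $s$. Then the expected exchanged energy is the probability-weighted sum over the scenarios:

$$E[x_{ex}] = \sum_{s=1}^{N_s} p_s \, x_{ex,s}$$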
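The scenario expectation can be illustrated with a small numerical sketch. The three scenarios, their probabilities, and the exchanged-energy values below are hypothetical and are not taken from the paper; the variable names are likewise chosen only for illustration.

```python
# Hypothetical data: 3 scenarios, 4 time steps of energy exchanged with a utility grid.
probabilities = [0.5, 0.3, 0.2]          # p_s for each scenario, must sum to 1
exchange = [
    [10.0, 12.0, 8.0, 6.0],              # scenario 1
    [14.0, 10.0, 9.0, 7.0],              # scenario 2
    [6.0, 15.0, 5.0, 4.0],               # scenario 3
]

assert abs(sum(probabilities) - 1.0) < 1e-12

# Expected exchanged energy per time step: sum over scenarios of p_s * x_ex,s.
n_steps = len(exchange[0])
expected_exchange = [
    sum(p * row[t] for p, row in zip(probabilities, exchange))
    for t in range(n_steps)
]
```

The resulting per-step expectation is the quantity that would be handed to the real-time stage as the day-ahead reference.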

Microgrid Real-Time Dispatching Based on MPC
Based on the day-ahead scheduling results, we can then implement real-time dispatching. Due to the uncertainty of the real-time short-term prediction, the real-time energy exchanged with the utility grid may not equal the day-ahead scheduling results.
In order to reduce this error, the real-time exchanged energy should follow the day-ahead scheduling results as closely as possible. The sliding window model predictive control method is then adopted for the real-time dispatching: where $g(\cdot)$ is the real-time operation cost function; $x^p_{ex}$ are the day-ahead scheduling results; and $t_n$ is the time horizon.
In the real-time sliding window dispatching, the MPC problem is solved at the first time step $t$; only the decisions for the current time $t$ are deployed, and the decisions for the future times $t+1, \ldots, t+t_n$ are discarded. After that, the window slides to the next step $t+1$ and the MPC problem is solved again; only the decisions for the new current time $t+1$ are deployed, and the decisions for the future times $t+2, \ldots, t+t_n+1$ are discarded. This process is repeated until the last time step is reached, as shown in Figure 3A.
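The sliding window procedure can be sketched on a toy storage problem. This is only an illustration of the receding-horizon loop: the brute-force `solve_window` function stands in for the mixed-integer solver, and the prices, storage capacity, and action set are hypothetical.

```python
from itertools import product

def solve_window(state, prices, actions, capacity):
    """Brute-force the cheapest feasible action sequence over one window
    (toy stand-in for the MIP solver used in practice)."""
    best_cost, best_plan = float("inf"), None
    for plan in product(actions, repeat=len(prices)):
        level, cost, feasible = state, 0.0, True
        for action, price in zip(plan, prices):
            level += action
            if not 0 <= level <= capacity:
                feasible = False
                break
            cost += price * action       # charging buys energy, discharging sells it
        if feasible and cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan

def sliding_window_mpc(state, prices, window, actions=(-1, 0, 1), capacity=2):
    """Solve a window problem at every step and deploy only the first decision."""
    applied = []
    for t in range(len(prices)):
        plan = solve_window(state, prices[t:t + window], actions, capacity)
        first = plan[0]                  # current-time decision is deployed
        state += first                   # remaining decisions are discarded
        applied.append(first)
    return applied

schedule = sliding_window_mpc(state=1, prices=[3.0, 1.0, 4.0, 2.0], window=2)
```

With a window of two steps, the toy controller sells at high prices and recharges at the low price, while each window problem is re-solved from the updated state.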

Microgrid Real-Time Dispatching Based on ADP Method
In the above section, the sliding window MPC method is adopted to deploy real-time dispatching. However, the solving time of the MPC method is long, because we need to solve the multiple windows optimization problem. In this section, the one-step decision model is developed to solve the real-time dispatching problem. With the one-step decision model, the solving time can be reduced. On the other hand, the ADP idea is also adopted, namely, integrating the future impacts into the current decision model, to make the current decision results more reasonable and effective.
In fact, the above MPC problem can be transferred into a series of smaller problems based on the dynamic programming idea, which can be represented as follows: We use $VF_{t+1}$ to describe the total future cost from $t+1$ to $t+t_n$, namely,

$$VF_{t+1} = \min_{x_{t+1},\ldots,x_{t+t_n}} \sum_{\tau=t+1}^{t+t_n} g(\cdot)$$

Then the above problem can be represented as: Because the future cost $VF_{t+1}$ depends on the current decisions $x_t$ and the post-decision states $S_{t+1}$, the general one-step ADP decision model can be described as follows, and the detailed model is attached in Supplementary Material: where $VF$ is the approximate value function, $VF(S_{t+1})$ is the approximate future operation cost based on the state $S_{t+1}$; $SF$ is the state transition function, which describes how the current state $S_t$ changes to the next state $S_{t+1}$.

Frontiers in Electronics | www.frontiersin.org April 2021 | Volume 2 | Article 637736

By solving the above one-step decision model (namely, the decision variables are only those at the current time), one can obtain the dispatching results. The key element of the one-step decision model is the approximate value function $VF$. If we can find a good approximate value function $VF$ to describe the relationship between the state $S_{t+1}$ and the future operation cost $C_{future}$, then we can obtain good and effective decision results.
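A one-step decision of this kind can be sketched as follows. This is a minimal illustration, not the paper's detailed model: the function names, the discrete action set, the toy stage cost, and the linear value function approximation are all assumptions made for the example.

```python
from typing import Callable, Sequence

def one_step_decision(state: float,
                      actions: Sequence[float],
                      stage_cost: Callable[[float, float], float],
                      transition: Callable[[float, float], float],
                      vfa: Callable[[float], float]) -> float:
    """Pick the action minimizing immediate cost plus approximate future cost."""
    return min(actions,
               key=lambda a: stage_cost(state, a) + vfa(transition(state, a)))

# Toy storage example (all numbers hypothetical).
PRICE, DEMAND, CAPACITY = 2.0, 3.0, 5.0
ACTIONS = [-1.0, 0.0, 1.0]               # discharge, idle, charge

def stage_cost(level: float, action: float) -> float:
    # Grid purchase covers demand plus charging; infeasible levels are excluded.
    if not 0.0 <= level + action <= CAPACITY:
        return float("inf")
    return PRICE * max(DEMAND + action, 0.0)

def transition(level: float, action: float) -> float:
    # State transition: next storage level.
    return level + action

def vfa(next_level: float) -> float:
    # Assumed linear VFA: each stored unit saves 3.0 of future cost.
    return 10.0 - 3.0 * next_level

best_action = one_step_decision(2.0, ACTIONS, stage_cost, transition, vfa)
```

Here charging is chosen even though it raises the immediate cost, because the assumed value function credits stored energy with a larger future saving; with a poor value function the same one-step model would make a poor decision.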

Approximate Value Function VF
The approximate value function $VF$ is used to describe the relationship between the state $S_t$ and the future operation cost $C_{future}$, which can be represented as $C_{future} = VF(S_t, L_{pre})$, where $L_{pre}$ is the future predicted load demand and renewable energy output.
With the approximate value function $VF$, one can calculate the future operation cost based on the state $S_t$ and the predicted data $L_{pre}$. Finding a good approximate value function $VF$ is therefore the key problem. In this section, we introduce how to find it.
Firstly, we need to obtain the historical dataset $\{C_{future}, [S_t, L_{pre}]\}$. The dataset can be obtained by offline simulation: given different values of $[S_t, L_{pre}]$, we solve the problem in Eq. 9 and then calculate the future operation cost $C_{future}$. In addition, new data are also obtained during actual operation, so the dataset is updated continuously as the operation runs forward.
Secondly, we need to analyze the dataset to find the relationship between $C_{future}$ and $[S_t, L_{pre}]$, namely, to calculate the approximate value function $VF$. Here, we adopt three methods: linear regression, nonlinear regression, and neural network regression.
In the linear regression method, we use the function $C_{future} = a_0 + a_1 \cdot S_t + a_2 \cdot L_{pre}$ to describe the relationship, and the approximate value function $VF$ is given by the values of the parameters $a_0, a_1, a_2$, namely, $VF \equiv \{a_0, a_1, a_2\}$. In the nonlinear regression method, a nonlinear function is used instead. In the neural network regression method, the function is $C_{future} = NN(S_t, L^{PV}_{pre}, L^{el}_{pre}, L^{heat}_{pre}, L^{gas}_{pre})$, where $NN$ is the neural network function, and the approximate value function is $VF \equiv \{NN\}$.
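The linear case can be sketched with an ordinary least-squares fit. The sketch below treats $S_t$ and $L_{pre}$ as scalars, solves the normal equations by Gaussian elimination, and uses a hypothetical noise-free dataset; the function name `fit_linear_vfa` and the data are assumptions made for illustration.

```python
def fit_linear_vfa(samples):
    """Least-squares fit of C_future = a0 + a1*S + a2*L via the normal equations.

    samples: iterable of (S, L, C_future) tuples with scalar S and L.
    """
    n = 3
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for s, l, c in samples:
        row = (1.0, s, l)                # regressors: intercept, state, prediction
        for i in range(n):
            xty[i] += row[i] * c
            for j in range(n):
                xtx[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs

# Hypothetical history generated from C_future = 5 - 2*S + 0.5*L.
history = [(s, l, 5.0 - 2.0 * s + 0.5 * l)
           for s in (0.0, 1.0, 2.0) for l in (10.0, 20.0, 40.0)]
a0, a1, a2 = fit_linear_vfa(history)
```

On noise-free data the fit recovers the generating coefficients; with real operation data the residual of this fit is one way to judge whether a nonlinear or neural network regression is needed.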
Finally, we developed offline and online processes to deploy the ADP method. In the offline process, at each time $t$, there are four steps: 1) update the dataset $\{C_{future}, [S_t, L_{pre}]\}$; 2) based on the dataset, calculate the approximate value function $VF$; 3) solve the problem in Eq. 9 and obtain the dispatching results; 4) save the operation results of step 3) and return to step 1). The offline process can be summarized as:

Algorithm 1 Offline simulation process.
for t = 1 : T do
    update the dataset {C_future, [S_t, L_pre]};
    calculate the approximate value function VF;
    solve min over x_t of g(x_t, x^t_ex, VF(S_{t+1}));
    save the operation results;
    t = t + 1;
end for

In the online process, there is not enough initial data, so the dataset is obtained and updated during online operation. At each time $t$, the process is run $N_{it}$ times. In each run $i$, $i = 1, 2, \ldots, N_{it}$, the dataset $\{C_{future}, [S_t, L_{pre}]\}$ is first updated; then the approximate value function $VF$ is calculated; after that, the problem in Eq. 9 is solved and the operation results are saved; finally, the process moves to the next run $i+1$. After the $N_{it}$ runs are finished, the process goes to the next time $t+1$. The online process can be summarized as:

Algorithm 2 Online simulation process.
initialize N_it;
for t = 1 : T do
    for i = 1 : N_it do
        update the dataset {C_future, [S_t, L_pre]};
        calculate the approximate value function VF;
        solve min over x_t of g(x_t, x^t_ex, VF(S_{t+1}));
        save the operation results;
        i = i + 1;
    end for
    t = t + 1;
end for
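The structure of the online process can be sketched as a nested loop. The callbacks `observe`, `fit_vfa`, and `solve_one_step` are hypothetical placeholders for the data collection, the regression, and the one-step optimization, and the constant-mean value function is only a crude stand-in for the regression methods discussed above.

```python
def run_online(T, n_it, observe, fit_vfa, solve_one_step):
    """At each time t, repeat n_it times: update data, refit the VFA, decide, save."""
    dataset, log = [], []
    for t in range(T):
        for i in range(n_it):
            dataset.append(observe(t, i))        # update the dataset
            vfa = fit_vfa(dataset)               # recalculate the VFA
            decision = solve_one_step(t, vfa)    # solve the one-step problem
            log.append((t, i, decision))         # save the operation result
    return dataset, log

# Toy callbacks (all values hypothetical).
def observe(t, i):
    return (t, 10.0 - 4.0 * t)                   # (state, observed future cost)

def fit_vfa(data):
    mean_cost = sum(cost for _, cost in data) / len(data)
    return lambda state: mean_cost               # constant VFA from the data mean

def solve_one_step(t, vfa):
    # Buy now at a fixed cost of 7.0, or defer and pay the estimated future cost.
    options = (("buy", 7.0), ("defer", vfa(t)))
    return min(options, key=lambda option: option[1])[0]

dataset, log = run_online(T=3, n_it=2, observe=observe,
                          fit_vfa=fit_vfa, solve_one_step=solve_one_step)
decisions = [decision for _, _, decision in log]
```

In this toy run the decision switches from buying to deferring once enough low-cost observations have entered the dataset, which illustrates why the online update of the dataset matters for the fitted value function.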

ADP State Transition Process
The state transition process can be seen in Figure 3B. The future approximate operation cost satisfies $VF(S_{t+1}) = VF(S_t) - c_t$, where $c_t$ is the instantaneous operation cost from time $t$ to time $t+1$. At time $t$, the state $S_t$ includes the hydrogen tank state $S^t_{gs}$, the electricity/heat/gas load demands $L^t_{el}$, $L^t_{heat}$, $L^t_{gas}$, and the PV output $L^t_{PV}$. The action $a_t$ includes the dispatching strategies.

UTILITY GRIDS OPERATION PROBLEM
For the integrated utility grids model, an IEEE30 + gas20 + heat14 hybrid network is adopted. The structure of each utility grid network is presented in Supplementary Material.

Electricity Utility Grid Operation
The electricity utility grid operation is a classical optimal power flow (OPF) problem, which can be written as follows: where $P^i_g$, $Q^i_g$ are the real and reactive power of the $i$-th generator, and $f^i_P$, $f^i_Q$ are the individual polynomial cost functions of the $i$-th generator.
The power balance constraints are as follows: where $P^{load}_i$, $Q^{load}_i$ are the real and reactive load demands at bus $i$, and $G^{line}_{ij}$, $B^{line}_{ij}$ are the parameters of the power line from bus $i$ to bus $j$.

Heating Utility Grid Operation
The heating utility grid operation is a heating power flow problem. During heat transportation, the transportation loss should be considered. It can be described as follows (Pirouti, 2013; Shabanpour-Haghighi and Seifi, 2015):
where $c_p$ is the specific heat capacity (kJ/(kg·K)), $\dot{m}$ is the mass flow rate (kg/s), and $T_{s1}$, $T_{s2}$ are the temperatures at node s1 and node s2.
The temperature drop through the heating flow system can be described as: where $l$ is the pipe length, $U$ is the heat transfer coefficient (W/(m·K)), and $T_g$ is the ground temperature. Based on Eqs. 14 and 15, it can be seen that the heating loss during transportation is a nonlinear function. In order to reduce the complexity, in this paper we choose a linear model to describe the heating transportation loss. We assume that the heating loss is a linear function of the transportation distance, which can be shown as the following: where $k^{loss}_{heat}$ is the coefficient of the heating loss. Then, the heating power flow of the heating utility grid can be presented. For each heating pipeline, two binary state variables (0 or 1), $ULine^{out}_{heat}$ and $ULine^{in}_{heat}$, are defined. The heating power flow in each pipeline can then be described by the following constraints: An example is presented here to explain the logic illustrated in Eq. 18. In Eq. 18, there are three nodes h1, h2, and h3, and the connections are h1↔h2 and h2↔h3. The heating power flow at node h2 can be described as in Eq. 19.
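The nonlinearity that motivates the linear loss model can be illustrated numerically. The sketch below assumes the commonly used steady-state exponential temperature-drop form, $T_{out} = (T_{in} - T_g) e^{-U l / (c_p \dot{m})} + T_g$, together with a loss of $c_p \dot{m} (T_{in} - T_{out})$. These forms, the function names, and all parameter values are assumptions for illustration and are not taken from this paper's equations.

```python
import math

def outlet_temperature(t_in, t_ground, u, length, c_p, m_dot):
    """Assumed steady-state exponential temperature drop along a pipe.

    u in W/(m*K), length in m, c_p in J/(kg*K), m_dot in kg/s, temperatures in deg C.
    """
    return (t_in - t_ground) * math.exp(-u * length / (c_p * m_dot)) + t_ground

def heat_loss(t_in, t_out, c_p, m_dot):
    """Heat lost between pipe inlet and outlet, in W."""
    return c_p * m_dot * (t_in - t_out)

# Hypothetical pipe and operating point.
T_IN, T_GROUND = 90.0, 10.0
U, C_P, M_DOT = 0.5, 4182.0, 2.0

loss_500 = heat_loss(T_IN, outlet_temperature(T_IN, T_GROUND, U, 500.0, C_P, M_DOT),
                     C_P, M_DOT)
loss_1000 = heat_loss(T_IN, outlet_temperature(T_IN, T_GROUND, U, 1000.0, C_P, M_DOT),
                      C_P, M_DOT)

# Doubling the length slightly less than doubles the loss, so a model that is
# linear in distance is a reasonable approximation for this operating point.
ratio = loss_1000 / loss_500
```

For these hypothetical values the ratio stays close to 2, which is the regime in which a distance-proportional loss coefficient such as $k^{loss}_{heat}$ is a sensible simplification.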

Gas Utility Grid Operation
The gas utility grid operation is a gas flow problem. The gas flow can be described as follows (De Wolf and Smeers, 2000): where $f_{ij}$ is the gas flow between nodes $i$ and $j$, $p_i$ and $p_j$ are the pressures at nodes $i$ and $j$, and $C_{ij}$ is a constant which depends on the length, the diameter, and the absolute rugosity of the pipe and on the gas composition. During gas transportation, the pressure drops, which is modeled as in Eq. 21.
Based on Eqs. 20 and 21, we can obtain $f^2_{dep} = C^2_{12} (p^2_1 - p^2_2)$. Then, the gas pressure drop can be described as: Assume that the loss $C^2_{12} \cdot H^2_{loss}$ can be represented as $C^2_{12} \cdot H^2_{loss} \approx f^2_{dep} \cdot f_{loss}$, where $f_{loss}$ is a coefficient parameter that describes the pressure drop. Next, we can obtain . In (Martinez-Mares and Fuerte-Esquivel, 2012), it is shown that the pressure drop $H_{loss}$ is a complex function related to the nonlinear effect of the pipeline distance $L^{pipe}_{gas}$ and the weather conditions. The coefficient parameter $f_{loss}$ is also a nonlinear function. In order to reduce the complexity, a linear model is adopted here to describe the pressure drop. Assume that the coefficient parameter $f_{loss}$ is a linear function of the gas pipeline distance, which can be shown as

$$f_{loss} = k^{loss}_{gas} \cdot L^{pipe}_{gas} \tag{23}$$

where $k^{loss}_{gas}$ is the coefficient of the gas loss. Then the gas flow in the gas utility grid can be presented. For each pipeline, two binary state variables (0 or 1), $ULine^{out}_{gas}$ and $ULine^{in}_{gas}$, are defined. The gas flow constraints are:

$$\begin{aligned}
0 \le Line^{out}_{gas}(i,t) &\le ULine^{out}_{gas}(i,t) \cdot Line^{max}_{gas}(i) \\
0 \le Line^{in}_{gas}(i,t) &\le ULine^{in}_{gas}(i,t) \cdot Line^{max}_{gas}(i) \\
ULine^{out}_{gas}(i,t) + ULine^{in}_{gas}(i,t) &\le 1
\end{aligned} \tag{24}$$

Here we also use an example to explain the gas flow, which is shown in Eq. 25. There are three nodes g1, g2, and g3, and the connections are g1↔g2 and g2↔g3. The gas flow at node g2 can be described as: The gas flow in a gas pipeline is restricted by the pressures of the beginning and end nodes. This constraint can be described as: where $p_{i,min}$, $p_{i,max}$, $p_{j,min}$, $p_{j,max}$ are the minimum and maximum pressures at nodes $i$ and $j$.
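The flow relation and the direction constraints can be checked with a short sketch. It assumes the squared-pressure relation used in the derivation above, $f^2 = C^2 (p_i^2 - p_j^2)$, with the sign of the flow following the pressure difference, and the three inequalities of Eq. 24. The function names and the numerical values are hypothetical.

```python
import math

def gas_flow(p_i, p_j, c_ij):
    """Signed flow from node i to node j under f^2 = C^2 * (p_i^2 - p_j^2).

    A positive value means flow from i to j; a negative value means j to i.
    """
    diff = p_i ** 2 - p_j ** 2
    return math.copysign(c_ij * math.sqrt(abs(diff)), diff)

def direction_feasible(line_out, line_in, u_out, u_in, line_max):
    """Check the pipeline constraints of Eq. 24 for one pipeline and time step.

    u_out and u_in are the binary direction indicators; at most one may be 1,
    so gas cannot flow in both directions at once.
    """
    return (0 <= line_out <= u_out * line_max
            and 0 <= line_in <= u_in * line_max
            and u_out + u_in <= 1)

forward = gas_flow(5.0, 3.0, 2.0)        # higher pressure at node i
backward = gas_flow(3.0, 5.0, 2.0)       # higher pressure at node j
one_way = direction_feasible(3.0, 0.0, 1, 0, 10.0)
two_way = direction_feasible(3.0, 2.0, 1, 1, 10.0)
```

The binary indicators turn the bidirectional pipeline into two non-negative flow variables, which keeps the constraint set linear in a mixed-integer model.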

THE SEQUENTIAL OPERATION OF THE WHOLE SYSTEM
Four microgrids are interconnected through the hybrid IEEE30 + gas20 + heat14 network, which makes this complex system difficult to schedule. In this paper, we adopt a sequential strategy: 1) first, the four microgrids run their scheduling algorithms based on the MPC or ADP method [Section (2)] and obtain the energy exchanged with the electricity/heat/gas utility grids; 2) second, the utility grids receive the exchanged energy and solve their power flow problems [Section (3)].

SYSTEM SETUP
In this paper, an IEEE-30 + gas-20 + heat-14 hybrid system is adopted as the utility grids. Four multi-energy microgrids are connected to the utility grids. The structure is presented in Figure 2. Microgrid MG1 is connected at electrical node e23, gas node g7, and heat node h9. Microgrid MG2 is connected at electrical node e17, gas node g6, and heat node h10. Microgrid MG3 is connected at electrical node e14, gas node g15, and heat node h4. Microgrid MG4 is connected at electrical node e7, gas node g10, and heat node h13. The configuration of this hybrid system is summarized in Eq. 28. The model is implemented in MATLAB and solved with YALMIP (Löfberg, 2012). A typical day is chosen. First, based on the forecasted load demands and PV output, the microgrids run their day-ahead scheduling algorithm; the resulting energy exchange with the utility grids is then passed to the real-time dispatching algorithm. Second, the real-time rolling horizon dispatching algorithm is solved based on the new forecasting data and the day-ahead exchange results.
The load demands (peak load) of each microgrid and the microgrid operation parameters are presented in the Supplementary Material.

SIMULATION RESULTS
Based on the above strategy, the simulation is run. The simulation results are presented from four aspects: 1) scheduling results; 2) operation cost analysis; 3) exchanged energy analysis; 4) utility grids power flow.

Scheduling Results
Different cases are presented to study the performance of each algorithm. Cases ADP linear b and ADP linear c are used to study the linear regression AVF. Cases ADP nonlinear A and ADP nonlinear B are used to study the nonlinear regression AVF. Cases ADP online 30, ADP online neg1, and ADP online neg3 are compared to study the online process. In order to study the influence of the optimization window number, cases MPC 12, MPC 6, and MPC 1 are set. Cases ADP NN neg1, ADP NN neg5, and ADP NN neg10 are presented to study the neural network regression AVF. All cases are summarized as follows:
1. ADP linear b: the AVF is constructed based on linear regression, and the coefficient is $C_b = 10^{-2}$;
2. ADP linear c: the AVF is constructed based on linear regression, and the coefficient is $C_c = 10^{2}$;
3. ADP nonlinear A: the AVF is constructed based on nonlinear regression, and the coefficient is $C_A = 1.2 \times 10^{-8}$;
4. ADP nonlinear B: the AVF is constructed based on nonlinear regression, and the coefficient is $C_B = 10^{-8}$;
5. ADP online 30: the AVF is constructed based on linear regression, the simulation follows the online Algorithm 2, and the number of iterations is 30;
6. ADP online neg1: the AVF is constructed based on linear regression, the simulation follows the online Algorithm 2, and the coefficient is $C_A^{online} = 10^{-1}$;
7. ADP online neg3: the AVF is constructed based on linear regression, the simulation follows the online Algorithm 2, and the coefficient is $C_B^{online} = 10^{-3}$;
8. Global: the algorithm is the MPC method, and the optimization window is 288 (12 × 24 h = 288);
9. MPC 12: the algorithm is the MPC method, and the optimization window is 12;
10. MPC 6: the algorithm is the MPC method, and the optimization window is 6;
11. MPC 1: the algorithm is the MPC method, and the optimization window is 1, namely a one-step decision method in which the future costs are not considered;
12. ADP NN neg1: the AVF is constructed based on neural network regression, the simulation follows the offline algorithm, and the coefficient is $C_A^{NN} = 10^{-1}$;
13. ADP NN neg5: the AVF is constructed based on neural network regression, the simulation follows the offline algorithm, and the coefficient is $C_B^{NN} = 10^{-5}$;
14. ADP NN neg10: the AVF is constructed based on neural network regression, the simulation follows the offline algorithm, and the coefficient is $C_C^{NN}$.
The simulation results of the real-time SOC of MG4 can be seen in Figure 4. Here SOC means the percentage of hydrogen in tanks. It can be seen that with different algorithms, the real-time dispatching results are significantly different. This is because in different algorithms, the future operation value functions are different, leading to different scheduling results.
We compare these different algorithms in the following:
• ADP$_{Ind}$: $\min_{x_t}\; u_{cost}^{el} \cdot \left| E_{grid}^{el,T} - Z_{grid}^{el,t} \right| + u_{cost}^{heat} \cdot \left| E_{grid}^{heat,T} - Z_{grid}^{heat,t} \right| + VF_{Ind}(S(t+1), L_{pre}) \cdot C_{Ind} + \alpha \cdot LS_{gas}^{t} + \beta \cdot LS_{el}^{t} + \gamma \cdot LS_{heat}^{t}$; where $Ind \in \{linear, nonlinear, NN, online\}$ represents the different types of ADP algorithms. $E_{grid}^{el,T}$ is the day-ahead exchanged electricity power at time T, $Z_{grid}^{el,t}$ is the real-time exchanged electricity power at time t, $E_{grid}^{heat,T}$ is the day-ahead exchanged heat power at time T, and $Z_{grid}^{heat,t}$ is the real-time exchanged heat power at time t. The term $E_{grid}^{el,T} - Z_{grid}^{el,t}$ describes the deviation of the real-time electricity power from the day-ahead results, in MW. $u_{cost}^{el}$ and $u_{cost}^{heat}$ are the unit costs of the electricity and heat power deviations from the day-ahead results, in €/MW. $LS_{k}^{t}$, $k \in \{gas, el, heat\}$, are the load shedding of the gas, electric, and heat load demands, in MW. $\alpha$, $\beta$, $\gamma$ are the penalty values of load shedding, in €/MW.
• $VF_{linear}(S(t+1), L_{pre}) = a_0 + a_1 \cdot S_{t+1} + a_2 \cdot L_{pre}$; where $C_{linear} \in \{C_b, C_c\}$ are coefficients used to adjust the proportion of the linear based AVF.
• $VF_{nonlinear}(S(t+1), \ldots)$, where $C_{nonlinear} \in \{C_A, C_B\}$ are coefficients used to adjust the proportion of the nonlinear based AVF.
• $VF_{NN}(S(t+1), L_{pre}) = NN(S_{t+1}, L_{pre}^{PV}, L_{pre}^{el}, L_{pre}^{heat}, L_{pre}^{gas})$; (32) where $C_{NN} \in \{C_A^{NN}, C_B^{NN}, C_C^{NN}\}$ are coefficients used to adjust the proportion of the neural network based AVF.
• $VF_{online}(S(t+1), L_{pre}) = a_0 + a_1 \cdot S_{t+1} + a_2 \cdot L_{pre}$; where $a_0$, $a_1$, $a_2$ are changed in each iteration. $C_{online} \in \{C_A^{online}, C_B^{online}\}$ are coefficients used to adjust the proportion of the AVF.
where $sw$ is the optimization window number in the MPC algorithm.
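To make the role of the linear AVF concrete, the following minimal Python sketch fits the coefficients $a_0$, $a_1$, $a_2$ by ordinary least squares and evaluates the one-step ADP objective for a single candidate decision. It rests on several assumptions that are not stated in the text: the deviations from the day-ahead plan are penalized by their absolute value, the historical dataset is a set of (next state, forecast load, future cost) samples, and all names and numbers are hypothetical. The paper's own implementation uses MATLAB and YALMIP and minimizes over the decision variables instead of evaluating one candidate.

```python
import numpy as np


def fit_linear_vf(s_next, l_pre, future_cost):
    """Least-squares fit of VF(S, L) = a0 + a1 * S + a2 * L."""
    s_next = np.asarray(s_next, dtype=float)
    l_pre = np.asarray(l_pre, dtype=float)
    X = np.column_stack([np.ones_like(s_next), s_next, l_pre])
    coef, *_ = np.linalg.lstsq(X, np.asarray(future_cost, dtype=float),
                               rcond=None)
    return coef  # array [a0, a1, a2]


def one_step_objective(e_el, z_el, e_heat, z_heat, s_next, l_pre,
                       coef, c_ind, u_el, u_heat,
                       shedding=(0.0, 0.0, 0.0),
                       penalties=(0.0, 0.0, 0.0)):
    """Value of the one-step ADP objective for one candidate decision.

    Assumption: deviations enter through their absolute value.
    shedding and penalties are ordered (gas, el, heat).
    """
    vf = coef[0] + coef[1] * s_next + coef[2] * l_pre
    deviation = u_el * abs(e_el - z_el) + u_heat * abs(e_heat - z_heat)
    shed = sum(p * ls for p, ls in zip(penalties, shedding))
    return deviation + c_ind * vf + shed


# Hypothetical historical samples generated from an exactly linear cost.
S = [0.1, 0.4, 0.7, 0.9]
L = [1.0, 0.5, 2.0, 1.5]
y = [2.0 + 3.0 * s - 1.0 * l for s, l in zip(S, L)]
coef = fit_linear_vf(S, L, y)

cost = one_step_objective(e_el=5.0, z_el=7.0, e_heat=3.0, z_heat=2.0,
                          s_next=0.5, l_pre=1.0, coef=coef, c_ind=0.1,
                          u_el=10.0, u_heat=5.0)
```

The coefficient `c_ind` plays the role of $C_{Ind}$: a larger value gives the approximated future cost more weight relative to the immediate deviation and load-shedding costs.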
In Figure 4, we set case Global as the base case, because in case Global the scheduling results are a "global optimization", whereas in the other cases the results are a "local optimization". Comparing the ADP linear cases with case Global, the SOC curves are very different. The ADP linear cases with different coefficients are compared in Figure 5A, namely:
$$\text{ADP1}: C_a = 10^{0};\quad \text{ADP neg2}: C_b = 10^{-2};\quad \text{ADP pos2}: C_c = 10^{2};\quad \text{ADP neg4}: C_d = 10^{-4};\quad \text{ADP pos4}: C_e = 10^{4} \qquad (35)$$
The linear regression of the value function is shown in Figure 5B.
In fact, cases ADP1 and ADP neg2 have very similar SOC curves, which overlap. It can be seen that the scheduling results based on the linear approximate value function (ADP linear) deviate from the "global optimization" curve. This means that the linear approximate value function cannot describe the future operation cost well. One important reason is that the dataset used to regress the linear value function is not complete; the other is that a linear function cannot regress the value function well. Together, these lead to an inaccurate approximate value function.
Then, we adopt the nonlinear function to regress the dataset. We compare different ADP nonlinear cases in Figure 6A, namely, we choose different coefficients:
$$\text{ADP nonlinear A}: C_A = 1.2 \times 10^{-8};\quad \text{ADP nonlinear B}: C_B = 10^{-8};\quad \text{ADP nonlinear C}: C_C = 10^{-9};\quad \text{ADP nonlinear D}: C_D = 10^{-10};\quad \text{ADP nonlinear E}: C_E = 10^{-13}.$$
The nonlinear regression of the value function is shown in Figure 6B.
It can be seen that, with the nonlinear approximate value function, the scheduling results follow a trend similar to the global results. With different coefficients $C_A$, $C_B$, $C_C$, $C_D$, $C_E$, the scheduling results are similar to each other. However, the SOC curve values are still far from the Global optimization results.
After that, we adopt the neural network to regress the dataset, and we compare different ADP NN cases in Figure 7A. It can be seen that, with the neural network approximate value function, if we choose appropriate coefficients, the scheduling results are very close to the global optimization results, which means that the neural network can regress the value function well.
After that, we develop an online simulation process in which, at each time step, the one-step decision model is iteratively simulated 30 times. The simulated operation cost of MG4 is shown in Figure 7B.
At each time step, the one-step optimization model is solved 30 times, and in each iteration the parameters of the approximate value function are updated. Figure 7B shows that, at each time step, the operation cost remains constant after 30 iterations, which means that the iteration process converges.
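The online process described above can be sketched on a toy problem as follows. This is a minimal Python illustration under strong assumptions, not the paper's Algorithm 2: the decision is a scalar chosen from a hypothetical grid, the stage cost and the observed future cost are made-up functions, the next state is taken equal to the decision, and the AVF is a linear function of the next state that is refitted after every iteration.

```python
import numpy as np

GRID = np.linspace(0.0, 2.0, 21)  # hypothetical candidate decisions


def stage_cost(x):
    """Toy immediate cost of decision x (hypothetical)."""
    return (x - 1.0) ** 2


def observed_future_cost(s_next):
    """Toy future cost observed after reaching state s_next (hypothetical)."""
    return 2.0 * s_next


def fit_linear_vf(states, costs):
    """Least-squares fit of VF(s) = a0 + a1 * s on the data collected so far."""
    X = np.column_stack([np.ones(len(states)), states])
    coef, *_ = np.linalg.lstsq(X, np.asarray(costs), rcond=None)
    return coef


def online_one_step(n_iter=30, c_online=1.0):
    """Repeat the one-step decision, refitting the AVF after each iteration."""
    coef = np.zeros(2)  # no prior dataset, so the AVF starts at zero
    states, costs, history = [], [], []
    for _ in range(n_iter):
        # One-step decision: immediate cost plus weighted approximate future cost.
        objective = stage_cost(GRID) + c_online * (coef[0] + coef[1] * GRID)
        x = GRID[np.argmin(objective)]
        # Observe the future cost, record it, and update the AVF parameters.
        future = observed_future_cost(x)
        history.append(stage_cost(x) + future)
        states.append(x)
        costs.append(future)
        coef = fit_linear_vf(states, costs)
    return np.array(history), coef


history, coef = online_one_step()
```

In this toy setting the realized cost in `history` stops changing after the first few iterations, which mirrors the convergence behavior reported for Figure 7B; the repeated solves are also why the online process takes longer per time step.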

Operation Cost Analysis
In this section, we analyze the operation cost of the MGs under different algorithms. The operation costs are the results of the problems in Eq. 29 and Eq. 34. We use a 2-norm error to describe the difference between the real-time operation cost of each algorithm and that of the global optimization. The 2-norm error can be represented as:  Table 1 shows the 2-norm error of the real-time operation cost of MG4 with different algorithms. It can be seen that ADP NN neg5 has the smallest 2-norm error, and ADP linear pos4 has the largest 2-norm error. This means that, at each time step, the real-time operation cost of ADP NN neg5 is the closest to the real-time operation cost of the Global optimization; that is, algorithm ADP NN neg5 has the best real-time performance.
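The 2-norm error can be computed as in the following minimal Python sketch. It assumes the standard Euclidean norm of the difference between two per-time-step cost series of equal length; the function name and the sample values are hypothetical and do not come from Table 1.

```python
import numpy as np


def two_norm_error(cost_algorithm, cost_global):
    """Euclidean norm of the difference between two cost time series."""
    a = np.asarray(cost_algorithm, dtype=float)
    g = np.asarray(cost_global, dtype=float)
    if a.shape != g.shape:
        raise ValueError("cost series must have the same length")
    return float(np.linalg.norm(a - g))


# Hypothetical three-step cost series, for illustration only.
err = two_norm_error([10.0, 14.0, 7.0], [7.0, 10.0, 7.0])
```

A smaller value means that the algorithm tracks the Global case more closely at every time step, not only in total.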
We then compare the total operation cost (over the total time horizon) in Table 2 and Figure 8. It can be seen that case Global has the minimum total operation cost, because it is the global optimization. The ADP method and the MPC method have similar total costs. Different coefficients in ADP and MPC lead to different total costs, which means that choosing an appropriate coefficient is important.
Then, we need to choose an index to evaluate the different algorithms. Here, we use the relative error $re$ to evaluate the algorithms, namely, where $TC_m$, $m \in \{ADP_{linear}, ADP_{nonlinear}, ADP_{NN}, ADP_{online}, MPC\}$, and $TC_{global}$ are the total costs under the different algorithms and the global optimization, respectively. The relative errors of the different algorithms are shown in Table 2. It can be seen that in the ADP linear cases, the relative error varies with the coefficient; in particular, when the coefficients are large (for example, cases ADP linear pos2 and ADP linear pos4), the relative errors are large, which means that the scheduling results deviate far from the global optimization results. In the five ADP nonlinear cases, the differences are small, and the relative error is less than 4%.
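A relative error of this kind can be computed as in the short Python sketch below. The exact definition is an assumption: the absolute difference between the total cost of an algorithm and the total cost of the global optimization, divided by the latter and expressed in percent. The function name and the numbers are hypothetical.

```python
def relative_error_percent(tc_m: float, tc_global: float) -> float:
    """Relative error of a total cost with respect to the global optimum, in %."""
    if tc_global == 0:
        raise ValueError("tc_global must be nonzero")
    return 100.0 * abs(tc_m - tc_global) / abs(tc_global)


# Hypothetical total costs, for illustration only.
re = relative_error_percent(tc_m=103.0, tc_global=100.0)
```

Under this definition, a case whose total cost exceeds the Global case by 3 cost units out of 100 has a relative error of about 3%.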
In the online case ADP online, the relative error is less than 7%, but after adjusting the coefficient, the relative error decreases to 4% in cases ADP online neg1 and ADP online neg3. For the online process, the inner value function and the number of iterations are two important factors that influence the operation cost and the scheduling results.
For the MPC cases, the optimization window number is important. When the optimization window number is 6, the relative error is less than 1.5%; when it is 12, the relative error is about 4.3%. For the ADP NN cases, the relative error is less than 3% in cases ADP NN neg5 and ADP NN neg10.
Finally, from the post-event analysis view (total operation cost), algorithm MPC 6 has the best performance, followed by algorithms ADP NN neg5 and ADP NN neg10.
In summary, each algorithm has advantages and disadvantages. We choose four indexes to compare these algorithms: running time, one-step simulation time τ, results, and complexity, as shown in Table 3.

Exchanged Energy With Utility Grids
The electricity/heat/gas exchanged with the utility grids is shown in Figures 9, 10, 11A. In order to make these figures readable, we calculate the 2-norm error of the exchanged energy under different algorithms (case "Global" is set as the base case), which is shown in Table 4.
For the exchanged electricity, cases ADP linear pos2 and ADP online have large 2-norm errors, which means that they cannot effectively follow the day-ahead exchanged electricity schedule. However, for cases ADP nonlinear and MPC, the 2-norm errors are zero, which means that they can follow the day-ahead exchanged electricity well.
For the exchanged heat, cases ADP linear pos2, ADP online, and MPC 12 have large 2-norm errors, and for the other cases the error is less than 2.1. In particular, for cases ADP NN neg5 and ADP NN neg10, the error is less than 1.9. In Figure 9, it can be seen that only case ADP linear pos2 deviates largely from the day-ahead results, and all other cases follow the day-ahead results well.
For the exchanged gas, cases ADP online, ADP linear pos2, and MPC 6 have large 2-norm errors, and for the other cases the error is less than 0.0022.
Finally, considering $error_{ex}^{ele}$, $error_{ex}^{heat}$, and $error_{ex}^{gas}$ together, algorithm ADP NN neg5 has the best performance (in terms of exchanged energy).

Utility Grids Power Flow
Based on the above exchanged energy, the electricity/heat/gas utility grids then run their power flow algorithms. The voltage of the IEEE-30 node electricity network with ADP NN neg5 is presented in Figure 11B. The gas flow in the gas-20 node network with ADP NN neg5 is presented in Figure 11C. The heating power flow in the heat-14 node network with ADP NN neg5 is presented in Figure 11D. The other power flow results are presented in the Supplementary Material. It can be seen that the power flow in each utility network is within the security area and satisfies the operation constraints.

CONCLUSION
In this paper, the real-time operation of grid-connected microgrids based on the ADP algorithm was studied, using a hybrid multi-energy supply microgrid model. We focused on the performance of different scheduling algorithms. A coordinated strategy combining day-ahead stochastic scheduling and real-time dispatching was adopted.
For the day-ahead scheduling, scenario-based stochastic optimization was used. For the real-time dispatching, ADP and MPC algorithms were adopted, and different parameters and coefficients were compared to study the performance of each algorithm. In Table 3, MIP denotes mixed-integer programming; LR, linear regression; MINLP, mixed-integer nonlinear programming; NLR, nonlinear regression; NNR, neural network regression. Based on the simulation results, the following conclusions can be drawn:
1) Both the ADP and MPC algorithms were able to implement the real-time operation. The linear function based AVF ADP algorithm and the MPC algorithm with one optimization window had fast running times. The nonlinear function based AVF ADP algorithm had an average running time. The online process ADP method, the global optimization, and the MPC algorithm with multiple optimization windows had slow running times.
2) In the ADP method, the AVF was the key factor influencing the dispatching results. The neural network based AVF ADP algorithm had the smallest real-time operation cost 2-norm error, a total operation cost relative error of less than 3%, and the smallest exchanged energy 2-norm error, which means that the neural network based AVF ADP had the best performance. In addition, the running time of the neural network based AVF ADP was only 31% of that of the Global algorithm.
3) In the online process, because the initial dataset was not large enough, the regressed AVF could not describe the future operation cost well, which led to an average performance. In addition, at each time step, the real-time optimization problem was solved iteratively several times, which also increased the running time. However, the online process provided a way to make decisions when the initial dataset was insufficient.
In conclusion, we presented a neural network based ADP real-time dispatching algorithm, which had almost the same performance as the Global optimization while requiring only 31% of its running time. It can be directly utilized in industry scenarios and improves the dispatching performance compared to the MPC algorithm.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author. Frontiers in Electronics | www.frontiersin.org April 2021 | Volume 2 | Article 637736