<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Energy Res.</journal-id>
<journal-title>Frontiers in Energy Research</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Energy Res.</abbrev-journal-title>
<issn pub-type="epub">2296-598X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">947532</article-id>
<article-id pub-id-type="doi">10.3389/fenrg.2022.947532</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Energy Research</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An AGC Dynamic Optimization Method Based on Proximal Policy Optimization</article-title>
<alt-title alt-title-type="left-running-head">Liu et al.</alt-title>
<alt-title alt-title-type="right-running-head">An AGC Method Based on PPO</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Liu</surname>
<given-names>Zhao</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1493882/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Jiateng</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Zhang</surname>
<given-names>Pei</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ding</surname>
<given-names>Zhenhuan</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhao</surname>
<given-names>Yanshun</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1840998/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>School of Electrical Engineering, Beijing Jiaotong University</institution>, <addr-line>Beijing</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Artificial Intelligence Applications, China Electric Power Research Institute</institution>, <addr-line>Beijing</addr-line>, <country>China</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>School of Artificial Intelligence, Anhui University</institution>, <addr-line>Hefei</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1222560/overview">Bo Yang</ext-link>, Kunming University of Science and Technology, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1222566/overview">Xiaoshun Zhang</ext-link>, Northeastern University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1256910/overview">Jiawen Li</ext-link>, South China University of Technology, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Zhao Liu, <email>liuzhao1@bjtu.edu.cn</email>; Pei Zhang, <email>2512692577@qq.com</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Smart Grids, a section of the journal Frontiers in Energy Research</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>10</volume>
<elocation-id>947532</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>05</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Liu, Li, Zhang, Ding and Zhao.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Liu, Li, Zhang, Ding and Zhao</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The increasing penetration of renewable energy introduces more uncertainties and creates more fluctuations in power systems than ever before, which brings great challenges for automatic generation control (AGC). It is necessary for grid operators to develop an advanced AGC strategy to handle these fluctuations and uncertainties. AGC dynamic optimization is a sequential decision problem that can be formulated as a discrete-time Markov decision process. Therefore, this article proposes a novel framework based on the proximal policy optimization (PPO) reinforcement learning algorithm to optimize the power regulation of each AGC generator in advance. The detailed design of the reward functions and the state and action spaces is then presented. The application of the proposed PPO-based AGC dynamic optimization framework is simulated on a modified IEEE 39-bus system and compared with the classical proportional&#x2212;integral (PI) control strategy and other reinforcement learning algorithms. The results of the case study show that the proposed framework makes the frequency characteristics better satisfy the control performance standard (CPS) under scenarios of large fluctuations in power systems.</p>
</abstract>
<kwd-group>
<kwd>automatic generation control</kwd>
<kwd>advanced optimization strategy</kwd>
<kwd>deep reinforcement learning</kwd>
<kwd>renewable energy</kwd>
<kwd>proximal policy optimization</kwd>
</kwd-group>
<contract-sponsor id="cn001">Fundamental Research Funds for the Central Universities<named-content content-type="fundref-id">10.13039/501100012226</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>As a fundamental part of the energy management system (EMS), automatic generation control (AGC) is applied to keep the frequency deviation and tie-line power deviation of power systems within the allowable range (<xref ref-type="bibr" rid="B13">Jaleeli et al., 2002</xref>). Conventional AGC strategies calculate the total adjustment power based on the present information collected from the Supervisory Control and Data Acquisition (SCADA) system, including the frequency deviation, tie-line power deviation, and area control error (ACE), and then allocate the total adjustment to each AGC unit. The control period is generally 2&#x2013;8&#xa0;s. Therefore, the key to conventional AGC strategies is to solve two problems: &#x2460; how to calculate the total adjustment power based on the online information; &#x2461; how to allocate the total adjustment power to each AGC unit while satisfying the control performance standard (CPS) and minimizing the operation cost. To solve these two problems, scholars have proposed many control strategies. 
For calculating the total adjustment power, proposed strategies include the classical proportional&#x2212;integral (PI) control (<xref ref-type="bibr" rid="B7">Concordia and Kirchmayer, 1953</xref>), proportional&#x2212;integral&#x2212;derivative (PID) control (<xref ref-type="bibr" rid="B19">Sahu et al., 2015</xref>; <xref ref-type="bibr" rid="B8">Dahiya et al., 2016</xref>), optimal control (<xref ref-type="bibr" rid="B5">Bohn and Miniesy, 1972</xref>; <xref ref-type="bibr" rid="B29">Yamashita and Taniguchi, 1986</xref>; <xref ref-type="bibr" rid="B10">Elgerd and Fosha, 2007</xref>), adaptive control (<xref ref-type="bibr" rid="B24">Talaq and Al-Basri, 1999</xref>; <xref ref-type="bibr" rid="B17">Olmos et al., 2004</xref>), model predictive control (<xref ref-type="bibr" rid="B2">Atic et al., 2003</xref>; <xref ref-type="bibr" rid="B16">Mcnamara and Milano, 2017</xref>), robust control (<xref ref-type="bibr" rid="B15">Khodabakhshian and Edrisi, 2004</xref>; <xref ref-type="bibr" rid="B18">Pan and Das, 2016</xref>), variable structure control (<xref ref-type="bibr" rid="B11">Erschler et al., 1974</xref>; <xref ref-type="bibr" rid="B22">Sun, 2017</xref>), and intelligent control technologies such as neural networks (<xref ref-type="bibr" rid="B4">Beaufays et al., 1994</xref>; <xref ref-type="bibr" rid="B32">Zeynelgil et al., 2002</xref>), fuzzy control (<xref ref-type="bibr" rid="B24">Talaq and Al-Basri, 1999</xref>; <xref ref-type="bibr" rid="B12">Feliachi and Rerkpreedapong, 2005</xref>), and genetic algorithms (<xref ref-type="bibr" rid="B1">Abdel-Magid and Dawoud, 1996</xref>; <xref ref-type="bibr" rid="B6">Chang et al., 1998</xref>). In terms of allocating the total adjustment power to each AGC unit, a baseline approach allocates power according to the adjustable capacity ratio or installed capacity ratio of each unit, without considering the differences in dynamic characteristics among units. Additionally, <xref ref-type="bibr" rid="B31">Yu et al. 
(2011</xref>) treated the power allocation as a stochastic optimization problem, which can be discretized and modeled as a discrete-time Markov decision process, and solved it using the Q-learning algorithm of reinforcement learning.</p>
<p>In general, the conventional AGC strategy is designed under a typical feedback-loop structure with an inherent hysteresis: it regulates the future output of AGC units based on the present input signal. However, the penetration of large-scale renewable energy introduces high stochastic disturbances to the modern power grid due to its dramatic fluctuations (<xref ref-type="bibr" rid="B3">Banakar et al., 2008</xref>). This phenomenon has not only increased the required regulation capacity of AGC units but also placed higher requirements on the coordinated control of generation units with different dynamic characteristics (such as thermal and hydroelectric units). Nevertheless, the fast regulation capacity of units in power systems is limited. When the load or renewable energy generation is continuously rising or falling, the units with second-level regulation performance will approach their upper or lower regulation limits. At this point, it is hard to keep the frequency deviation and tie-line power deviation within the allowable range if the fast regulation capacities in the system are insufficient. On the other hand, the regulation rates of different units differ: thermal units have minute-level regulation performance, while hydroelectric units have second-level regulation performance. Therefore, conventional strategies cannot effectively coordinate units with different characteristics, which will cause over- or under-adjustment. At present, the goal of AGC strategies is to keep the dynamic control performance of the system in compliance with the CPS established by the North American Electric Reliability Council (NERC) (<xref ref-type="bibr" rid="B14">Jaleeli and Vanslyck, 1999</xref>). CPS pays more attention to the medium- and long-term performance of the system frequency deviation and tie-line power deviation; it no longer requires the ACE to cross zero every 10&#xa0;min and aims to regulate the frequency of power systems smoothly.</p>
<p>To address the hysteresis issue of conventional AGC strategies and make the dynamic performance satisfy the CPS, some scholars put forward the concept of AGC dynamic optimization (<xref ref-type="bibr" rid="B30">Yan et al., 2012</xref>). The basic idea is to optimize the regulation power of AGC units in advance based on ultra-short-term load and renewable energy generation forecasting information, different security constraints, and objective functions. The strategy aims to optimize the AGC units&#x2019; regulation power over the next 15&#xa0;min with an optimization step of 1&#xa0;min. From the perspective of the dispatching framework formulated by the power grid dispatching center, AGC dynamic optimization can be viewed as a link between real-time economic dispatch (especially for the next 15&#xa0;min) and routine AGC (control period of 2&#x2013;8&#xa0;s), which achieves a smooth transition between the two dispatch sections. Compared with economic dispatch, AGC dynamic optimization takes the system&#x2019;s frequency deviation, tie-line power deviation, ACE, and CPS values into account. Compared with conventional AGC strategies, it takes load and renewable energy forecasting information into account, which better handles the fluctuation of renewable energy. Moreover, its dispatch period of 1&#xa0;min matches the minute-level regulation characteristics of thermal AGC units.</p>
<p>
<xref ref-type="bibr" rid="B30">Yan et al. (2012</xref>) proposed a mathematical model for AGC dynamic optimal control, which takes the optimal CPS1 index and the minimum ancillary service cost as objective functions. The constraints considered include the system power balance, the AGC units&#x2019; regulation characteristics, the tie-line power deviation, and the frequency deviation. This model adds ultra-short-term load forecasting information into the power balance constraints and maps the relationship between system frequency and tie-line power. <xref ref-type="bibr" rid="B39">Zhao et al. (2018</xref>) expanded the model proposed in <xref ref-type="bibr" rid="B30">Yan et al. (2012</xref>), taking the ultra-short-term wind power forecasting value and its uncertainties into account, and constructed a chance-constrained programming AGC dynamic optimization model with probability constraints and expected objectives. An optimal mileage-based AGC dispatch algorithm was proposed in <xref ref-type="bibr" rid="B36">Zhang et al. (2020</xref>). <xref ref-type="bibr" rid="B35">Zhang et al. (2021a</xref>) further extended the methods in <xref ref-type="bibr" rid="B36">Zhang et al. (2020)</xref> with adaptive distributed auction to handle the high participation of renewable energy. A novel random forest-assisted fast distributed auction-based algorithm was developed for coordinated control in large PV power plants in response to AGC signals (<xref ref-type="bibr" rid="B37">Zhang et al., 2021b</xref>). A decentralized collaborative control framework of autonomous virtual generation tribes for solving the AGC dynamic dispatch problem was proposed in <xref ref-type="bibr" rid="B38">Zhang et al. (2016a</xref>).</p>
<p>In general, the existing research defines AGC dynamic optimal control as a multistage nonlinear optimization problem that includes objective functions and constraint conditions. To deal with the uncertainties of wind power, some scholars adopted the chance-constrained programming method based on a probabilistic model of wind power. However, accurate probability information of the random variables is difficult to model, which limits the accuracy and practicality of this method. Moreover, the stochastic programming model is too complex to solve efficiently. Furthermore, these methods cannot take the future fluctuations of wind power into account when making decisions.</p>
<p>Artificial intelligence-based methods have been developed in recent years to address the AGC command dispatch problem, including the lifelong learning algorithm and consensus transfer Q-learning (<xref ref-type="bibr" rid="B33">Zhang et al., 2016b</xref>; <xref ref-type="bibr" rid="B34">Zhang et al., 2018</xref>). Deep reinforcement learning (DRL) is a branch of machine learning and an important method of stochastic control based on the Markov decision process, which is well suited to sequential decision problems (<xref ref-type="bibr" rid="B23">Sutton and Barto, 1998</xref>). Recently, DRL has been successfully applied to many power system problems, such as optimal power flow (<xref ref-type="bibr" rid="B35">Zhang et al., 2021a</xref>), demand response (<xref ref-type="bibr" rid="B27">Wen et al., 2015</xref>), energy management systems for microgrids (<xref ref-type="bibr" rid="B25">Venayagamoorthy et al., 2016</xref>), autonomous voltage control (<xref ref-type="bibr" rid="B33">Zhang et al., 2016b</xref>), and AGC (<xref ref-type="bibr" rid="B40">Zhou et al., 2020</xref>; <xref ref-type="bibr" rid="B28">Xi et al., 2021</xref>). In AGC problems, as stated previously, the existing literature usually focuses on the power allocation problem, which still belongs to conventional AGC strategies. Different from previous works, this article focuses on AGC dynamic optimization and utilizes 1&#xa0;min resolution wind power and load forecasting values, which are collected from and used by real wind farms and grid dispatching centers, to regulate the power outputs of AGC units. Unlike the existing optimization models, this article formulates AGC dynamic optimization as a Markov decision process and a stochastic control problem, taking the various uncertainties and fluctuations of wind power outputs into account. To better solve the dynamic optimization and support safe online operation, the proximal policy optimization (PPO) deep reinforcement learning algorithm is implemented; the clipping mechanism of PPO provides more reliable outputs (<xref ref-type="bibr" rid="B21">Schulman et al., 2015</xref>).</p>
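<p>For illustration, the clipping mechanism mentioned above can be sketched as the standard PPO clipped surrogate objective. This is a minimal, self-contained example under our own naming, not the authors&#x2019; implementation:</p>

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage: estimated advantage of taking action a in state s
    eps:       clip range (0.2 is a commonly used default)
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Pessimistic lower bound: taking the minimum removes the incentive
    # to move the new policy far from the old one in a single update,
    # which is what makes PPO updates comparatively reliable.
    return min(ratio * advantage, clipped_ratio * advantage)
```

<p>For instance, with a positive advantage the objective stops growing once the ratio exceeds 1 + eps, so large policy steps gain nothing.</p>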
<p>The key contributions of this article are summarized as follows: &#x2460; by formulating the AGC dynamic optimization problem as the Markov decision process with appropriate power grid simulation environment, reasonable state space, action space, and reward functions, the PPO-DRL agent can be trained to learn how to determine the regulation power of AGC units without violating the operation constraints; &#x2461; by adopting the state-of-the-art PPO algorithm (<xref ref-type="bibr" rid="B26">Wang et al., 2020</xref>), the well-trained PPO-DRL agent could consider the uncertainties of wind power fluctuations in the future when making decisions at the current moment.</p>
<p>The remaining parts of this article are organized as follows: the next section provides the advanced AGC dynamic optimization model considering wind power integration and the details of how to transform it into a multistage decision problem; the subsequent section introduces the principles of reinforcement learning, the PPO algorithm, and the procedures of the proposed methodology; then, the IEEE 39-bus system is utilized to demonstrate the effectiveness of the proposed method; finally, some conclusions are given in the last section.</p>
</sec>
<sec id="s2">
<title>Advanced AGC Dynamic Optimization Mathematical Model and Multistage Decision Problem</title>
<p>AGC dynamic optimization is essentially an advanced control strategy, which aims to optimize the adjustment power of each AGC unit per minute over the next 15&#xa0;min according to the ultra-short-term load and wind generation forecasting information as well as the current operating condition of each unit, the system frequency, and the tie-line power. The objective is to minimize the total adjustment cost while the system dynamic performance (i.e., frequency, tie-line power deviation, and ACE) complies with the CPS and satisfies the security constraints. Specifically, the constraints include the system power balance, the CPS1 and CPS2 indicators, the frequency deviation and tie-line power deviation limits, and the AGC unit regulation characteristics. The mathematical model of AGC dynamic optimization is formulated as follows:<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>min</mml:mi>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>15</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <inline-formula id="inf1">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the number of AGC units in the system, <inline-formula id="inf2">
<mml:math id="m3">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf3">
<mml:math id="m4">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the cost coefficients, <inline-formula id="inf4">
<mml:math id="m5">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf5">
<mml:math id="m6">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are the maximum and minimum output of the AGC unit <inline-formula id="inf6">
<mml:math id="m7">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>, <inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf8">
<mml:math id="m9">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the outputs of AGC unit <inline-formula id="inf9">
<mml:math id="m10">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> at time <inline-formula id="inf10">
<mml:math id="m11">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf11">
<mml:math id="m12">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf12">
<mml:math id="m13">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the time interval, that is, 1&#xa0;min.</p>
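<p>Under the notation just defined, the objective of Eq. 1 can be transcribed directly. The following sketch is illustrative only; the function and variable names are ours, and nested lists stand in for the unit/time indexing:</p>

```python
def agc_adjustment_cost(P, P_max, P_min, k1, k2, dt=1.0):
    """Total AGC adjustment cost of Eq. 1.

    P:      output schedule, P[t][i] = output of unit i at step t (MW),
            with P[0] the initial operating point (t = 0)
    P_max:  maximum outputs of the AGC units (MW)
    P_min:  minimum outputs of the AGC units (MW)
    k1, k2: cost coefficients
    dt:     time interval in minutes (1 min in the model)
    """
    cost = 0.0
    for t in range(1, len(P)):          # t = 1 .. 15 in the model
        for i in range(len(P[t])):      # i = 1 .. N_AGC
            capacity_term = k1 * (P_max[i] - P_min[i])
            mileage_term = k2 * abs(P[t][i] - P[t - 1][i])
            cost += (capacity_term + mileage_term) * dt
    return cost
```

<p>The capacity term charges for the regulation range held in reserve, while the mileage term charges for the actual per-step power movement of each unit.</p>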
<p>1) Power balance constraints:<disp-formula id="e2">
<mml:math id="m14">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf13">
<mml:math id="m15">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf14">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are, respectively, the forecasted wind power and load, <inline-formula id="inf15">
<mml:math id="m17">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the scheduled tie-line power, <inline-formula id="inf16">
<mml:math id="m18">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the forecast deviation of the tie-line power, and <inline-formula id="inf17">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the transmission loss.</p>
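<p>The power balance constraint of Eq. 2 amounts to checking that a residual is zero at each time step. A hypothetical sketch, with assumed names matching the symbols above:</p>

```python
def power_balance_residual(p_g, p_w, p_l, p_t, dp_t, p_loss):
    """Left-hand side of Eq. 2 at one time step (MW).

    p_g:    list of AGC unit outputs P_{G,i,t}
    p_w:    forecasted wind power P_{w,t}
    p_l:    forecasted load P_{L,t}
    p_t:    scheduled tie-line power P_{T,t}
    dp_t:   tie-line power forecast deviation
    p_loss: transmission loss
    Returns 0.0 when the control area is exactly balanced.
    """
    return sum(p_g) + p_w - p_l - p_t - dp_t - p_loss
```

<p>A feasible dispatch drives this residual to zero; a nonzero value is the instantaneous area imbalance in MW.</p>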
<p>2) CPS1 constraints:<disp-formula id="e3">
<mml:math id="m20">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>K</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>K</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <inline-formula id="inf18">
<mml:math id="m21">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the CPS1 index of the system and <inline-formula id="inf19">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>K</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf20">
<mml:math id="m23">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>K</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are, respectively, the lower and upper limits of the CPS1 index. <inline-formula id="inf21">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is derived by the following equation:<disp-formula id="e4">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>150</mml:mn>
<mml:mi>B</mml:mi>
<mml:msubsup>
<mml:mi>&#x3b5;</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf22">
<mml:math id="m26">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the area control error at time t, <inline-formula id="inf23">
<mml:math id="m27">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the frequency deviation at time t, B is the equivalent frequency regulation constant for the control area (in MW/0.1&#xa0;Hz), and <inline-formula id="inf24">
<mml:math id="m28">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b5;</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mtext>min</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the frequency control target, usually taken as the annual statistic of the root-mean-square frequency deviation of the interconnected power grid over a 1&#xa0;min period.</p>
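The CPS1 index of Eq. (4) can be sketched as a direct transcription of the formula; the function and variable names below are illustrative, not from the paper.

```python
# Sketch of the CPS1 index in Eq. (4); names and units are illustrative.
def cps1_index(e_ace, d_f, B, eps_1min):
    """K_cps1 = [2 - sum_t(e_ACE,t * df_t) / (-150 * B * eps_1min^2)] * 100%.

    e_ace    : sequence of area control errors e_ACE,t (MW)
    d_f      : sequence of frequency deviations df_t (Hz)
    B        : equivalent frequency regulation constant (MW/0.1 Hz)
    eps_1min : annual RMS frequency-deviation target over 1 min (Hz)
    """
    num = sum(e * df for e, df in zip(e_ace, d_f))
    return (2.0 - num / (-150.0 * B * eps_1min ** 2)) * 100.0
```

With no samples the sum is zero and the index reduces to 200%, the formula's upper bound for perfectly uncorrelated ACE and frequency deviation.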
<p>3) CPS2 constraints:<disp-formula id="e5">
<mml:math id="m29">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>15</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1.65</mml:mn>
<mml:msub>
<mml:mi>&#x3b5;</mml:mi>
<mml:mrow>
<mml:mn>15</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msqrt>
<mml:mrow>
<mml:mn>100</mml:mn>
<mml:mi>B</mml:mi>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msqrt>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>where <inline-formula id="inf25">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>15</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the average ACE over the 15&#xa0;min period, <inline-formula id="inf26">
<mml:math id="m31">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b5;</mml:mi>
<mml:mrow>
<mml:mn>15</mml:mn>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the annual statistic of the root-mean-square deviation of the interconnected power grid over a 15&#xa0;min period, and <inline-formula id="inf27">
<mml:math id="m32">
<mml:mi>B</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf28">
<mml:math id="m33">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are, respectively, the equivalent frequency regulation constants for the control area and the entire interconnected power grid.</p>
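The CPS2 constraint of Eq. (5) is a simple bound check; a minimal sketch, with illustrative names:

```python
import math

# Sketch of the CPS2 compliance check in Eq. (5); names are illustrative.
def cps2_satisfied(e_ace_15min_avg, eps_15min, B, B_s):
    """|E_ACE-15min| <= 1.65 * eps_15min * sqrt(100 * B * B_s)."""
    limit = 1.65 * eps_15min * math.sqrt(100.0 * B * B_s)
    return abs(e_ace_15min_avg) <= limit
```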
<p>4) Power output constraints of units:<disp-formula id="e6">
<mml:math id="m34">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>where <inline-formula id="inf29">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is output power of unit <inline-formula id="inf30">
<mml:math id="m36">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> at time <inline-formula id="inf31">
<mml:math id="m37">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf32">
<mml:math id="m38">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf33">
<mml:math id="m39">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the lower and upper limits of output power.</p>
<p>5) Ramp power constraints of units:<disp-formula id="e7">
<mml:math id="m40">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>R</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>R</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>where <inline-formula id="inf34">
<mml:math id="m41">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the ramp power of unit i at time t and <inline-formula id="inf35">
<mml:math id="m42">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>R</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf36">
<mml:math id="m43">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>R</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are, respectively, the lower and upper limits of ramp power.</p>
<p>6) Tie-line power deviation constraints:<disp-formula id="e8">
<mml:math id="m44">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>where <inline-formula id="inf37">
<mml:math id="m45">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is tie-line power deviation at time t and <inline-formula id="inf38">
<mml:math id="m46">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf39">
<mml:math id="m47">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are, respectively, the lower and upper limits of tie-line power deviation.</p>
<p>7) Frequency deviation constraints:<disp-formula id="e9">
<mml:math id="m48">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:munder accentunder="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
<mml:mo>&#x2264;</mml:mo>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mo>&#x394;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>where <inline-formula id="inf40">
<mml:math id="m49">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the frequency deviation at time t and <inline-formula id="inf41">
<mml:math id="m50">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:munder accentunder="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf42">
<mml:math id="m51">
<mml:mrow>
<mml:mo>&#x394;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> are, respectively, the lower and upper limits of frequency deviation.</p>
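Constraints (6)-(9) are all box constraints; a single feasibility check can be sketched as follows, with illustrative limit dictionaries standing in for the actual unit data.

```python
# Sketch: feasibility check for the box constraints in Eqs. (6)-(9);
# the limit dictionaries are illustrative placeholders, not the paper's data.
def within_limits(value, lower, upper):
    return lower <= value <= upper

def feasible(p_ag, r_ag, d_p_tie, d_f, limits):
    """p_ag / r_ag: per-unit output and ramp powers at time t;
    d_p_tie / d_f: tie-line power deviation and frequency deviation."""
    return (
        all(within_limits(p, *limits["P"][i]) for i, p in enumerate(p_ag))
        and all(within_limits(r, *limits["R"][i]) for i, r in enumerate(r_ag))
        and within_limits(d_p_tie, *limits["dP_T"])
        and within_limits(d_f, *limits["df"])
    )
```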
</sec>
<sec id="s3">
<title>Proximal Policy Optimization Algorithm</title>
<sec id="s3-1">
<title>The Framework of Reinforcement Learning</title>
<p>A reinforcement learning framework consists of an agent and an environment, as illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref>, and aims to maximize a long-term reward through repeated interactions between the agent and the environment. At each step t, the agent observes state <inline-formula id="inf43">
<mml:math id="m52">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and executes action <inline-formula id="inf44">
<mml:math id="m53">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> based on its observation and policy; the environment receives action <inline-formula id="inf45">
<mml:math id="m54">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, then emits the next state <inline-formula id="inf46">
<mml:math id="m55">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and issues a reward <inline-formula id="inf47">
<mml:math id="m56">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> to the agent. Compared with supervised learning, the actions in RL are not labeled; the agent does not know the correct action during training and must learn through trial and error, exploring the environment to maximize its reward.</p>
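The interaction loop of Figure 1 can be sketched in a few lines; the toy environment and random policy below are illustrative only, not the paper's AGC model.

```python
import random

# Minimal sketch of the agent-environment loop in Figure 1.
# ToyEnv and the random action choice are illustrative stand-ins.
class ToyEnv:
    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action           # next state s_{t+1}
        reward = -abs(self.state)      # reward r_{t+1}
        return self.state, reward

env = ToyEnv()
s = env.reset()
total = 0.0
for t in range(5):
    a = random.choice([-1.0, 1.0])    # agent picks action a_t from its policy
    s, r = env.step(a)                # environment emits s_{t+1} and r_{t+1}
    total += r
```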
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Environment&#x2013;DRL agent interaction loop of reinforcement learning.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g001.tif"/>
</fig>
<p>The interaction between the agent and environment can be modeled by a Markov decision process, which is a standard mathematical formalism for sequential decision problems. A typical Markov decision process is denoted by a tuple <inline-formula id="inf48">
<mml:math id="m57">
<mml:mrow>
<mml:mo>&#x3c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf49">
<mml:math id="m58">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula> is the state space, a complete description of the environment represented by a real-valued vector, matrix, or higher-order tensor. <inline-formula id="inf50">
<mml:math id="m59">
<mml:mi>A</mml:mi>
</mml:math>
</inline-formula> is the action space, also represented by a real-valued vector, matrix, or higher-order tensor; different environments allow different kinds of actions, that is, discrete or continuous action spaces. <inline-formula id="inf51">
<mml:math id="m60">
<mml:mi>P</mml:mi>
</mml:math>
</inline-formula> is the transition probability function, and <inline-formula id="inf52">
<mml:math id="m61">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>s</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the probability of transitioning into state <inline-formula id="inf53">
<mml:math id="m62">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> by taking action <inline-formula id="inf54">
<mml:math id="m63">
<mml:mi>a</mml:mi>
</mml:math>
</inline-formula> on state <inline-formula id="inf55">
<mml:math id="m64">
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula>. <inline-formula id="inf56">
<mml:math id="m65">
<mml:mi>R</mml:mi>
</mml:math>
</inline-formula> is the reward function, and <inline-formula id="inf57">
<mml:math id="m66">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the probability of receiving a reward <inline-formula id="inf58">
<mml:math id="m67">
<mml:mi>r</mml:mi>
</mml:math>
</inline-formula> from action <inline-formula id="inf59">
<mml:math id="m68">
<mml:mi>a</mml:mi>
</mml:math>
</inline-formula> and state <inline-formula id="inf60">
<mml:math id="m69">
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula>. <inline-formula id="inf61">
<mml:math id="m70">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the reward discount factor. The agent learns to find a policy that maximizes the total discounted reward presented in <xref ref-type="disp-formula" rid="e10">(10)</xref>, and <inline-formula id="inf62">
<mml:math id="m71">
<mml:mi>T</mml:mi>
</mml:math>
</inline-formula> is the number of time steps in each episode.<disp-formula id="e10">
<mml:math id="m72">
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>
</p>
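The total discounted return of Eq. (10) can be computed directly; the function name and reward list below are illustrative.

```python
# Sketch of the total discounted return G_t in Eq. (10);
# rewards[k] holds R_{t+1+k}, so G_t = sum_k gamma^k * R_{t+1+k}.
def discounted_return(rewards, gamma):
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g
```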
<p>The policy is the rule an agent uses to decide what action to take; it maps a given state to an action. A stochastic policy is usually expressed as <inline-formula id="inf63">
<mml:math id="m73">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mtext>&#x7c;</mml:mtext>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, in which parameter <inline-formula id="inf64">
<mml:math id="m74">
<mml:mi>&#x3b8;</mml:mi>
</mml:math>
</inline-formula> denotes the weights and biases of the neural network in deep reinforcement learning algorithms.</p>
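For a discrete action set, a stochastic policy can be realized as a softmax distribution over network outputs from which an action is sampled; the logits below are illustrative placeholders for the network's output.

```python
import math
import random

# Sketch of a stochastic policy pi_theta(a_t | s_t) over discrete actions:
# a softmax over illustrative logits, from which one action is sampled.
def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(logits):
    probs = softmax(logits)
    u = random.random()
    acc = 0.0
    for a, p in enumerate(probs):
        acc += p
        if u <= acc:
            return a
    return len(probs) - 1
```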
<p>The state value function <inline-formula id="inf65">
<mml:math id="m75">
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the expected return starting from state <inline-formula id="inf66">
<mml:math id="m76">
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula> following a certain policy, as defined in <xref ref-type="disp-formula" rid="e11">(11)</xref>, which is used to evaluate the state:<disp-formula id="e11">
<mml:math id="m77">
<mml:mrow>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x395;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
<p>The action-value function <inline-formula id="inf67">
<mml:math id="m78">
<mml:mrow>
<mml:msup>
<mml:mi>Q</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the expected return starting from state <inline-formula id="inf68">
<mml:math id="m79">
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula>, taking action <inline-formula id="inf69">
<mml:math id="m80">
<mml:mi>a</mml:mi>
</mml:math>
</inline-formula>, and then following policy <inline-formula id="inf70">
<mml:math id="m81">
<mml:mi>&#x3c0;</mml:mi>
</mml:math>
</inline-formula>, denoted as (12), which is utilized to evaluate the action:<disp-formula id="e12">
<mml:math id="m82">
<mml:mrow>
<mml:msup>
<mml:mi>Q</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x395;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
<p>The advantage function <inline-formula id="inf71">
<mml:math id="m83">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> corresponding to policy <inline-formula id="inf72">
<mml:math id="m84">
<mml:mi>&#x3c0;</mml:mi>
</mml:math>
</inline-formula> measures the relative value of each action in a given state, and is mathematically defined as shown in <xref ref-type="disp-formula" rid="e13">(13)</xref>:<disp-formula id="e13">
<mml:math id="m85">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>Q</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi>V</mml:mi>
<mml:mi>&#x3c0;</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(13)</label>
</disp-formula>
</p>
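Given estimates of Q and V, the advantage of Eq. (13) is a simple difference; the tabular values below are illustrative.

```python
# Sketch of the advantage in Eq. (13): A(s,a) = Q(s,a) - V(s).
# q_values holds illustrative Q(s, a) estimates for each action a in state s.
def advantage(q_values, v_value):
    return [q - v_value for q in q_values]
```

A positive entry marks an action better than the policy's average in that state; a negative entry marks a worse one.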
</sec>
<sec id="s3-2">
<title>Proximal Policy Optimization Algorithm With Importance Sampling and Clipping Mechanism</title>
<p>In general, DRL algorithms can be divided into value-based methods, policy-based methods, and the actor&#x2013;critic framework. The proximal policy optimization (PPO) algorithm follows the actor&#x2013;critic framework with an actor network and a critic network.</p>
<p>The main advantage of applying the PPO algorithm to the AGC optimization problem is that each updated control policy does not deviate far from the previous policy and is restrained within the feasible region by the clipping mechanism. During off-line training, PPO also converges faster than many other DRL algorithms, and during on-line operation it generates smoother, lower-variance, and more predictable sequential decisions, which is desirable for AGC optimization.</p>
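The clipping mechanism follows the standard PPO clipped surrogate objective of Schulman et al. (2017); a per-sample sketch, with an illustrative clip range eps:

```python
# Sketch of PPO's clipped surrogate for one sample (Schulman et al., 2017):
# the ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s) is clipped to
# [1 - eps, 1 + eps], keeping the policy update close to the old policy.
def clipped_surrogate(ratio, advantage, eps=0.2):
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum means the objective never rewards moving the ratio beyond the clip range in the direction the advantage favors.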
<p>The overall structure of the PPO algorithm is presented in <xref ref-type="fig" rid="F2">Figure 2</xref>, including an actor network and a critic network. The AGC training environment sends the experience tuples <inline-formula id="inf73">
<mml:math id="m86">
<mml:mrow>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> to the trajectory memory pool, which assembles finite mini-batches of samples and returns them to the PPO algorithm.</p>
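The trajectory memory pool can be sketched as a simple buffer of experience tuples that yields shuffled mini-batches; class and method names are illustrative.

```python
import random

# Sketch of a trajectory memory pool storing <s_t, a_t, r_{t+1}, s_{t+1}>
# tuples and yielding finite mini-batches; sizes and names are illustrative.
class TrajectoryMemory:
    def __init__(self):
        self.pool = []

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def minibatches(self, batch_size):
        idx = list(range(len(self.pool)))
        random.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            yield [self.pool[i] for i in idx[start:start + batch_size]]
```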
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Structure of PPO algorithm.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g002.tif"/>
</fig>
<p>The actor network and the critic network are realized by deep neural networks (DNNs) with the following equations:<disp-formula id="e14">
<mml:math id="m87">
<mml:mrow>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(14)</label>
</disp-formula>
<disp-formula id="e15">
<mml:math id="m88">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>D</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>&#x2026;</mml:mo>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(15)</label>
</disp-formula>where <inline-formula id="inf74">
<mml:math id="m89">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf75">
<mml:math id="m90">
<mml:mrow>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represent the input array and output array of the <inline-formula id="inf76">
<mml:math id="m91">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> layer of the DNN. The layers are connected such that <inline-formula id="inf77">
<mml:math id="m92">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>O</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. <inline-formula id="inf78">
<mml:math id="m93">
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the total number of layers, and <inline-formula id="inf79">
<mml:math id="m94">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf80">
<mml:math id="m95">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the weights and bias matrices of the <inline-formula id="inf81">
<mml:math id="m96">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> layer. The ReLU functions are used as the activation function <inline-formula id="inf82">
<mml:math id="m97">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
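The layered forward pass of Eqs. (14)-(15) can be sketched with plain Python lists; the weights here are illustrative, and for simplicity ReLU is applied at every layer including the last.

```python
# Sketch of the DNN forward pass in Eqs. (14)-(15): O_i = f(W_i I_i + b_i)
# with ReLU activations and I_{i+1} = O_i; all weights are illustrative.
def relu(v):
    return [max(0.0, x) for x in v]

def layer(W, I, b):
    # one layer: ReLU(W @ I + b), with W as a list of rows
    return relu([sum(w * x for w, x in zip(row, I)) + bi
                 for row, bi in zip(W, b)])

def forward(layers, s_t):
    out = s_t                      # I_1 = s_t
    for W, b in layers:            # chained as O_D = O_n(... O_2(O_1(s_t)))
        out = layer(W, out, b)
    return out
```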
<p>The actor network contains the policy model <inline-formula id="inf83">
<mml:math id="m98">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> with network parameters <inline-formula id="inf84">
<mml:math id="m99">
<mml:mrow>
<mml:mo>&#x3c;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. It is responsible for the sequential decisions of the AGC optimization. The rewards are taken in by the critic network <inline-formula id="inf85">
<mml:math id="m100">
<mml:mrow>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> with parameters <inline-formula id="inf86">
<mml:math id="m101">
<mml:mrow>
<mml:mo>&#x3c;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, which is a value function and maps the state <inline-formula id="inf87">
<mml:math id="m102">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> to the expected future cumulative rewards.</p>
<p>The conventional policy gradient-based DRL optimizes the following objective function (<xref ref-type="bibr" rid="B26">Wang et al., 2020</xref>):<disp-formula id="e16">
<mml:math id="m103">
<mml:mrow>
<mml:msup>
<mml:mi>L</mml:mi>
<mml:mi>P</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>g</mml:mi>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(16)</label>
</disp-formula>where <inline-formula id="inf88">
<mml:math id="m104">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the empirical average over a finite mini-batch of samples, <inline-formula id="inf89">
<mml:math id="m105">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is a stochastic policy, and <inline-formula id="inf90">
<mml:math id="m106">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is an estimator of the advantage function at time <inline-formula id="inf91">
<mml:math id="m107">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>. In this work, a generalized advantage estimator (GAE) is used to compute the advantage function, which is the discounted sum of temporal difference errors (<xref ref-type="bibr" rid="B20">Schulman et al., 2017</xref>).<disp-formula id="e17">
<mml:math id="m108">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>V</mml:mi>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>V</mml:mi>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:msubsup>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mi>V</mml:mi>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:msubsup>
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>V</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
<label>(17)</label>
</disp-formula>
<disp-formula id="e18">
<mml:math id="m109">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(18)</label>
</disp-formula>where <inline-formula id="inf92">
<mml:math id="m110">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the discount factor, <inline-formula id="inf93">
<mml:math id="m111">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the GAE parameter, <inline-formula id="inf94">
<mml:math id="m112">
<mml:mi>U</mml:mi>
</mml:math>
</inline-formula> is the length of the sampled batch, and <inline-formula id="inf95">
<mml:math id="m113">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the reward at time <inline-formula id="inf96">
<mml:math id="m114">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>. The objective function <inline-formula id="inf97">
<mml:math id="m115">
<mml:mrow>
<mml:msup>
<mml:mi>L</mml:mi>
<mml:mi>V</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> can be formulated as:<disp-formula id="e19">
<mml:math id="m116">
<mml:mrow>
<mml:msup>
<mml:mi>L</mml:mi>
<mml:mi>V</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>L</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>V</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>V</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>&#x3bc;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(19)</label>
</disp-formula>
<disp-formula id="e20">
<mml:math id="m117">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>V</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>&#x3bc;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(20)</label>
</disp-formula>where <inline-formula id="inf98">
<mml:math id="m118">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>V</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>&#x3bc;</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the temporal-difference (TD) target value. The parameters of the critic network <inline-formula id="inf99">
<mml:math id="m119">
<mml:mrow>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>&#x3bc;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> can be updated by the stochastic gradient descent algorithm in <xref ref-type="bibr" rid="B9">Duan et al. (2020</xref>) according to the gradient <inline-formula id="inf100">
<mml:math id="m120">
<mml:mrow>
<mml:mo>&#x2207;</mml:mo>
<mml:msup>
<mml:mi>L</mml:mi>
<mml:mi>V</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with a learning rate <inline-formula id="inf101">
<mml:math id="m121">
<mml:mi>&#x3b7;</mml:mi>
</mml:math>
</inline-formula>.</p>
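As a concrete illustration of Eqs. 17, 18, the GAE can be computed with a backward recursion over the TD errors, which is equivalent to the discounted sum written in Eq. 17. The sketch below assumes a list of rewards and a list of value estimates with one extra entry for the final state; the function and variable names are illustrative, not from the article.

```python
# Sketch of the generalized advantage estimator (GAE) of Eqs. 17-18,
# assuming gamma and lam in [0, 1]. values has len(rewards) + 1 entries,
# so values[t + 1] is V(s_{t+1}) for the last reward as well.
def td_errors(rewards, values, gamma):
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)   (Eq. 18)
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]

def gae(rewards, values, gamma=0.99, lam=0.95):
    deltas = td_errors(rewards, values, gamma)
    advantages, running = [], 0.0
    # Backward pass: running = delta_t + (gamma * lam) * running
    # reproduces the discounted sum of TD errors in Eq. 17.
    for delta in reversed(deltas):
        running = delta + gamma * lam * running
        advantages.append(running)
    return advantages[::-1]
```

With `gamma = lam = 1` and zero values, the advantage at each step reduces to the plain sum of the remaining rewards, which is a quick sanity check on the recursion.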
<p>The input of the actor network is the observation state <inline-formula id="inf102">
<mml:math id="m122">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and the outputs are the mean and standard deviation of a normal distribution over the actions, that is, the policy distribution <inline-formula id="inf103">
<mml:math id="m123">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The importance sampling is used to obtain the expectation of samples gathered from an old policy <inline-formula id="inf104">
<mml:math id="m124">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> under the new policy <inline-formula id="inf105">
<mml:math id="m125">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. This process converts the PPO algorithm from an on-policy method to an off-policy one, which means the actor network can be updated asynchronously to further stabilize the performance of AGC actions. The following surrogate objective function is maximized:<disp-formula id="e21">
<mml:math id="m126">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mo>&#xa0;</mml:mo>
<mml:msup>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>/</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(21)</label>
</disp-formula>
<disp-formula id="e22">
<mml:math id="m127">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>.</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>.</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>&#x3be;</mml:mi>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(22)</label>
</disp-formula>where <inline-formula id="inf106">
<mml:math id="m128">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the Kullback&#x2013;Leibler (KL) divergence, <inline-formula id="inf107">
<mml:math id="m129">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mtext>&#x7c;</mml:mtext>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mi>&#x3c0;</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mtext>&#x7c;</mml:mtext>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> denotes the ratio of the probability of action <inline-formula id="inf108">
<mml:math id="m130">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> under the new and old policies, and <inline-formula id="inf109">
<mml:math id="m131">
<mml:mi>&#x3be;</mml:mi>
</mml:math>
</inline-formula> is a small number. To replace the KL-divergence penalty with a simpler first-order algorithm while retaining data efficiency and robustness, a clipping mechanism, <inline-formula id="inf110">
<mml:math id="m132">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x3f5;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="normal">&#x3f5;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, is introduced to modify the surrogate objective by clipping <inline-formula id="inf111">
<mml:math id="m133">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. It removes the incentive to move <inline-formula id="inf112">
<mml:math id="m134">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> outside of the interval <inline-formula id="inf113">
<mml:math id="m135">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x3f5;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="normal">&#x3f5;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The objective function with the <inline-formula id="inf114">
<mml:math id="m136">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> function is defined as:<disp-formula id="e23">
<mml:math id="m137">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msup>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>I</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>L</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>I</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>E</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x3f5;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="normal">&#x3f5;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>A</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(23)</label>
</disp-formula>
</p>
<p>Together, importance sampling and the clipping function give the PPO DRL algorithm better stability and reliability for AGC online operation, higher data and computational efficiency, and better overall performance.</p>
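The clipped surrogate objective of Eqs. 21&#x2013;23 can be sketched as a short function over precomputed probability ratios and advantage estimates. This is a minimal illustration, assuming the ratios <inline-formula-like r_t(&#x3b8;) and advantages are already available from a sampled batch; names are illustrative.

```python
# Minimal sketch of the clipped surrogate objective of Eq. 23, assuming the
# probability ratios r_t(theta) = pi_theta / pi_theta_old and the advantage
# estimates A_hat_t are precomputed for a sampled batch.
def clipped_surrogate(ratios, advantages, eps=0.2):
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)  # clip(r_t, 1-eps, 1+eps)
        total += min(r * a, clipped * a)             # pessimistic (lower) bound
    return total / len(ratios)                       # empirical average E_hat[.]
```

Taking the minimum of the unclipped and clipped terms makes the objective a lower bound, so gradient ascent gains nothing by pushing the ratio outside (1 &#x2212; &#x3f5;, 1 + &#x3f5;).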
</sec>
</sec>
<sec id="s4">
<title>AGC Optimization Strategy Based on Reinforcement Learning</title>
<p>If the regulation power of each AGC unit is regarded as the action of the agent and the real power system is regarded as its environment, then the AGC dynamic optimization model considering the uncertainty of wind power can be transformed into a typical stochastic sequential decision problem. Based on the AGC dynamic optimization model described above, the 15-min control cycle can be divided into a 15-stage Markov process. The framework is shown in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Framework of grid environment interacting with an agent.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g003.tif"/>
</fig>
<p>The agent can be trained offline on historical data and massive simulations and then deployed online in the real power grid. This section focuses on the efficient offline training of such an agent and introduces the design of its key components.</p>
<sec id="s4-1">
<title>State and Action Spaces</title>
<p>State space S: the state space should account for as many of the factors that may affect the decision as possible. In this work, the state space is a vector of system information representing the current system condition at time t and the predicted system information at time <inline-formula id="inf115">
<mml:math id="m138">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. Specifically, the former includes the real power output of all units (AGC and non-AGC units) <inline-formula id="inf116">
<mml:math id="m139">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, the frequency deviation <inline-formula id="inf117">
<mml:math id="m140">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, the power deviation of the tie-line <inline-formula id="inf118">
<mml:math id="m141">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, and the area control error <inline-formula id="inf119">
<mml:math id="m142">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mi>E</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. The latter includes the predicted load <inline-formula id="inf120">
<mml:math id="m143">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, wind power <inline-formula id="inf121">
<mml:math id="m144">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, frequency deviation <inline-formula id="inf122">
<mml:math id="m145">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, power deviation of tie-line <inline-formula id="inf123">
<mml:math id="m146">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, and area control error <inline-formula id="inf124">
<mml:math id="m147">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. It is set as follows:<disp-formula id="e24">
<mml:math id="m148">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>:</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mi>E</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>f</mml:mi>
</mml:msubsup>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(24)</label>
</disp-formula>
</p>
<p>Action space A: the action space contains the decision variables of the optimization model, namely the ramp direction and ramp power. In this article, without loss of generality, the action is defined as the power increments of the AGC units <inline-formula id="inf125">
<mml:math id="m149">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> at each optimization time, which are subject to the ramp power limits of the corresponding AGC units. <inline-formula id="inf126">
<mml:math id="m150">
<mml:mi>A</mml:mi>
</mml:math>
</inline-formula> is set as<disp-formula id="e25">
<mml:math id="m151">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>:</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(25)</label>
</disp-formula>
</p>
</sec>
<sec id="s4-2">
<title>Reward Function Design</title>
<p>The design of the reward function is crucial in DRL. It generates the reward <inline-formula id="inf127">
<mml:math id="m152">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> at time <inline-formula id="inf128">
<mml:math id="m153">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula> in each decision cycle, which evaluates the agent&#x2019;s actions based on the AGC control performance under the impact of uncertainties in the system variables. In this work, the values of load, wind generation, frequency deviation, tie-line power deviation, and ACE are used to formulate the reward function, which consists of cost terms, penalty terms, and performance terms. The reward <inline-formula id="inf129">
<mml:math id="m154">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is calculated by the formula:<disp-formula id="e26">
<mml:math id="m155">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(26)</label>
</disp-formula>
</p>
<p>where the cost term <inline-formula id="inf130">
<mml:math id="m156">
<mml:mrow>
<mml:msub>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the total cost of the system. It includes the AGC adjustment ancillary service cost and the load-shedding cost, calculated as follows:<disp-formula id="e27">
<mml:math id="m157">
<mml:mrow>
<mml:msub>
<mml:mi>F</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(27)</label>
</disp-formula>where <inline-formula id="inf131">
<mml:math id="m158">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf132">
<mml:math id="m159">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the corresponding cost coefficients. The load shedding <inline-formula id="inf133">
<mml:math id="m160">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is determined as follows:<disp-formula id="e28">
<mml:math id="m161">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0.2</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0.2.</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(28)</label>
</disp-formula>
</p>
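The piecewise load shedding rule in Eq. 28 can be sketched in Python as follows (function and variable names are illustrative, not from the article; Δf is in Hz and ΔP_t in MW):

```python
def load_shedding(delta_f, delta_p):
    """Load shedding P_{c,t} per Eq. 28: zero inside the 0.2 Hz deadband,
    otherwise proportional to the excess frequency deviation and the
    system power deviation delta_p (= ΔP_t)."""
    if abs(delta_f) <= 0.2:
        return 0.0
    return (abs(delta_f) - 0.1) * delta_p
```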
<p>Here, the real power deviations <inline-formula id="inf134">
<mml:math id="m162">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are utilized to reflect the stochastic process caused by the load and wind power fluctuations. At time <inline-formula id="inf135">
<mml:math id="m163">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>, the power deviations in the system are calculated as follows:<disp-formula id="e29">
<mml:math id="m164">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(29)</label>
</disp-formula>where <inline-formula id="inf136">
<mml:math id="m165">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> is the total number of thermal power units in the system, including AGC and non-AGC units, <inline-formula id="inf137">
<mml:math id="m166">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the power output of thermal power unit <inline-formula id="inf138">
<mml:math id="m167">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> at <inline-formula id="inf139">
<mml:math id="m168">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula> period, and <inline-formula id="inf140">
<mml:math id="m169">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf141">
<mml:math id="m170">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf142">
<mml:math id="m171">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are the wind power output, the load, and the tie-line power, respectively. Note that power flowing out of the system is taken to be positive. <inline-formula id="inf143">
<mml:math id="m172">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the system loss at time <inline-formula id="inf144">
<mml:math id="m173">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>.</p>
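As a minimal illustration of Eq. 29 (names are hypothetical), the system power deviation can be computed as:

```python
def power_deviation(p_gen, p_wind, p_load, p_tie, p_loss):
    """ΔP_t per Eq. 29: total thermal unit output plus wind power, minus
    load, tie-line export (outflow positive), and system losses."""
    return sum(p_gen) + p_wind - p_load - p_tie - p_loss
```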
<p>The power output of any unit (AGC or non-AGC) relates to the frequency deviation, the tie-line power deviation, and the ACE. Taking an interconnected power grid with two areas as an example, the system contains region <italic>A</italic> and region <italic>B</italic>, and both areas adopt tie-line bias frequency control. It is assumed that <inline-formula id="inf145">
<mml:math id="m174">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf146">
<mml:math id="m175">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf147">
<mml:math id="m176">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf148">
<mml:math id="m177">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the change of load and the change of power output of units in region <italic>A</italic> and region <italic>B</italic> at time <inline-formula id="inf149">
<mml:math id="m178">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>, respectively, and <inline-formula id="inf150">
<mml:math id="m179">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>A</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf151">
<mml:math id="m180">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the frequency regulation constants of region <italic>A</italic> and region <italic>B</italic>. We define <inline-formula id="inf152">
<mml:math id="m181">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf153">
<mml:math id="m182">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> as the power imbalances of the two areas, which can be calculated as follows:<disp-formula id="e30">
<mml:math id="m183">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(30)</label>
</disp-formula>
<disp-formula id="e31">
<mml:math id="m184">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(31)</label>
</disp-formula>
</p>
<p>Frequency deviation, tie-line power deviation, and area control error can be calculated as follows:<disp-formula id="e32">
<mml:math id="m185">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>A</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(32)</label>
</disp-formula>
<disp-formula id="e33">
<mml:math id="m186">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>A</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>A</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(33)</label>
</disp-formula>
<disp-formula id="e34">
<mml:math id="m187">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:mi>B</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(34)</label>
</disp-formula>where <italic>B</italic> is the equivalent frequency regulation constant of the control area in MW/0.1 Hz; its value is negative.</p>
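Eqs. 30–34 can be sketched together in Python (illustrative names; `k_a`, `k_b` are the area frequency regulation constants and `b` is the bias in MW/0.1 Hz):

```python
def two_area_response(dp_la, dp_ga, dp_lb, dp_gb, k_a, k_b, b):
    """Two-area tie-line bias control quantities per Eqs. 30-34."""
    dp_a = dp_la - dp_ga                              # power imbalance of area A, Eq. 30
    dp_b = dp_lb - dp_gb                              # power imbalance of area B, Eq. 31
    df = -(dp_a + dp_b) / (k_a + k_b)                 # frequency deviation, Eq. 32
    dp_tie = (k_a * dp_b - k_b * dp_a) / (k_a + k_b)  # tie-line power deviation, Eq. 33
    ace = dp_tie - 10.0 * b * df                      # area control error, Eq. 34
    return df, dp_tie, ace
```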
<p>The punishment term <inline-formula id="inf154">
<mml:math id="m188">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> formulates the operation and control limits in AGC dynamic optimization, including generation unit power output limits, CPS1 requirements, frequency deviation limits, and tie-line power deviation limits, and is given as:<disp-formula id="e35">
<mml:math id="m189">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>4</mml:mn>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(35)</label>
</disp-formula>
</p>
<p>The AGC units participate in both primary and secondary frequency control; thus, the outputs of AGC units at time <inline-formula id="inf155">
<mml:math id="m190">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> are calculated as<disp-formula id="e36">
<mml:math id="m191">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(36)</label>
</disp-formula>where <inline-formula id="inf156">
<mml:math id="m192">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the regulated power of AGC unit <inline-formula id="inf157">
<mml:math id="m193">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> at time <inline-formula id="inf158">
<mml:math id="m194">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>, that is, the power increment of secondary frequency control; and <inline-formula id="inf159">
<mml:math id="m195">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the primary frequency control power of AGC unit <inline-formula id="inf160">
<mml:math id="m196">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>, where <inline-formula id="inf161">
<mml:math id="m197">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the frequency regulation constant of unit <inline-formula id="inf162">
<mml:math id="m198">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>, and <inline-formula id="inf163">
<mml:math id="m199">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf164">
<mml:math id="m200">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are system frequency deviations at time <inline-formula id="inf165">
<mml:math id="m201">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf166">
<mml:math id="m202">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>, respectively.</p>
<p>Accordingly, the power outputs of non-AGC units at time <inline-formula id="inf167">
<mml:math id="m203">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> are calculated as<disp-formula id="e37">
<mml:math id="m204">
<mml:mrow>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(37)</label>
</disp-formula>
</p>
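Eqs. 36 and 37 amount to the following output updates (a sketch under illustrative names; the scheduled output, secondary increment, and frequency deviations correspond to the symbols defined above):

```python
def agc_unit_output(p_sched_next, dp_agc, k_gi, df_next, df_now):
    """AGC unit output at t+1 per Eq. 36: scheduled output plus the
    secondary-control increment minus the primary frequency response
    K_Gi * (Δf_{t+1} - Δf_t)."""
    return p_sched_next + dp_agc - k_gi * (df_next - df_now)

def non_agc_unit_output(p_now, k_gi, df_next, df_now):
    """Non-AGC unit output at t+1 per Eq. 37: primary frequency response only."""
    return p_now - k_gi * (df_next - df_now)
```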
<p>The outputs of AGC and non-AGC units are subjected to the corresponding maximum and minimum power limits:<disp-formula id="e38">
<mml:math id="m205">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msubsup>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mtext>else,</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(38)</label>
</disp-formula>
<disp-formula id="e39">
<mml:math id="m206">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>else,</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(39)</label>
</disp-formula>where <inline-formula id="inf168">
<mml:math id="m207">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the punishment coefficient. The CPS1-related punishment term is formulated as:<disp-formula id="e40">
<mml:math id="m208">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>200</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mn>100</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>200</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(40)</label>
</disp-formula>where <inline-formula id="inf169">
<mml:math id="m209">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf170">
<mml:math id="m210">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the punishment coefficients of ACE and CPS1, and <inline-formula id="inf171">
<mml:math id="m211">
<mml:mrow>
<mml:msubsup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf172">
<mml:math id="m212">
<mml:mrow>
<mml:msubsup>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are the ideal values of ACE and CPS1, respectively. In this article, these ideal values are set to 0 and 200%.</p>
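Eqs. 39 and 40 can be sketched as follows (K_cps1 expressed as a fraction, so 2.0 corresponds to 200%; function and parameter names are illustrative):

```python
def r1(p, p_min, p_max, k1):
    """Generation limit penalty per Eq. 39: zero inside the limits, k1 otherwise."""
    return 0.0 if p_min < p < p_max else k1

def r2(k_cps1, e_ace, k2, k3, e_ace_ideal=0.0, k_cps1_ideal=2.0):
    """CPS1-related penalty per Eq. 40, with ideal values e_ACE* = 0
    and K_cps1* = 200% (2.0 as a fraction)."""
    if k_cps1 >= 2.0:
        return 0.0
    if k_cps1 >= 1.0:
        return -k2 * abs(e_ace - e_ace_ideal)
    return -k3 * abs(k_cps1 - k_cps1_ideal)
```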
<p>The following two functions denote the frequency deviation and tie-line power transfer deviation punishments, respectively:<disp-formula id="e41">
<mml:math id="m213">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>4</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>else,</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(41)</label>
</disp-formula>
<disp-formula id="e42">
<mml:math id="m214">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>4</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>5</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>else,</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(42)</label>
</disp-formula>where <inline-formula id="inf173">
<mml:math id="m215">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>4</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf174">
<mml:math id="m216">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mn>5</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the corresponding punishment coefficients.</p>
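Eqs. 41 and 42 are deadband penalties of the same shape, so both can be sketched with one helper (illustrative names):

```python
def band_penalty(x, x_min, x_max, k):
    """Penalties r3 (Eq. 41, x = Δf) and r4 (Eq. 42, x = ΔP_{T,t}):
    zero inside [x_min, x_max], the coefficient k otherwise."""
    return 0.0 if x_min <= x <= x_max else k
```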
<p>In this article, we add a performance evaluation term <inline-formula id="inf175">
<mml:math id="m217">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> with a coefficient <inline-formula id="inf176">
<mml:math id="m218">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> to the reward function, which enables the PPO-based DRL algorithm to further improve the long-term AGC performance:<disp-formula id="e43">
<mml:math id="m219">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(43)</label>
</disp-formula>
</p>
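Eq. 43 penalizes the squared shortfall of CPS1 from its ideal value of 200%; a minimal sketch (K_cps1 as a fraction, names illustrative):

```python
def f_cps(k_cps1, c3):
    """Long-term AGC performance term per Eq. 43: -c3 * (2 - K_cps1)^2,
    which is zero when CPS1 reaches its ideal 200%."""
    return -c3 * (2.0 - k_cps1) ** 2
```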
</sec>
<sec id="s4-3">
<title>Other Parameter Setting</title>
<p>State transition probability <italic>P</italic>: in this work, a model-free reinforcement learning algorithm is utilized, so the agent's next state and reward are obtained through interaction with the environment; together they implicitly define the state transition probability <italic>P</italic>, including environmental stochasticity.</p>
<p>Discount factor <inline-formula id="inf177">
<mml:math id="m220">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> (<inline-formula id="inf178">
<mml:math id="m221">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi mathvariant="italic">&#x3f5;</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>) determines the importance of future rewards relative to the current reward. When <inline-formula id="inf179">
<mml:math id="m222">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, the impact of current decisions on the future operating status of the system is not considered, and only the operating cost of the current control period is optimized; when <inline-formula id="inf180">
<mml:math id="m223">
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, the impact of current decisions on the system operating status at every future moment is considered equally. For AGC dynamic optimization control, the current decision has an important impact on the future operating state of the system, and the closer a period is to the current decision period, the greater the impact.</p>
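The role of γ described above corresponds to the standard discounted return; a minimal sketch:

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ...; gamma = 0 keeps only the
    current reward, gamma = 1 weights every future reward equally."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```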
</sec>
<sec id="s4-4">
<title>Detailed PPO Training Algorithm in Solving the AGC Dynamic Optimization Problem</title>
<p>Based on the aforementioned analysis, this article transforms the AGC dynamic optimization problem into a sequential decision problem and utilizes the PPO deep reinforcement learning algorithm to solve it. The AGC dynamic optimization problem based on the PPO algorithm is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. The specific process is described as follows:<list list-type="simple">
<list-item>
<p>1) Initialize the weight and bias of the neural network, actor and critic neural network learning rate, reward discount factor <inline-formula id="inf181">
<mml:math id="m224">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula>, and hyperparameters <inline-formula id="inf182">
<mml:math id="m225">
<mml:mi>&#x3b5;</mml:mi>
</mml:math>
</inline-formula> and other parameters. Set the number of episodes <italic>M</italic> and the decision cycle <inline-formula id="inf183">
<mml:math id="m226">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>2) Obtain the initial observation at the first moment from the power system environment.</p>
</list-item>
<list-item>
<p>3) Input state observation <inline-formula id="inf184">
<mml:math id="m227">
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula> into the actor neural network and get the distribution of action <inline-formula id="inf185">
<mml:math id="m228">
<mml:mi>a</mml:mi>
</mml:math>
</inline-formula>. Then, sample the distribution to get action <inline-formula id="inf186">
<mml:math id="m229">
<mml:mi>a</mml:mi>
</mml:math>
</inline-formula> by importance sampling.</p>
</list-item>
<list-item>
<p>4) Implement action <inline-formula id="inf187">
<mml:math id="m230">
<mml:mi>a</mml:mi>
</mml:math>
</inline-formula> to the environment, then calculate the reward <inline-formula id="inf188">
<mml:math id="m231">
<mml:mi>r</mml:mi>
</mml:math>
</inline-formula>, and update the environment to get the state <inline-formula id="inf189">
<mml:math id="m232">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> at the next moment, and save the current sample <inline-formula id="inf190">
<mml:math id="m233">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Update the current observation <inline-formula id="inf191">
<mml:math id="m234">
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula> to the new observation <inline-formula id="inf192">
<mml:math id="m235">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>5) Input <inline-formula id="inf193">
<mml:math id="m236">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> into the critic network, and get the corresponding state value function <inline-formula id="inf194">
<mml:math id="m237">
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
. Then, calculate the discounted cumulative reward <inline-formula id="inf195">
<mml:math id="m238">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> at each moment based on <xref ref-type="disp-formula" rid="e35">(35)</xref>,</p>
</list-item>
</list>
<disp-formula id="e44">
<mml:math id="m239">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(44)</label>
</disp-formula>
<list list-type="simple">
<list-item>
<p>6) Update the actor and critic neural network models according to <inline-formula id="inf196">
<mml:math id="m240">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf197">
<mml:math id="m241">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf198">
<mml:math id="m242">
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf199">
<mml:math id="m243">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> at each moment.</p>
</list-item>
<list-item>
<p>7) Repeat steps 2&#x2013;6 until the number of training episodes is equal to the set number <italic>M.</italic>
</p>
</list-item>
<list-item>
<p>8) Save the parameters of actor and critic neural networks. Utilize the trained agent on the test data.</p>
</list-item>
</list>
</p>
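<p>Step 5 above evaluates the discounted cumulative reward of Eq. 44 for every moment of an episode. This can be done in a single backward pass over the stored rewards, bootstrapping from the critic's value of the final state; the sketch below assumes plain Python lists and illustrative variable names:</p>

```python
# Sketch of Eq. (44): Q(s_t, a_t) = r_t + gamma*r_{t+1} + ...
#   + gamma^{T-t-1} * r_{T-1} + gamma^{T-t} * V(s_T),
# computed backwards in one pass (names are illustrative).

def bootstrapped_returns(rewards, v_last, gamma):
    """rewards: [r_t, ..., r_{T-1}]; v_last: critic value V(s_T)."""
    returns = []
    g = v_last                      # start from the bootstrap value
    for r in reversed(rewards):
        g = r + gamma * g           # fold in one earlier reward
        returns.append(g)
    returns.reverse()               # restore chronological order
    return returns

# Example: three recorded rewards, V(s_T) = 2.0, gamma = 0.9
print(bootstrapped_returns([1.0, 0.5, 0.25], 2.0, 0.9))
```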
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>AGC dynamic optimization problem based on the PPO algorithm.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g004.tif"/>
</fig>
</sec>
</sec>
<sec id="s5">
<title>Case Study</title>
<sec id="s5-1">
<title>Test System and Data</title>
<p>In this article, the PPO agent for AGC dynamic optimization control is tested on a modified IEEE 39 bus system model, which includes three AGC units and seven non-AGC units. The tie-line is connected to bus 29, and a wind farm with an installed capacity of 130&#xa0;MW is connected to bus 39. A single-line diagram of the system is shown in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Modified IEEE 39 bus system.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g005.tif"/>
</fig>
<p>The forecast and actual load and wind power data come from the New England power grid<sup>1</sup>. The basic parameters of the three AGC units and the test system are shown in <xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T2">2</xref>. The control period is set to 15&#xa0;min, and the deviations of frequency and tie-line transmission power at the initial time are assumed to be 0.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Information of AGC units.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Symbol</th>
<th align="center">Quantity</th>
<th align="center">Unit 1</th>
<th align="center">Unit 2</th>
<th align="center">Unit 3</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<inline-formula id="inf200">
<mml:math id="m244">
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td>Bus number</td>
<td align="center">31</td>
<td align="center">38</td>
<td align="center">39</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf201">
<mml:math id="m245">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, MW</td>
<td>Rated power</td>
<td align="center">800</td>
<td align="center">860</td>
<td align="center">1,100</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf202">
<mml:math id="m246">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>R</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf203">
<mml:math id="m247">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>R</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, MW/min</td>
<td>Limits of lower and upper ramp power</td>
<td align="center">-30 and 30</td>
<td align="center">-45 and 45</td>
<td align="center">-60 and 60</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf204">
<mml:math id="m248">
<mml:mrow>
<mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, &#xa5;/(MW min)</td>
<td>Cost coefficient of frequency regulation</td>
<td align="center">0.5</td>
<td align="center">0.5</td>
<td align="center">0.25</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf205">
<mml:math id="m249">
<mml:mrow>
<mml:msub>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (per unit)</td>
<td>Frequency regulation constant of unit</td>
<td align="center">25</td>
<td align="center">25</td>
<td align="center">25</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Information of the test system.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Symbol</th>
<th align="center">Quantity</th>
<th align="center">Value</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<italic>f</italic>
<sub>N</sub>/Hz</td>
<td align="left">Nominal frequency</td>
<td align="center">50</td>
</tr>
<tr>
<td align="left">
<italic>f</italic>
<sub>
<italic>0</italic>
</sub>/Hz</td>
<td align="left">Initial frequency</td>
<td align="center">50</td>
</tr>
<tr>
<td align="left">
<italic>P</italic>
<sub>
<italic>T,N</italic>
</sub>/MW</td>
<td align="left">Nominal power of tie-line</td>
<td align="center">100</td>
</tr>
<tr>
<td align="left">
<italic>P</italic>
<sub>
<italic>T,0</italic>
</sub>/MW</td>
<td align="left">Initial power of tie-line</td>
<td align="center">100</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf206">
<mml:math id="m250">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b5;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf207">
<mml:math id="m251">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b5;</mml:mi>
<mml:mrow>
<mml:mn>15</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="left">Target bound for the 12-month RMS value of the 1-/15-minute average frequency error, in Hz</td>
<td align="center">0.04 and 0.021</td>
</tr>
<tr>
<td align="left">
<italic>B</italic> and <italic>B</italic>
<sub>
<italic>s</italic>
</sub>
</td>
<td align="left">Frequency bias coefficients, in MW/0.1&#xa0;Hz</td>
<td align="center">&#x2212;38 and 50</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf208">
<mml:math id="m252">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:munder accentunder="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf209">
<mml:math id="m253">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, Hz</td>
<td align="left">Limits of frequency deviation</td>
<td align="center">&#x2212;0.05 and 0.05</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf210">
<mml:math id="m254">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf211">
<mml:math id="m255">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>P</mml:mi>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, MW</td>
<td align="left">Limits of transmission power deviation of tie-line</td>
<td align="center">&#x2212;20 and 10</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The action space is the regulation power of the AGC units at each optimization moment, which is bounded by the ramp power limits of each AGC unit. The per-unit action spaces of the three AGC units are set as follows:<disp-formula id="e45">
<mml:math id="m256">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.3</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.3</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.45</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.45</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi>A</mml:mi>
<mml:mn>3</mml:mn>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.6</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.6</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(45)</label>
</disp-formula>
</p>
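<p>A common way to realize such bounded continuous action spaces (a sketch under assumed names, not the authors' code) is to clip the raw actor output to [&#x2212;1, 1] and scale each component by the corresponding unit's per-unit ramp limit from Eq. 45:</p>

```python
# Illustrative sketch: map a raw actor output to the per-unit
# regulation-power ranges of Eq. (45). Names are assumptions.

ACTION_BOUNDS = [0.3, 0.45, 0.6]    # |A1|, |A2|, |A3| per-unit limits

def scale_action(raw_action):
    """Clip each component to [-1, 1], then scale to its unit's range."""
    return [b * max(-1.0, min(1.0, a))
            for a, b in zip(raw_action, ACTION_BOUNDS)]

print(scale_action([1.0, -0.5, 2.0]))  # -> [0.3, -0.225, 0.6]
```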
<p>In addition, the state space dimension is 35 according to the preceding description: the 19 forecast loads at time t&#x2b;1, the actual output power of the 10 units at time t, and the actual and forecast values of the system frequency deviation, tie-line transmission power deviation, and ACE at times t and t&#x2b;1, respectively. The dimensions of the state and action spaces correspond to the numbers of neurons in the input and output layers, respectively. This work sets up three hidden layers in both the actor and critic neural networks, with 64, 128, and 32 neurons, respectively, each using the ReLU activation function. A larger learning rate <inline-formula id="inf212">
<mml:math id="m257">
<mml:mi>&#x3b1;</mml:mi>
</mml:math>
</inline-formula> accelerates the convergence of the algorithm, while a smaller <inline-formula id="inf213">
<mml:math id="m258">
<mml:mi>&#x3b1;</mml:mi>
</mml:math>
</inline-formula> tends to enhance the stability. In this article, learning rate <inline-formula id="inf214">
<mml:math id="m259">
<mml:mi>&#x3b1;</mml:mi>
</mml:math>
</inline-formula> in both the actor and critic networks is set to 0.0001.</p>
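<p>The layer dimensions described above (35 state inputs; hidden layers of 64, 128, and 32 neurons; an output layer matching the action or value dimension) can be summarized in a framework-agnostic way by their parameter counts. The sketch below assumes fully connected layers:</p>

```python
# Sketch of the network sizes described above: 35-dim state input,
# hidden layers of 64, 128, and 32 neurons, and output widths of 3
# (actor, one per AGC unit) and 1 (critic value). Fully connected
# layers are assumed; this only counts weights and biases.

def dense_param_count(layer_sizes):
    """Weights (in*out) plus biases (out) of a dense stack."""
    return sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))

actor_sizes = [35, 64, 128, 32, 3]   # actor: state -> action distribution
critic_sizes = [35, 64, 128, 32, 1]  # critic: state -> scalar value
print(dense_param_count(actor_sizes))   # 14851
print(dense_param_count(critic_sizes))  # 14785
```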
</sec>
<sec id="s5-2">
<title>Evaluation of the Test Results</title>
<p>Based on the preceding model and significant parameters, the PPO agent is coded using the TensorFlow framework with Python 3.7. The results of CPS1 index are shown in <xref ref-type="fig" rid="F6">Figure 6</xref>.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Results of CPS1 index in the training process using 10,000 episodes.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g006.tif"/>
</fig>
<p>The <italic>x</italic>-axis represents the number of episodes trained, while the <italic>y</italic>-axis represents the CPS1 value of each episode. It can be observed that the CPS1 values of the first few hundred episodes are relatively low and unstable. As the number of training episodes increases, the CPS1 values settle into a stable range around 191.3%, which satisfies the CPS requirement. For comparison, the deep Q learning (DQL) and dueling deep Q learning (DDQL) algorithms are also implemented; their average CPS1 values are 187.4% and 184.5%, respectively. This shows that the PPO architecture for AGC unit dynamic optimization proposed in this article can learn effectively under the growing uncertainties in the power system. Once trained, the agent can make proper decisions based on its learned strategy and the environmental observations fed back to it. Specifically, the agent receives data from the power system, including the actual unit output power, frequency, tie-line transmission power, and ACE, together with the forecast load, wind power, frequency, tie-line transmission power, and ACE, as its observation, and then decides the regulation power of the AGC units at time t, that is, advanced control of the AGC units, in order to reduce the frequency deviation at time t&#x2b;1.</p>
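<p>For reference, a CPS1 value can be computed from recorded ACE and frequency-deviation samples. The article's exact definition appears in an earlier section; the sketch below follows the common NERC-style form CPS1 = (2 &#x2212; CF) &#xd7; 100%, where CF averages (ACE/(&#x2212;10<italic>B</italic>))&#xb7;&#x394;<italic>f</italic> and normalizes by <italic>&#x3b5;</italic><sub>1</sub><sup>2</sup>, with illustrative sample data:</p>

```python
# Hedged sketch of a NERC-style CPS1 computation; the article's own
# definition (given earlier in the paper) should take precedence.
# CPS1 = (2 - CF) * 100%, CF = avg[(ACE / (-10*B)) * dF] / eps1^2.

def cps1(ace, dfreq, B, eps1):
    """ace, dfreq: sample lists; B: bias (MW/0.1 Hz, negative); eps1: Hz."""
    n = len(ace)
    cf = sum((a / (-10.0 * B)) * df
             for a, df in zip(ace, dfreq)) / n / eps1 ** 2
    return (2.0 - cf) * 100.0

# Zero ACE gives the neutral value of 200% (parameters B = -38 MW/0.1 Hz
# and eps1 = 0.04 Hz are taken from Table 2; samples are illustrative).
print(cps1([0.0, 0.0], [0.01, -0.01], -38.0, 0.04))  # -> 200.0
```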
<p>In addition, taking a typical control period of the system as an example, <xref ref-type="fig" rid="F7">Figure 7</xref> shows the actual load, the wind power generation, and the net load obtained by subtracting the wind generation from the actual load. The load at each bus is allocated in proportion to the loads of the original IEEE 39 bus system.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Curve of load with wind power fluctuations.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g007.tif"/>
</fig>
<p>Using PI hysteresis control and the PPO algorithm for frequency control in this period, the resulting system frequency deviation, tie-line transmission power deviation, and ACE are shown in <xref ref-type="fig" rid="F8">Figures 8A&#x2013;C</xref>, respectively.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Results of the optimization method and PPO algorithm.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g008.tif"/>
</fig>
<p>It is observed in <xref ref-type="fig" rid="F8">Figure 8A</xref> that both the PPO agent and the optimization method meet the frequency deviation requirement (i.e., &#xb1;0.05&#xa0;Hz). Moreover, the maximum frequency deviation of the system controlled by the PPO agent is 0.0175&#xa0;Hz, which is smaller in magnitude than the &#x2212;0.044&#xa0;Hz reached under the optimization method. This demonstrates that the dynamic optimization strategy of AGC units based on the PPO algorithm can efficiently mitigate the frequency fluctuation of the system through advanced control of the AGC units.</p>
<p>
<xref ref-type="fig" rid="F8">Figure 8B</xref> shows the transmission power deviation of the tie-line. The power deviation under the optimization method is relatively large and exceeds the limits three times because the AGC resources in the system are insufficient, whereas under the PPO agent controller the power deviation fluctuation is smaller and never exceeds the limits. This indicates that system operation is more stable under the PPO agent controller. In addition, as shown in <xref ref-type="fig" rid="F8">Figure 8C</xref>, the agent also performs much better than the optimization method in terms of the ACE values. <xref ref-type="fig" rid="F9">Figure 9</xref> shows the regulation power curves of all AGC units.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Results of AGC regulation curves.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g009.tif"/>
</fig>
</sec>
<sec id="s5-3">
<title>Convergence of Algorithm</title>
<p>In the training process, the cumulative reward of each episode is recorded. Then, the results and the filtered curve are shown in <xref ref-type="fig" rid="F10">Figure 10</xref>.</p>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Performance of the PPO agent in the training process using 10,000 episodes.</p>
</caption>
<graphic xlink:href="fenrg-10-947532-g010.tif"/>
</fig>
<p>Because the load and wind power fluctuations differ between episodes, the frequency regulation needs of each episode also differ. It is therefore normal for the cumulative reward to oscillate slightly from episode to episode. As the training process continues, the cumulative rewards tend to converge, as shown in <xref ref-type="fig" rid="F10">Figure 10</xref>.</p>
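<p>The article does not state which filter produces the smoothed curve in Figure 10; a simple trailing moving average over the per-episode cumulative rewards is one common choice and is sketched below:</p>

```python
# Illustrative sketch: smooth a noisy per-episode reward curve with a
# trailing moving average (the article's actual filter is unspecified).

def moving_average(values, window):
    """Average each point with up to window-1 preceding points."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out

print(moving_average([1.0, 3.0, 5.0, 7.0], 2))  # -> [1.0, 2.0, 4.0, 6.0]
```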
</sec>
</sec>
<sec id="s6">
<title>Conclusion and Future Works</title>
<p>To effectively mitigate frequency control issues under growing uncertainties, this article presents a novel solution, a PPO architecture for AGC dynamic optimization, which transforms the traditional optimization problem into a Markov decision process and utilizes a deep reinforcement learning algorithm for frequency control.</p>
<p>Through the design of the state, action, and reward functions, continuous multi-time-step control can be implemented with the goal of maximizing the cumulative reward. The model improves its parameters through interaction between the agent and the environment, which adapts to the uncertainties in the environment and avoids explicit modeling of the uncertain variables. The proposed model is tested on a modified IEEE 39 bus system. The results demonstrate that the PPO architecture for AGC dynamic optimization achieves the goal of frequency control with satisfactory performance compared with other methods. It is verified that the proposed method can effectively handle the stochastic disturbances caused by large-scale integration of renewable energy into the power grid and ensure the safety and stability of the system frequency.</p>
<p>From the lessons learned in this work, directions for future work are discussed here. First, deep learning-based algorithms suffer from poor interpretability, which is undesirable for control engineering problems. With the development of explainable artificial intelligence, future work is needed in this direction. Second, better exploration mechanisms for DRL algorithms need to be developed to further improve training efficiency and avoid local optima.</p>
</sec>
</body>
<back>
<sec id="s7" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>ZL and JL wrote the manuscript. The simulations were performed by JL and PZ. ZD and YZ provided major and minor revisions to the final version of the submitted manuscript. All of the aforementioned authors contributed to the proposed methodology.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>This work was supported by the Fundamental Research Funds for the Central Universities under Grant No. 2021JBM027 and the National Natural Science Foundation of China under Grant No. 52107068. Both funds support open access publication fees.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.iso-ne.com/isoexpress/web/reports">https://www.iso-ne.com/isoexpress/web/reports</ext-link>.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abdel-Magid</surname>
<given-names>Y. L.</given-names>
</name>
<name>
<surname>Dawoud</surname>
<given-names>M. M.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>Optimal AGC Tuning with Genetic Algorithms</article-title>. <source>Electr. Power Syst. Res.</source> <volume>38</volume> (<issue>3</issue>), <fpage>231</fpage>&#x2013;<lpage>238</lpage>. <pub-id pub-id-type="doi">10.1016/s0378-7796(96)01091-7</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Atic</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Rerkpreedapong</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Hasanovic</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Feliachi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2003</year>). &#x201c;<article-title>NERC Compliant Decentralized Load Frequency Control Design Using Model Predictive Control</article-title>,&#x201d; in <conf-name>Power Engineering Society General Meeting</conf-name> (<publisher-loc>Piscataway, NJ, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>). </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Banakar</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ooi</surname>
<given-names>B. T.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Impacts of Wind Power Minute-To-Minute Variations on Power System Operation</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>23</volume> (<issue>1</issue>), <fpage>150</fpage>&#x2013;<lpage>160</lpage>. <pub-id pub-id-type="doi">10.1109/tpwrs.2007.913298</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beaufays</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Widrow</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Abdel-Magid</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Widrow</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Application of Neural Networks to Load-Frequency Control in Power Systems</article-title>. <source>Neural Netw.</source> <volume>7</volume> (<issue>1</issue>), <fpage>183</fpage>&#x2013;<lpage>194</lpage>. <pub-id pub-id-type="doi">10.1016/0893-6080(94)90067-1</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bohn</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Miniesy</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>1972</year>). <article-title>Optimum Load-Frequency Sampled-Data Control with Randomly Varying System Disturbances</article-title>. <source>IEEE Trans. Power Apparatus Syst.</source> <volume>PAS-91</volume> (<issue>5</issue>), <fpage>1916</fpage>&#x2013;<lpage>1923</lpage>. <pub-id pub-id-type="doi">10.1109/tpas.1972.293519</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>C. S.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Load Frequency Control Using Genetic-Algorithm Based Fuzzy Gain Scheduling of Pi Controllers</article-title>. <source>Electr. Mach. Power Syst.</source> <volume>26</volume> (<issue>1</issue>), <fpage>39</fpage>&#x2013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1080/07313569808955806</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Concordia</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kirchmayer</surname>
<given-names>L. K.</given-names>
</name>
</person-group> (<year>1953</year>). <article-title>Tie-Line Power and Frequency Control of Electric Power Systems</article-title>. <source>Power Apparatus Syst. Part III Trans. Am. Inst. Electr. Eng.</source> <volume>72</volume> (<issue>2</issue>), <fpage>562</fpage>&#x2013;<lpage>572</lpage>. <pub-id pub-id-type="doi">10.1109/aieepas.1953.4498667</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dahiya</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Naresh</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Automatic Generation Control Using Disrupted Oppositional Based Gravitational Search Algorithm Optimised Sliding Mode Controller under Deregulated Environment</article-title>. <source>IET Gener. Transm. &#x26; Distrib.</source> <volume>10</volume> (<issue>16</issue>), <fpage>3995</fpage>&#x2013;<lpage>4005</lpage>. <pub-id pub-id-type="doi">10.1049/iet-gtd.2016.0175</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Diao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Deep-Reinforcement-Learning-Based Autonomous Voltage Control for Power Grid Operations</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>35</volume> (<issue>1</issue>), <fpage>814</fpage>&#x2013;<lpage>817</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2019.2941134</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elgerd</surname>
<given-names>O. I.</given-names>
</name>
<name>
<surname>Fosha</surname>
<given-names>C. E.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Optimum Megawatt-Frequency Control of Multiarea Electric Energy Systems</article-title>. <source>IEEE Trans. Power Apparatus Syst.</source> <volume>PAS-89</volume> (<issue>4</issue>), <fpage>556</fpage>&#x2013;<lpage>563</lpage>. </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Erschler</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Roubellat</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Vernhes</surname>
<given-names>J. P.</given-names>
</name>
</person-group> (<year>1974</year>). <article-title>Automation of a Hydroelectric Power Station Using Variable-Structure Control Systems</article-title>. <source>Automatica</source> <volume>10</volume> (<issue>1</issue>), <fpage>31</fpage>&#x2013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1016/0005-1098(74)90007-7</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feliachi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Rerkpreedapong</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>NERC Compliant Load Frequency Control Design Using Fuzzy Rules</article-title>. <source>Electr. Power Syst. Res.</source> <volume>73</volume> (<issue>2</issue>), <fpage>101</fpage>&#x2013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1016/j.epsr.2004.06.010</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaleeli</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>VanSlyck</surname>
<given-names>L. S.</given-names>
</name>
<name>
<surname>Ewart</surname>
<given-names>D. N.</given-names>
</name>
<name>
<surname>Fink</surname>
<given-names>L. H.</given-names>
</name>
<name>
<surname>Hoffmann</surname>
<given-names>A. G.</given-names>
</name>
</person-group> (<year>1992</year>). <article-title>Understanding Automatic Generation Control</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>7</volume> (<issue>3</issue>), <fpage>1106</fpage>&#x2013;<lpage>1122</lpage>. <pub-id pub-id-type="doi">10.1109/59.207324</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaleeli</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Vanslyck</surname>
<given-names>L. S.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>NERC&#x27;s New Control Performance Standards</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>14</volume> (<issue>3</issue>), <fpage>1092</fpage>&#x2013;<lpage>1099</lpage>. <pub-id pub-id-type="doi">10.1109/59.780932</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khodabakhshian</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Edrisi</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>A New Robust PID Load Frequency Controller</article-title>. <source>Control Eng. Pract.</source> <volume>16</volume> (<issue>9</issue>), <fpage>1069</fpage>&#x2013;<lpage>1080</lpage>. <pub-id pub-id-type="doi">10.1016/j.conengprac.2007.12.003</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mcnamara</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Milano</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Model Predictive Control Based AGC for Multi-Terminal HVDC-Connected AC Grids</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>2017</volume>, <fpage>1</fpage>. </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Olmos</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>de la Fuente</surname>
<given-names>J. I.</given-names>
</name>
<name>
<surname>Zamora Macho</surname>
<given-names>J. L.</given-names>
</name>
<name>
<surname>Pecharroman</surname>
<given-names>R. R.</given-names>
</name>
<name>
<surname>Calmarza</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Moreno</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>New Design for the Spanish AGC Scheme Using an Adaptive Gain Controller</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>19</volume> (<issue>3</issue>), <fpage>1528</fpage>&#x2013;<lpage>1537</lpage>. <pub-id pub-id-type="doi">10.1109/tpwrs.2004.825873</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Fractional Order AGC for Distributed Energy Resources Using Robust Optimization</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>7</volume> (<issue>5</issue>), <fpage>2175</fpage>&#x2013;<lpage>2186</lpage>. <pub-id pub-id-type="doi">10.1109/TSG.2015.2459766</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sahu</surname>
<given-names>B. K.</given-names>
</name>
<name>
<surname>Pati</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Mohanty</surname>
<given-names>P. K.</given-names>
</name>
<name>
<surname>Panda</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Teaching-learning Based Optimization Algorithm Based Fuzzy-PID Controller for Automatic Generation Control of Multi-Area Power System</article-title>. <source>Appl. Soft Comput.</source> <volume>27</volume>, <fpage>240</fpage>&#x2013;<lpage>249</lpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2014.11.027</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schulman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wolski</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Dhariwal</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Radford</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Klimov</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Proximal Policy Optimization Algorithms</article-title>. <source>arXiv Prepr. arXiv:1707.06347</source>. </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schulman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Moritz</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jordan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Abbeel</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>High-dimensional Continuous Control Using Generalized Advantage Estimation</article-title>. <source>arXiv Prepr. arXiv:1506.02438</source>. </citation>
</ref>
<ref id="B22">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Analysis and Comparison of Variable Structure Fuzzy Neural Network Control and the PID Algorithm</article-title>,&#x201d; in <conf-name>2017 Chinese Automation Congress (CAC)</conf-name>, <conf-loc>Jinan</conf-loc>, <fpage>3347</fpage>&#x2013;<lpage>3350</lpage>. <pub-id pub-id-type="doi">10.1109/CAC.2017.8243356</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sutton</surname>
<given-names>R. S.</given-names>
</name>
<name>
<surname>Barto</surname>
<given-names>A. G.</given-names>
</name>
</person-group> (<year>1998</year>). <source>Reinforcement Learning: An Introduction</source>. <publisher-loc>Cambridge, MA, USA</publisher-loc>: <publisher-name>MIT Press</publisher-name>, <fpage>3</fpage>&#x2013;<lpage>23</lpage>. </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Talaq</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Al-Basri</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Adaptive Fuzzy Gain Scheduling for Load Frequency Control</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>14</volume> (<issue>1</issue>), <fpage>145</fpage>&#x2013;<lpage>150</lpage>. <pub-id pub-id-type="doi">10.1109/59.744505</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Venayagamoorthy</surname>
<given-names>G. K.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>Gautam</surname>
<given-names>P. K.</given-names>
</name>
<name>
<surname>Ahmadi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Dynamic Energy Management System for a Smart Microgrid</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst.</source> <volume>27</volume> (<issue>8</issue>), <fpage>1643</fpage>&#x2013;<lpage>1656</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2016.2514358</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ming</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Deep Reinforcement Learning Method for Demand Response Management of Interruptible Load</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>11</volume> (<issue>4</issue>), <fpage>3146</fpage>&#x2013;<lpage>3155</lpage>. <pub-id pub-id-type="doi">10.1109/TSG.2020.2967430</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>O&#x27;Neill</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Maei</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Optimal Demand Response Using Device-Based Reinforcement Learning</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>6</volume> (<issue>5</issue>), <fpage>2312</fpage>&#x2013;<lpage>2324</lpage>. <pub-id pub-id-type="doi">10.1109/TSG.2015.2396993</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xi</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A Multi-step Unified Reinforcement Learning Method for Automatic Generation Control in Multi-Area Interconnected Power Grid</article-title>. <source>IEEE Trans. Sustain. Energy</source> <volume>12</volume> (<issue>2</issue>), <fpage>1406</fpage>&#x2013;<lpage>1415</lpage>. <pub-id pub-id-type="doi">10.1109/TSTE.2020.3047137</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yamashita</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Taniguchi</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>1986</year>). <article-title>Optimal Observer Design for Load-Frequency Control</article-title>. <source>Int. J. Electr. Power &#x26; Energy Syst.</source> <volume>8</volume> (<issue>2</issue>), <fpage>93</fpage>&#x2013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1016/0142-0615(86)90003-7</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yan</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Dynamic Optimization Model of AGC Strategy under CPS for Interconnected Power System</article-title>. <source>Int. Rev. Electr. Eng.</source> <volume>7</volume> (<issue>5 Pt. B</issue>), <fpage>5733</fpage>&#x2013;<lpage>5743</lpage>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>K. W.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Stochastic Optimal Relaxed Automatic Generation Control in Non-markov Environment Based on Multi-step Q(&#x3bb;) Learning</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>26</volume> (<issue>3</issue>), <fpage>1272</fpage>&#x2013;<lpage>1282</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2010.2102372</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeynelgil</surname>
<given-names>H. L.</given-names>
</name>
<name>
<surname>Demiroren</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sengor</surname>
<given-names>N. S.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>The Application of ANN Technique to Automatic Generation Control for Multi-Area Power System</article-title>. <source>Int. J. Electr. Power &#x26; Energy Syst.</source> <volume>24</volume> (<issue>5</issue>), <fpage>345</fpage>&#x2013;<lpage>354</lpage>. <pub-id pub-id-type="doi">10.1016/s0142-0615(01)00049-7</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X. S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Consensus Transfer Q-Learning for Decentralized Generation Command Dispatch Based on Virtual Generation Tribe</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>9</volume> (<issue>3</issue>), <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TSG.2016.2607801</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X. S.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>Z. N.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Bao</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Lifelong Learning for Complementary Generation Control of Interconnected Power Grids with High-Penetration Renewables and EVs</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>33</volume> (<issue>4</issue>), <fpage>4097</fpage>&#x2013;<lpage>4110</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2017.2767318</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Adaptive Distributed Auction-Based Algorithm for Optimal Mileage Based AGC Dispatch with High Participation of Renewable Energy</article-title>. <source>Int. J. Electr. Power &#x26; Energy Syst.</source> <volume>124</volume>, <fpage>106371</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijepes.2020.106371</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Optimal Mileage Based AGC Dispatch of a GenCo</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>35</volume> (<issue>4</issue>), <fpage>2516</fpage>&#x2013;<lpage>2526</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2020.2966509</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A Random Forest-Assisted Fast Distributed Auction-Based Algorithm for Hierarchical Coordinated Power Control in a Large-Scale PV Power Plant</article-title>. <source>IEEE Trans. Sustain. Energy</source> <volume>12</volume> (<issue>4</issue>), <fpage>2471</fpage>&#x2013;<lpage>2481</lpage>. <pub-id pub-id-type="doi">10.1109/TSTE.2021.3101520</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Virtual Generation Tribe Based Robust Collaborative Consensus Algorithm for Dynamic Generation Command Dispatch Optimization of Smart Grid</article-title>. <source>Energy</source> <volume>101</volume>, <fpage>34</fpage>&#x2013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1016/j.energy.2016.02.009</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Chance Constrained Dynamic Optimisation Method for AGC Units Dispatch Considering Uncertainties of the Offshore Wind Farm</article-title>. <source>J. Eng.</source> <volume>2019</volume> (<issue>16</issue>), <fpage>2112</fpage>. <pub-id pub-id-type="doi">10.1049/joe.2018.8558</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Diao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>A Data-Driven Method for Fast AC Optimal Power Flow Solutions via Deep Reinforcement Learning</article-title>. <source>J. Mod. Power Syst. Clean Energy</source> <volume>8</volume> (<issue>6</issue>), <fpage>1128</fpage>&#x2013;<lpage>1139</lpage>. <pub-id pub-id-type="doi">10.35833/MPCE.2020.000522</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>