Control Method for PEMFC Using Improved Deep Deterministic Policy Gradient Algorithm

A data-driven PEMFC output voltage control method is proposed, together with an improved deep deterministic policy gradient algorithm. The algorithm introduces three techniques: clipped double Q-learning, delayed policy updates, and target-policy smoothing, to improve the robustness of the control policy. In this algorithm, the hydrogen controller is treated as an agent, which is pre-trained to fully interact with the environment and obtain the optimal control policy. The effectiveness of the proposed algorithm is demonstrated experimentally.


INTRODUCTION
The fuel cell is the fourth power generation technology after hydroelectric, thermal and nuclear power generation. It converts the chemical energy stored in fuel and oxidizer directly into electricity through electrode reactions in an isothermal environment (Yang et al., 2021a; Yang et al., 2021b). As a new type of chemical power source, a fuel cell does not burn its fuel directly, unlike thermal power generation, so its efficiency is not limited by the Carnot cycle and its emission of harmful substances is extremely low (Yang et al., 2020; Yang et al., 2018). Its energy conversion rate can reach 80%, and its actual efficiency is double that of an ordinary internal combustion engine (Bougrine et al., 2013). The fuel cell is therefore a new, efficient and clean power source that combines new technologies in energy, chemistry, materials and automatic control (Yang et al., 2019a; Yang et al., 2021c).
However, the PEMFC system is a complex, high-order system with multiple inputs and outputs that is nonlinear, strongly coupled, time-varying and subject to random disturbances (Yang et al., 2019b; Li and Yu, 2021a), so it is difficult to achieve satisfactory control results with traditional PID control (Li et al., 2021). To obtain accurate and fast responses, various advanced control strategies have been applied in research on PEMFC output control (Zhang et al., 2019; Li and Yu, 2021b; Zhang et al., 2021).
2) Sliding mode control. Several sliding mode methods have been applied to the PEMFC (Ou et al., 2015), such as first-order sliding mode control, higher-order super-twisting sliding mode control and higher-order sliding mode control with an observer (Chen et al., 2018). 3) PID and its improved variants. Improved PID algorithms have also been used extensively, for example the neural PID controller (Zhao et al., 2020), fuzzy PID control (Sun et al., 2018), a combined PID and fuzzy controller (Ou et al., 2017), the feedback linearization controller and the fractional-order PID (FOPID) controller. 4) Adaptive control, such as the data-driven adaptive controller, adaptive control based on parameter identification and the adaptive pole search controller. 5) Compound control, for example PID-neural network control, interval type-II fuzzy PID control and fuzzy adaptive PID control.
The problems with existing research are: 1) there is no model-free control algorithm that effectively adapts to the nonlinear characteristics of the PEMFC; 2) there is no optimal control algorithm that combines adaptive capability with low computational effort.
For this reason, model-free controllers with strong adaptive capabilities are more suitable for such systems.
A data-driven PEMFC output voltage control method is proposed, together with an improved deep deterministic policy gradient (IDDPG) algorithm. The algorithm introduces three techniques: clipped double Q-learning, delayed policy updates, and target-policy smoothing, to improve the robustness of the control policy. In this algorithm, the hydrogen controller is treated as an agent, which is pre-trained to fully interact with the environment and obtain the optimal control policy. The effectiveness of the proposed algorithm is demonstrated experimentally.
The innovations of this paper are:
1) A data-driven PEMFC output voltage control method is proposed. 2) An improved deep deterministic policy gradient algorithm is proposed.
The remainder of this paper comprises the following sections: the PEMFC model is demonstrated in PEMFC Model, and the proposed algorithm is described in Proposed Method; the experimental results are analysed and discussed in Case Studies, and the findings in this paper are summarised in Conclusion.

PEMFC MODEL

PEMFC Output Voltage
The dynamic model of the PEMFC is refined from the electrochemical model. Ideally, the voltage released by the full reaction is 1.229 V. The actual potential is lower because of irreversible losses, known in practice as polarization overvoltage. During PEMFC power generation, polarization overvoltage manifests mainly as activation overvoltage, ohmic overvoltage and concentration overvoltage, so in the actual power generation process the single-cell voltage is inevitably lower than the ideal standard potential. Besides factors such as temperature, pressure and current density, chemical and material factors such as the electrode material and the electrolyte also influence the polarization overvoltage of the electrodes.
For a fuel cell stack consisting of N single cells connected in series, the output voltage V can be expressed as
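For reference, the standard Amphlett-type form of this expression is (a reconstruction consistent with the overvoltage terms defined in the following subsections, not the paper's verbatim equation):

```latex
\[
V = N \, V_{cell} = N \left( E_{nernst} - V_{act} - V_{ohm} - V_{con} \right)
\]
```

where \(E_{nernst}\) is the thermodynamic potential of a single cell and \(V_{act}\), \(V_{ohm}\), \(V_{con}\) are its activation, ohmic and concentration overvoltages.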

Ohmic Polarization Overvoltage
The ohmic polarization overvoltage is mainly caused by the equivalent membrane impedance of the proton exchange membrane to the transfer of protons and by the impedance of the electrodes and current collectors to the transfer of electrons. Based on the Amphlett model, the PEMFC ohmic overvoltage therefore comprises the voltage drops across two impedances: one is the equivalent membrane impedance R_m of the proton membrane; the other is the resistance of the electrodes and collectors to electron transfer, which is usually treated as a constant. According to the resistivity theorem, the equivalent membrane impedance can be obtained as

R_m = ρ_M · B / A

where ρ_M is the resistivity of the proton membrane to proton flow (Ω·cm), B is the thickness of the proton exchange membrane and A is its effective area. ρ_M can in turn be obtained from the empirical formula

ρ_M = 181.6 [1 + 0.03 J + 0.062 (T/303)^2 J^2.5] / {[λ − 0.634 − 3J] exp[4.18 (T − 303)/T]}

where J is the current density, T the cell temperature and λ the membrane water content.
Empirically, the internal resistance of the cell other than the membrane is treated as a constant R_C, and the ohmic polarization overvoltage can be expressed as

V_ohm = I (R_m + R_C)

Activation Overvoltage
Activation overvoltage is the deviation of an electrode's potential from its equilibrium potential caused by the delay of its electrochemical reaction. The activation polarization overvoltages of the cathode and the anode can each be obtained from empirical expressions, and the total activation polarization overvoltage is their sum. Here c(O2) is the concentration of dissolved oxygen at the cathode catalyst interface, calculated as

c(O2) = P_O2 / (5.08 × 10^6 · exp(−498/T)) (10)

Frontiers in Energy Research | www.frontiersin.org | September 2021 | Volume 9 | Article 753064
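In the widely used Amphlett model, the total activation overvoltage takes the following standard empirical form (a reference sketch, not necessarily the paper's exact expression):

```latex
\[
V_{act} = -\left( \xi_1 + \xi_2 T + \xi_3 T \ln c_{O_2} + \xi_4 T \ln I \right)
\]
```

where \(\xi_1,\dots,\xi_4\) are empirical coefficients and \(I\) is the stack current.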

Thermodynamic Electric Potential
According to the empirical formula for the PEMFC, the thermodynamic electric potential can be obtained as follows. A PEMFC converts chemical energy directly into electrical energy; the chemical energy released by the fuel cell can be calculated from the change in the Gibbs free energy Δg_f, which is usually used to calculate the externally available energy. The basic chemical reaction of a hydrogen/oxygen PEMFC is

H2 + (1/2) O2 → H2O

and the corresponding change in Gibbs free energy is

Δg_f = g_f,products − g_f,reactants = g_f,H2O − g_f,H2 − (1/2) g_f,O2 (13)

The change in Gibbs free energy is a function of temperature and pressure, from which the voltage of the fuel cell can be deduced.
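The thermodynamic (Nernst) potential referred to above is commonly written in Amphlett-type PEMFC models as follows; this is the standard empirical form, reconstructed here rather than copied from the paper:

```latex
\[
E_{nernst} = 1.229 - 0.85\times10^{-3}\,(T - 298.15)
           + 4.3085\times10^{-5}\, T \left[ \ln P_{H_2} + \tfrac{1}{2}\ln P_{O_2} \right]
\]
```

with \(T\) in kelvin and \(P_{H_2}\), \(P_{O_2}\) the partial pressures of hydrogen and oxygen.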

Concentration Polarization Overvoltage
Concentration overvoltage is caused by the deviation of the electrode potential from its equilibrium potential due to the difference between the ion concentration in the solution at the electrode interface layer and that of the bulk solution in the electrolytic bath, and can be expressed as:
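A common empirical form of this expression, given here as a hedged reconstruction rather than the paper's exact equation, is:

```latex
\[
V_{con} = -B \ln\left( 1 - \frac{J}{J_{max}} \right)
\]
```

where \(J\) is the operating current density, \(J_{max}\) the limiting current density and \(B\) a constant depending on the cell and its operating state.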

Dynamic and Capacitive Characteristics of the Double Layer Charge
The "double layer of charge" phenomenon in a proton exchange membrane fuel cell is particularly important for the dynamics of the PEMFC. Electrons collect on the surface of the electrode and hydrogen ions on the surface of the electrolyte; between them a potential difference forms in which charge and energy are stored, acting as an equivalent capacitance. This smooths out the voltage loss across the equivalent resistance and yields a realistic dynamic model of the PEMFC. Therefore, when modelling the dynamics of the PEMFC, a capacitance is added to the electrochemical model. This equivalent capacitance represents the effect by smoothing the output voltage response of the fuel cell as the current changes, with a finite transition time.
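The double-layer behaviour described above is typically captured by a first-order lag; a sketch of the standard form (with C the equivalent capacitance and R_d the lumped activation-plus-concentration resistance, names assumed here) is:

```latex
\[
\frac{dV_d}{dt} = \frac{I}{C} - \frac{V_d}{R_d C},
\qquad
V_{cell} = E_{nernst} - V_{ohm} - V_d
\]
```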
In Figure 1, the polarization voltage across R_d is V_d, which obeys a first-order differential equation describing the voltage change of a single cell. From the single-cell voltage, the stack voltage and hence the output power and efficiency of the stack can be expressed.

PROPOSED METHOD

DDPG
The DDPG method fuses deep neural networks with the Deterministic Policy Gradient (DPG) algorithm and uses an actor-critic framework as its basic architecture. The actor network is used to update the policy, and the critic network is used to approximate the state-action value function; both use nonlinear neural networks as approximators. Inspired by DQN, DDPG stabilizes learning by setting up a target actor network and a target critic network, together with an experience replay mechanism. Unlike DQN, which directly copies the current network to the target network, DDPG updates the target network in a "soft" way, ensuring that each parameter update is small and thus achieving a stable training effect.
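The "soft" target update can be sketched as follows; this is a minimal illustration in which plain lists of floats stand in for network weight tensors, and `tau` is the usual small mixing coefficient:

```python
def soft_update(target_params, online_params, tau=0.005):
    # theta_target <- tau * theta_online + (1 - tau) * theta_target.
    # Each target weight drifts slowly toward its online counterpart,
    # keeping the bootstrapping targets stable during training.
    return [tau * p + (1.0 - tau) * t
            for t, p in zip(target_params, online_params)]

target = [0.0, 0.0]
online = [1.0, 2.0]
target = soft_update(target, online, tau=0.1)
print(target)  # -> [0.1, 0.2]
```

With tau close to 1 this degenerates into DQN's hard copy; small tau is what makes the update "soft".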
In DDPG, the objective function is defined as a discounted sum of rewards:

J(θ^μ) = E_{θ^μ}[ r_1 + γ r_2 + γ^2 r_3 + ... ] (23)

The actor network parameters are updated by applying the chain rule to this objective function. To address the under-exploration caused by the actor mapping states to deterministic actions in the DPG approach, the DDPG algorithm generates temporally correlated noise through the Ornstein-Uhlenbeck (OU) process to improve the exploration capability under a deterministic policy. DDPG uses an experience replay mechanism based on random sampling, but suffers from Q-value overestimation.
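The three robustness techniques named in the introduction (clipped double Q-learning, delayed policy updates, target-policy smoothing) can be sketched as below; the Q-functions and target policy here are hypothetical stand-ins, not the paper's actual networks:

```python
import random

def smoothed_target_action(mu_target, state, sigma=0.2, noise_clip=0.5,
                           a_low=-1.0, a_high=1.0):
    # Target-policy smoothing: perturb the target action with clipped
    # Gaussian noise so the critic is trained on a small neighbourhood
    # of actions rather than a single point.
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, sigma)))
    return max(a_low, min(a_high, mu_target(state) + noise))

def td_target(r, s_next, q1_target, q2_target, mu_target,
              gamma=0.99, done=False):
    # Clipped double Q-learning: bootstrap from the MINIMUM of two
    # target critics, which counteracts the Q-value overestimation
    # mentioned above.
    a_next = smoothed_target_action(mu_target, s_next)
    q_min = min(q1_target(s_next, a_next), q2_target(s_next, a_next))
    return r if done else r + gamma * q_min

# Delayed policy update: the actor (and the target networks) are
# refreshed only once every `policy_delay` critic updates.
policy_delay = 2
actor_updates = [step % policy_delay == 0 for step in range(6)]
print(actor_updates)  # -> [True, False, True, False, True, False]
```

Because the critics in this toy example are constant functions, the smoothing noise does not change the target, which makes the behaviour easy to verify by hand.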

CASE STUDIES
The DDPG control strategy, a fuzzy PID controller (Fuzzy-PID), a PSO-optimized fuzzy PID controller (PSO-PID) and a conventional PID controller are introduced as comparative examples. The load applies a step disturbance at 1 s: the load current starts at 100 A and rises to 127 A. The results are shown in Figures 1A,B.
1) According to Figures 1A,B, the IDDPG algorithm improves robustness because it uses advanced techniques to solve the Q-value overestimation problem of conventional deep reinforcement learning algorithms. In contrast, the DDPG algorithm has no effective strategy for improving robustness, so it tends to fall into local optima, making the final control strategy suboptimal and not robust. In addition, the other algorithms lack optimal control capability and have difficulty adapting to the nonlinear characteristics of the PEMFC, so their output voltage control performance is poor. 2) The Fuzzy-based algorithms mostly perform better than the Optimized-based algorithms because they can automatically adjust their coefficients, but the simplicity of the fuzzy rules makes them less accurate.
The Optimized-based algorithms are neither adaptive nor robust because they cannot adjust their coefficients in real time, which ultimately leads to overshoot and instability in the output voltage.
In summary, in Case 1 the IDDPG algorithm has better static and dynamic performance and is able to control the output voltage effectively.

CONCLUSION
A data-driven PEMFC output voltage control method is proposed. An improved deep deterministic policy gradient algorithm is proposed for this method, which introduces three techniques: clipped double Q-learning, delayed policy updates and target-policy smoothing, to improve the robustness of the control policy. In this algorithm, the hydrogen controller is treated as an agent, which is pre-trained to fully interact with the environment and obtain the optimal control policy. The effectiveness of the proposed algorithm is demonstrated experimentally.
The IDDPG algorithm has a short response time and good dynamic and static performance indicators, enabling timely and effective output voltage control.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.