Gait Optimization Method for Humanoid Robots Based on Parallel Comprehensive Learning Particle Swarm Optimizer Algorithm

To improve the fast and stable walking ability of a humanoid robot, this paper proposes a gait optimization method based on a parallel comprehensive learning particle swarm optimizer (PCLPSO). Firstly, the key parameters affecting the walking gait of the humanoid robot are selected based on the natural zero-moment point trajectory planning method. Secondly, by changing the slave group structure of the PCLPSO algorithm, the gait training task is decomposed, and a parallel distributed multi-robot gait training environment based on RoboCup3D is built to automatically optimize the speed and stability of bipedal robot walking. Finally, a layered learning approach is used to optimize the turning ability of the humanoid robot. The experimental results show that the PCLPSO algorithm achieves a quickly optimal solution, and the humanoid robot optimized possesses a fast and steady gait and flexible steering ability.


INTRODUCTION
Gait planning is a research hotspot for humanoid robots, and it provides some technical support for humanoid robots walking like humans. The methods of gait planning can be broadly divided into three categories: human walking parameter-based methods (Baoping et al., 2015;Hereid et al., 2018), humanoid walking model-based methods (Sato et al., 2010;Winkler et al., 2018), and intelligent algorithm-based methods (Huan and Anh, 2015;Elhosseini et al., 2019). The method based on human walking parameters makes the gait of humanoid robots more similar to the way humans walk, but it costs a lot of time to find suitable gait parameters from human walking data and apply them to humanoid robots. Paparisabet et al. (2019) proposed a similar function for humanlike motion, formulated kinematic constraints for humanoid robots in contact with the ground, and finally proposed humanoid walking with very high similarity to human motion. In Weon and Lee (2018), Weon et al. proposed a method for generating humanoid robot motion based on motion capture data, which corrects extracted joint trajectories based on a reprogrammed zero-moment point (ZMP) trajectory. Researchers have extensively investigated bipedal walking model-based approaches. In Graf and Röfer (2011), the three-dimensional linear inverted pendulum model (3D-LIPM) proposed is one of the most widely used simplified dynamics models for humanoid robots. The 3D-LIPM approximates the humanoid robot in the three-dimensional space to an inverted pendulum model composed of mass points and massless legs connecting the points to the support points and constrains the center of mass to move on the constrained plane. Jadidi and Hashemi (2016) proposed a closed-loop 3D-LIPM gait for RoboCup standard platform and implemented a full range of walking on the NAO robot. The 25 degrees of freedom of the NAO robot make it have excellent omnidirectional walking and full-body motion performance. The RoboCup 3D simulation team uses the NAO robot as a reference model. At the same time, this model is also widely used in robot simulation competitions at all levels at home and abroad. Astudillo et al. (2018) used the 3D-LIPM as a model for robot and ZMP to map joint angles to achieve a humanoid robot walking on a slippery platform.
As the degrees of freedom of humanoid robots increase, the complexity of systems will increase as well. Bipedal walking model-based methods will not be sufficient for the development of humanoid robot control (Fayong et al., 2014). Besides, a variety of intelligent control methods are developed, which do not require accurate modeling (MacAlpine et al., 2015;Hong and Lee, 2016;Bonyadi and Michalewicz, 2017). However, the process of adjusting motion parameters and posture based on various models is very tedious and time-consuming. When the given parameters are not reasonable, walking instability and robots moving at a low speed may occur. This shortcoming can have a significant impact on the coordination between the needs of speed, stability, and flexibility. Therefore, various intelligent algorithms are used for robot gait planning. Many researchers have applied the central pattern generator (CPG) to generate gait trajectories, but the parameter optimization of this method is a challenge (Bai et al., 2019). Zhong et al. (2017) transformed phase signal from CPG output into a trajectory signal for the legs of a six-legged robot by adjusting it. Wang et al. (2019) proposed a gait planning method based on a reactive neuromuscular controller and CPG to achieve a powersaving human-like large walk, and the controller parameters were optimized based on an optimization algorithm in the paper. Common optimization algorithms such as central force optimization (CFO) and genetic algorithm (GA) have also been successfully used for the gait planning of humanoid robots. Kumar et al. (2018) applied GA to optimize parameters for a triple-linked humanoid robot to achieve numerical simulation of energy-controlled stable walking. Huan et al. (2018a) used CFO to optimize the foot lift amplitude of a humanoid robot, which caused an efficient and stable gait. PSO is a common intelligent optimization algorithm that solves global optimization problems simply and efficiently (Kennedy and Eberhart, 1995). Huan et al. (2018b) applied PSO to optimize joint angles to achieve stable walking for a humanoid robot with 10 degrees of freedom. Mandava et al. proposed a multiobjective particle swarm optimization algorithm method for the gait optimization of humanoid robots for the trolley table model. This method uses a sliding mode controller to optimize robust tracking control and realizes the 3D simulation walking of a humanoid robot (Mandava and Vundavilli, 2018). Gülcü and Kodaz (2015) improved the performance of comprehensive learning particle swarm optimizer (CLPSO) through parallel computation and proposed PCLPSO. Most of the studies mentioned earlier are devoted to single movements such as straight, rotating, and going up and down stairs (Faraji et al., 2019). There are relatively few studies that comprehensively consider bipedal robot forward and rotation and their arbitrary motion connection transitions. Also, during the simulation process, the individual simulation platforms always add noise in the same way, resulting in similar movements of the bipedal robots. Parallel optimization algorithms can be a good solution to this problem. Muniz et al. (2016) optimized the keyframe movements of a humanoid robot (getting up, kicking, etc.) through parallelization to improve the motion performance of the robot. However, parallel algorithms suffer from low fault tolerance when dealing with distributed tasks in RoboCup3D, and it is difficult to ensure the correctness and stability of the running process.
Based on the considerations mentioned earlier, this paper selects 13 key parameters that affect speed and stability based on the gait planning method of natural ZMP trajectories and designs two evaluation functions to address the problems of humanoid robot walking. By changing the cluster structure of the PCLPSO algorithm, a parallel distributed multi-robot gait training environment is established by using the RoboCup3D simulation platform. The training environment consists of multiple computers that can operate independently. The nodes use the computer network for information transfer to achieve a common task (gait optimization for humanoid robots). The computational efficiency is improved by running in parallel in a distributed environment. A layered learning approach is used to optimize the evaluation function layer by layer. Experimental results show that the optimized humanoid robot has a faster and more stable straight gait and excellent turning ability and has less wobble when switching between straight and turning.

GAIT PARAMETER SELECTION BASED ON NATURAL ZERO-MOMENT POINT TRAJECTORY PLANNING
In this paper, after setting a natural ZMP trajectory from heel to toe movement based on a 3D-LIPM in the single-leg support phase, a mass-centered trajectory equation is obtained (Graf and Röfer, 2011). In the double-leg support phase, a linear pendulum model was used to generate mass-centered trajectory equations. Equations for multistep planning of masscentered trajectories in a unified coordinate system are also given. After planning the walking trajectory by natural ZMPbased mass-centered trajectory planning method, the key gait parameters are selected and optimized based on the experience of manual tuning.

Multistep Trajectory Planning in a Unified Coordinate System
During the movement of the humanoid robot, if only the front and back and up and down movements are considered and the left and right movements are ignored, it is easy to cause the robot to lose its balance and fall. Therefore, it is necessary to extend the linear inverted pendulum to a threedimensional environment and model the robot with a threedimensional linear inverted pendulum. However, the process of directly analyzing and researching the multi-link structure of biped robots is often cumbersome, so this article equivalently simplified the robot to a reasonable mathematical model, which is convenient for research. Regarding the body of the robot as a mass point and the legs as massless support rods, a threedimensional linear inverted pendulum model can be built. It does not need to know the parameters of the robot, such as the mass and the inertia of each joint, but is derived from the model for easy calculation. According to 3D-LIPM, a humanoid robot is simplified to an inverted pendulum with only a center of mass and a retractable massless pendulum, and the height of mass is assumed to remain constant, as shown in Figure 1A.
Assume that the height of the center of mass is fixed at z s and the acceleration of gravity is g, the equation of motion between ZMP trajectory p sx (t) and the center of mass in the x-axis direction is: To make the ZMP trajectory of humanoid robot walking similar to that of human walking, this paper uses a linear equation to represent the ZMP trajectory, as shown in Figure 1B. Assuming that in the single-leg support phase, ZMP moves in the x-axis direction in the range of [p sx min , p sx max ], the walking period of the single-leg support phase is T s ; the center of the foot is the origin of a coordinate system, and the trajectory of ZMP is given by the following equation: where, t ∈ [0, T s ], b 0 = p sx min , b 1 = (p sx max − p sx min )/T s . By defining w s = z s g by substituting Equation (1) into Equation (2) and solving a differential equation, one has: If the position and velocity x s (0) of the center of mass at the initial moment of the single-leg support phase andẋ s (0) are known, then one has: If the positions of the center of mass at the initial moment of single-leg support phase and the positions of the center of mass at the moment of termination x s (0) and x s (T s ) are known, there are: According to Equations (2-4), the centroid trajectory planning based on natural ZMP can be realized in the single-leg support phase. The direct application of the single-leg support phase method for walking requires the assumption that the support leg switch is instantaneous. This will cause the center of mass acceleration to jump from the maximum to the minimum. To obtain a smooth center-of-mass velocity trajectory, the legs support phase is introduced. In this paper, a linear pendulum model is used to realize natural ZMP trajectory planning of the double-leg supporting phase.
In the double-leg support phase, according to the linear pendulum model, as shown in Figure 1C, the equation of the relationship between the position of robot center of mass and acceleration is given as follows: , T d is double-leg support phase walk period, and D is an x-axis coordinate of the fixed end of the linear pendulum.
With w d = −z d g , from Equation (9) can have: In the double-leg support phase, the starting position and speed and the ending position and the speed of the center of mass can be known, that is, A smooth center-of-mass trajectory based on natural ZMP trajectory can be achieved in the double-leg support phase according to Equations (10-13).
The methods mentioned earlier have their own coordinate systems in the single-leg support phase and double-leg support phase, which do not facilitate multistep planning calculations for footprint planning. To do so, the equations mentioned earlier need to be unified in the same coordinate system. The equation for the position and velocity of the center of the mass in multistep planning can be expressed by the following equation: The robot gait realized by Equations (14, 15) yields a natural ZMP trajectory.

Swing Leg Trajectory Planning
Cosine functions and Bessel curves can all be used to plan swing-leg trajectories (Kajita et al., 2002;Grzelczyk et al., 2016). However, the humanoid robot is divided into two phases of single-leg support phase and double-leg support phase during walking, and only when the speed and acceleration of swingleg start and landing are zero can the humanoid robot maintain stability when switching between single-leg support and doubleleg support (Tang et al., 2003). To obtain a smoother trajectory of the oscillating leg, this paper chooses the method of simple harmonic motion synthesis: where D sw and H sw are the maximum height of step length and leg lift, respectively.

Selection of Gait Parameters
There are two kinds of bipedal walking pattern generation methods. The first method uses precise knowledge of robot dynamics parameters, such as mass, centroid position, and inertia of each joint to configure walking mode. Therefore, the method mainly depends on the accuracy of the model data. In contrast, the second method uses limited knowledge of dynamics, such as the total position of the center of mass, the total angular momentum, etc. The simplified model method (3D-LIMP) used in this paper belongs to the second type. After the optimized trajectory is obtained, the robot can execute according to the corresponding trajectory, and an excellent walking gait can be obtained. By means of a gait generation method based on natural ZMP trajectory planning, humanoid robot walking is summarized in the following algorithmic steps.
Algorithm 1 1.Set the walking cycle T s and walking parameters ( y ); 2.Initialization time T = 0, number of walking units n = 0; 3.Calculate the inverted pendulum equation from time T to T + T s and get the equation of mass center trajectory; 4. T = T + T s , n = n + 1; 5. Calculate and determine the next foothold of the biped robot (p (n) x , p (n) y ); 6. Give the next position of the bipedal robot; 7. Return to step 3.
The RoboCup3D server communicates with the robot once every 20 ms. In this article, the robot is also controlled once in 20 ms. One step is eight times, so the walk period T s is 0.16 s. As algorithm 1 only considers the robot walking in a straight line, for the case of steering walk. If the step length is set to zero, the robot will only walk laterally, and if the step width is set to zero, the robot will only walk in a straight line. However, in a simulation match, the flexibility of the player is enhanced by changing the

Parameter Description
FootLength Step size of one step (D sw )

FootWidth
Step width of one step direction of travel while walking quickly. To be able to walk at any angle, additional information about the direction is required. The direction of each step is specified as s θ , as shown in Figure 2.
Equation (18) represents the position of the nth step foothold of a humanoid robot (p (n) x , p (n) y ) Therefore, only important parameters such as step length, step width, and directional angle of humanoid robot need to be controlled to enable robot to walk. The gait parameters of the biped robot are as many as 40. If each parameter is optimized, it will inevitably affect the convergence speed because of the huge state space. And different gait parameters bias the focus of walking differently. According to natural ZMP-based masscentered trajectory planning method and the previous experience of manual debugging, 13 key parameters affecting the gait of humanoid robot were selected, as shown in Table 1.

GAIT OPTIMIZATION BASED ON PARALLEL MULTIGROUP PARTICLE SWARM ALGORITHM
Aiming at the problem that a lot of manual debugging time is required when planning the robot trajectory by directly using a simplified model, this paper proposes a machine learning method based on the PCLPSO algorithm to optimize gait parameters. First, the important parameters are extracted according to the centroid trajectory planning method based on natural ZMP. Secondly, in the optimization of walking mode, different evaluation functions are set according to the requirements of game gait mode, and the PCLPSO algorithm is used to optimize the omnidirectional walking ability of a humanoid robot.

Parallel Comprehensive Learning Particle Swarm Optimizer Algorithm
PSO algorithm is a fast and efficient optimization algorithm (Kumar et al., 2018). Multiple individuals search for the target in the search area. It is a parallel and random optimization algorithm. Compared with other intelligent algorithms, it has a faster convergence speed and robustness. Each particle in PSO calculates the evaluation value based on the evaluation function.
During the search of each particle, two extreme values are compared: the first is the optimal solution pbest of a particle; the other is the global optimal solution gbest. The speed and position updates of PSO are shown in Equations (19,20): where i = 1, . . . , N is the number of populations and k = 0, . . . , N iter is the number of iterations; pbest k i represents the local optimal solution found by the particle itself, whereas gbest k represents the global optimal solution for all current particles; c 1 and c 2 are two constants greater than zero, which are used to adjust the degree of attraction of local and global optimal to the particle; r 1 and r 2 are both uniformly distributed random numbers in the interval [0,1], which affects the random nature of the algorithm. CLPSO is a valid variant of PSO (Liang et al., 2006). The main difference between CLPSO and PSO is that the original PSO requires the use of pbest and gbest, whereas for CLPSO, updating the location only requires pbest. For PSO, pbest only needs its own pbest in CLPSO update, which can come from other individuals. The speed update formula is shown as follows: where c is the acceleration factor. r is also uniformly distributed random numbers in the interval [0,1]. f i = [f i (1) , f i (2) , · · · , f i (D)] is determined by the probability of population P c of randomly selected particles i. pbest k represents the pbest value of particle, which is stored in the list f i of the particle i of the dth dimension. P c is calculated, as shown in Equation (22); 0.05 and 0.45 are the optimal values of hyperparameters set by Liang et al. (2006) based on experience.
P c = 0.05 + 0.45 * e ( 10(i−1) ps−1 ) where ps is the population size. In PCLPSO, the particle population is divided into multiple groups, including a master group and several slave groups. All clusters run PCLPSO in the cluster environment at the same time. The gbest, lbest, and pbest in PCLPSO are defined as follows: gbest is the global optimal solution for all groups, lbest is the local optimal solution for populations, and pbest is the optimal solution for particles. Each slave group sends its lbest to the master group. The master group chooses the best solution from all lbest as gbest and sends gbest to all slaves. Each slave group randomly selects a particle to receive gbest to update its own pbest. PCLPSO algorithm updates its own pbest by a parallel distributed collaborative strategies. They are improving the quality of the solution and the rate of solving. Each particle in the PSO algorithm is updated by updating pbest and gbest values. The gbest affects the direction of population, and when it falls into a local minimum, the swarm particles tend to fall into this local minimum. PCLPSO adopts a comprehensive learning strategy, and the speed and position of the updated particles depend on all other particles. The lbest is selected from the pbest of the slave group, and the main group selects the optimal solution gbest from all the lbest of the slave group to ensure that it will not fall into a local minimum.

Construction of a Parallel Distributed Training Environment
The particles in the group in the PCLPSO algorithm are all running on a single computer, but each football robot in a RoboCup3D match is controlled by a separate client. Therefore, the PCLPSO algorithm cannot be directly applied to the RoboCup3D simulation platform. Because all clients are connected to the RoboCup3D simulated football server, the client and server can run on multiple computers in a distributed environment. Therefore, the PCLPSO algorithm is decomposed into three algorithms by changing the cluster structure to accommodate the RoboCup3D simulation platform in this paper.
In the new cluster, the structure is still based on the masterslave model of parallel computing; only one is a master cluster; the others are slaves, as shown in Figure 3. The gait training task of the humanoid robot is decomposed into multiple processes in the slave cluster, and a distributed training system is built to run on multiple computers. The parallel and distributed optimization framework can reduce the scale of solving gait problems. When all slaves are started and connected to the master group, and the master group sends parameters to the slave group, the communication cycle begins. Communication takes place only between the master group and slave groups; there is no communication between slave groups. In the cluster structure of this paper, the master node has no group, and its main function is to send initial parameters to the group of slave nodes (Algorithm 2, line 6), and the slave group sends its own lbest to the master group (Algorithm 2, line 8). The master group collects the received lbest, including its own lbest, into a pool called Elite Pool (EP). The master group finds gbest from the EP and eventually sends gbest to the slave group (Algorithm 2, lines 9, 10, and 11). The detailed algorithm of the master is shown in Algorithm 2. The slave group exchange algorithm is shown in Algorithm 3. Add an exchange program to slave group clients for exchanging data between the master client and all the clients of the slave group. In this paper, Local Pool (LP) is added to the slave group to collect all pbest in the group. The slave group finds lbest among all pbest and sends its own lbest to the master group. The master server adds them to EP, finds gbest, and shares it with the slave server. A robot is a member of a group during gait optimization training, a slave group has seven clients, and a client controls a humanoid robot, and eventually, multiple humanoid robots are trained in a RoboCup3D simulation court. The group client is used to calculate ZMP trajectory, COM trajectory, and swing leg trajectory when a humanoid robot is trained by the PCLPSO algorithm to walk. This part also calculates joint angle information and adaptation values for a cycle of humanoid robot walking. The detailed algorithm from the group client is shown in Algorithm 4. The gait optimization process for the client-based PCLPSO algorithm is shown in Figure 4.
Algorithm 2 Master algorithm 1. int n, k, P, m 2. double w, c1, c2 3. Array EP 4. Initialize the parameters n, k, P, m, w, c1, c2 5. Wait until all slave swarms have been launched and connected to this Master 6. The master sends the parameters to the slaves 7. repeat 8. Wait until all slave swarms have sent lBest to the master 9. The master stores the lBests into the EP 10. The master finds the gBest in the EP and empties the EP 11. The master sends the gBest to the slaves 16. until the stopping criterion is met 17. return Algorithm 3 Switching algorithm of a slave swarm 1. int n, k, P, m 2. double w, c1, c2 3. Array LP 4. Connect to the Master until the parameters are received from the master 5. int d = 0 6. repeat 7. d++ 8. Store the pBests into the LP until all pBests in this slave swarm are received 9. if d mod P==0 then 10. Find the lBest in the LP 11. Send the lBest to the master 12. Wait until the gBest is received from the master 13. Randomly update a lBest with the gBest in the LP 14. end if 15. Send the LP to the all clients in this swarm 16. until the stopping criterion is met 17. return Frontiers in Neurorobotics | www.frontiersin.org  Frontiers in Neurorobotics | www.frontiersin.org 1. int n, k, P, m 2. double w, c1, c2 3. Array LP 4. Connect to the swap program in a swarm 5. Wait until the parameters are received from swap program in a swarm 6. Initialize the parameters′ and velocity 7. Calculate Swing leg Trajectory and all joint angles of Robot NAO in a walk cycle 8. Humanoid robot performs a walking training and calculates the fitness value 9. Update pBest 10. Send the pBest to the swap program until the LP is received from the swap program 11. Update the LP and find the lBest in the LP 12. repeat 13. Update the velocity and position 14. Calculate Swing leg Trajectory and all joint angles of Robot NAO in a walk cycle 15. Humanoid robot performs a walking training and calculates the fitness value 16. Update pBest 17. Send the pBest to the swap program until the LP is received from the swap program 18. Update the Local Pool and find the lBest in the LP 19. until the stopping criterion is met 20. return

Design of Evaluation Functions
The design of evaluation criteria is critical to achieving an excellent result (MacAlpine and Stone, 2018). In this paper, gait optimization results are judged based on the following criteria: 1. The distance the humanoid robot travels per unit time.
where ϕ start and ϕ end are the coordinates of start and end of the training, respectively, and a 1 is the weight. 2. Whether zero force matrix is always within the supported polygon and robot walks without falling all the time. in this paper, the zmp coordinates of the walking action sequence are chosen as the basis for calculation.
where P x k and P y k are the ZMP coordinates for each gait action sequence, and a 2 is the weight. 3. Humanoid robot in the fast walking process torso always maintains stability; body torso does not shake. In a simulation match, the body of players is in frequent contact with each other, and it is very easy to be knocked down by the opponent if the torso is not stable. At the initial time of double-support phase, the expected com coordinates are 2 ft on the center. In this paper, torso sway is detected by comparing the two coordinates of com and the center of feet during the initial stage of the dual support phase.
f shake = 0 f abs x f < th θ c otherwise (26) where x foot R and x foot L are the coordinates of initial time 2 ft of the double support phase, th θ is the set threshold, and f shake and c are penalty and normal numbers, respectively. If the robot falls during training, a constant f falling will be given as a penalty value. The evaluation function for speed and stability training is shown in Equation (27).
Add the lateral walking and the turning angles of the humanoid robot to the above, and the maximum distance of lateral walking for each step is limited to 0.04 m, and the maximum turning angle is 15 • . In the optimization process, two different target points are set for the bipedal walking robot to train its gait. If the target point is reached, then the training of the next target point is quickly stopped. The evaluation function is as follows: where f punish is the penalty for not completing the task within the specified time, and f reward is the reward for completing the task.

Layered Learning
In the RoboCup3D simulation game, players can be roughly divided into two categories. The first category is to hold the ball, avoid the defense of the opponent, intercept the ball in the shortest time, and send the ball to the goal of the opponent; the second category is to run according to the situation on the field players and walk to the designated location as soon as possible. Players without the ball often complete tasks such as running and intercepting opposing players with the ball up and down. For players to complete the tasks assigned by their superiors in the shortest possible time, the walking speed of robots is very important. The player is the defensive object of an opponent after getting the ball, so frequent collisions and turns are inevitable. Improve the stability of the robot and steering ability to avoid falling and taking advantage of collisions. The omnidirectional walking of a biped robot can be decomposed into forward walking and steering motion. This paper uses a layered learning method to optimize each decomposition action of omnidirectional walking, and the final omnidirectional walking optimization is shown in Figure 5. Each sub-module requires the optimization algorithm to be trained through the corresponding evaluation function. Use manually adjusted parameters to drive the robot to walk, and then learn to get a fast mode with speed and stability as the goal. The final optimized parameters of the fast and stable mode are used as the initial state of the steering mode, and the learning is continued with the goal of steering stability. Through hierarchical learning of the humanoid robot, two different walking parameter sets (straight and turning) can be obtained. When different walking tasks are to be completed, the parameter sets can be switched at any time, and the flexible connection transition during switching is also obtained through learning.

EXPERIMENTAL RESULTS AND ANALYSIS
This article uses a 3D-LIMP model to generate gait and opens the function of adjusting walking engine parameters to the public. OpenAI Gym acts as a bridge between optimization algorithm and environment and accepts the gait parameters computed by an optimization algorithm, performs a training session in the environment, and then provides the necessary information back to the optimization algorithm. To verify the effectiveness of the PCLPSO algorithm for gait optimization, this paper compares PCLPSO with three commonly used optimization algorithms, GA, CFO, and CMA-ES (Rongyi and Chunguang, 2018), in terms of the speed, stability, and turn capability of gait. The angle of view of players is only−120-120 • . To get accurate data during training, the value of setViewCones in the server is changed from 120 to 360 • . It is also necessary to change the game mode to fast mode during training so that the training is not limited by time. The mathematical properties of the four optimization algorithms of GA, CFO, CMA-ES, and PCLPSO are evolutionary algorithms. Each training uses the same population size (10) and the same number of variables (13). First, randomly produce 10 sets of parameters. The humanoid robot walks according to the 10 sets of parameters. The walking speed and stability will be different. Choose the optimal one, and then iterate the formula. Generate the next generation, and repeat the execution with 10 iterations. In this article, the coding method of the GA algorithm is standard binary coding. The number of parameters determines the length of a chromosome. When calculating the evaluation function value, the binary chromosome string should be decomposed and decoded to get the real number parameters. The specific parameters of the GA algorithm are as follows: take population size N p = 40, evolutionary algebra T = 200, and crossover probability P c = 0.7, and variation probability P m = 0.05. The specific parameters of the CFO algorithm are as follows: α = 0.3,β = 0.3,γ start = 0, γ stop = 1, and γ = 0.1. Firstly, Equation (27) is chosen as an evaluation function to optimize the speed and stability of the robot. The relationship between the number of clients in a single slave group and training speed is shown in Figure 6. The training speed of multiple clients is compared to a single client when it takes a single client to train to an optimal value 1. The training speed increases linearly between 1 and 7 clients. The training speed reaches its maximum when the slave group contains seven clients. So, in the following comparison experiment, the slave group structures of PCLPSO are trained by seven clients controlling seven humanoid robots. The training scenario of a humanoid robot in the RoboCup3D scenario is shown in Figure 7. Figure 8A shows the current optimal fitness values of four algorithms, all of which increase with the increase in the number of training. The evaluation function of the PCLPSO algorithm reaches the optimal value after 6,500 trainings, and the optimal value is 5.86. The evaluation function of the CMA-ES algorithm reaches the optimal after 14,000 trainings, and the optimal value is 4.73. GA algorithm reaches the optimal evaluation function after 17,500 trainings, and the optimal value is 3.92. The evaluation function of the CFO algorithm reaches the optimal after 16,000 trainings, and the optimal value is 4.2. The evaluation function value is averaged every 50 times of training, as shown in Figure 8B. As the number of iterations increases,  the evaluation function of all four algorithms increases steadily. One hundred twelve iterations for PCLPSO, 146 iterations for CMA-ES, 176 iterations for GA, and 148 iterations for CFO are stable. When the parallel distributed optimization framework interacts with RoboCup3D, many low-cost data samples can be obtained. Decomposition of gait problems can reduce the scale of problem-solving. Make sure to get a better solution faster.
The walking stability measured by Equation (25) shows that the x f fluctuation range of the PCLPSO algorithm is−0.2 to 0.2 mm. The CMA-ES algorithm fluctuates between -0.3 and 0.5. The x f range of GA is−0.7 to 1.4 mm, and that of the CFO algorithm is−1.3 to 0.8 mm. The PCLPSO algorithm has the smallest x f fluctuation range, and the humanoid robot is more stable. The optimized trajectories of swing leg x-axis and z-axis of four algorithms are shown in Figures 9A,B. At the moment of takeoff and landing, the optimized trajectory of the swing leg of the PCLPSO algorithm is parallel to the ground, which ensures stability when switching between the single-support phase and double-support phase.
The real-time changes of hip deflection pitch joint angle, hip transverse roll joint angle, and hip pitch joint angle for four algorithms are shown in Figure 10. The hip angle changes of the PCLPSO algorithm are very stable and better than those of the CMA-ES, CFO, and GA algorithms.
A comparison of changes in the center of mass landing points for four methods is shown in Figure 11A. The landing point of the center of mass in the double-leg support phase of the robot using the parameters optimized by the PCLPSO algorithm is always on the center of two-legged linkage and remains stable, whereas the landing point of the robot center of mass using the parameters optimized by CFO and GA is unstable. CMA-ES algorithm optimized humanoid robot mass center fallout is more stable than the CFO and GA algorithms but inferior to the PCLPSO algorithm. ZMP trajectory after the optimization of three algorithms is shown in Figure 11B. The ZMP trajectory of the CMA-ES, GA, and CFO algorithms is close to the edge of the support polygon, whereas the whole trajectory curve of the PCLPSO algorithm moves toward the middle of the support polygon, in which case the stability margin of humanoid robot ZMP point is larger.
After the training mentioned earlier, it can be seen in humanoid robot gait optimization, and PCLPSO algorithm optimization of humanoid robot already has a fast and stable gait, but in simulation competition in a humanoid robot, the target point is constantly changing. When changing from a linear to a rotating state, the average speed decreases again when the rotation angle is small, and the humanoid robot is extremely unstable during rapid stops. In this paper, to better adapt to dynamic walking, layer learning is used to further optimize gait. The robot not only has a fast and stable gait but also has excellent steering ability. Next, the steering ability of the humanoid robot is optimized using Equation (28) as an evaluation function. Test experiments on the turning ability of humanoid robots were conducted under the SimSpark platform of RoboCup3D, and the  walking path was recorded by Matlab using RoboViz observation. As shown in Figure 12A, the body rotation of four algorithms when the robot is optimized to make it turn continuously. The body rotation angle of robots using the PCLPSO algorithm with optimized parameters reaches a maximum of−1.19 • during support leg switching, which shows that the robot is very stable during the switching process of walking and turning. Preplanning the walking path of the robot, the trajectory after layer learning using three algorithms is shown in Figure 12B. The GA algorithm walks a trajectory that cannot be flexible enough to maintain stability when turning. The humanoid robot optimized with the CMA-ES and CFO algorithms can turn flexibly but requires some adjustment time when turning, and the PCLPSO algorithm walks on a smooth trajectory that is almost identical to the planned path.
Let humanoid robot go straight for 15 m, test each algorithm 100 times, and take the average value. The results are shown in Table 2. The PCLPSO algorithm is used to walk with less time and faster speed under the same walking distance. The steering angle represents the angle between the body orientation of the initial position of the humanoid robot and the target point. The four algorithms are tested 100 times when the steering angle is

CONCLUSION
This paper has built a RoboCup3D parallel distributed multirobot gait training environment based on PCLPSO. A layer learning approach was used to optimize the existing problems in layers. It effectively reduces the influence of similar noise of a single simulation platform and has a faster optimization speed than common optimization algorithms. The final experimental results show that the PCLPSO algorithm optimizes faster and walks more quickly and steadily. During turning motion, the PCLPSO algorithm walks with a smoother trajectory and a smaller body rotation angle when switching between straight motion and turning motion, enabling stable, and flexible turning of a humanoid robot. This algorithm can also be extended to other aspects of RoboCup3D, such as the optimization of basic movements of humanoid robots such as goal shooting.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary materials, further inquiries can be directed to the corresponding author/s.