A neural flexible PID controller for task-space control of robotic manipulators

This paper proposes an adaptive robust Jacobian-based controller for task-space position-tracking control of robotic manipulators. The structure of the controller is built on a conventional Proportional-Integral-Derivative (PID) framework. An additional neural control signal is then synthesized under a non-linear learning law to compensate for internal and external disturbances in the robot dynamics. To strengthen the robustness of the controller, a new gain-learning feature is integrated to automatically adjust the PID gains under various working conditions. The stability of the closed-loop system is guaranteed by Lyapunov constraints. The effectiveness of the proposed controller is verified through intensive simulations.


Introduction
Today, the great development of science and technology has created a premise for scientific research to advance to a new level, and many countries have chosen robotics as a leading industry. Building on this scientific and technological background, intelligent robots are prospering in industrial applications and attracting many researchers. To move a robot safely to desired positions in the presence of obstacles, collision avoidance and path planning have been matters of concern. In recent years, various strategies have been studied for collision avoidance. The basic idea behind collision-avoidance algorithms is to design a proper controller that results in a conflict-free trajectory. Path-selection methods are one of several techniques for avoiding obstacles. They use offline/online algorithms to produce a curve that connects the starting and target points with a predefined initial position, velocity, and acceleration. For example, an online trajectory-generation algorithm called Ruckig considers third-order constraints (on velocity, acceleration, and jerk), so the complete kinematic state can be specified for waypoint-based trajectories (Berscheid and Kroeger, 2021). A smooth trajectory based on a method combining fourth- and fifth-order polynomial functions was presented in (Boscario et al., 2012), in which the outcome was the optimal time distribution of the via points with respect to a predefined objective function. A joint-space controller can then use inverse kinematics to compute the desired joint angles. Early collision-avoidance approaches concentrated on handling static obstacles through sensor-based motion-planning methods (Borenstein and Koren, 1991), nearness-diagram navigation for troublesome scenarios (Minguez and Montano, 2004), and trajectory-planning algorithms (Shiller, 2015).
In reality, many techniques have been proposed to cope with moving obstacles. For instance, a reactive avoidance method incorporating non-linear differential geometric guidance was presented in (Mujumdar and Padhi, 2011), and a collision-avoidance algorithm based on potential fields was proposed in (Huang et al., 2019). In typical applications of robotic manipulators, the controllers are designed in the joint space, which also requires exact inverse-kinematics computation. Nonetheless, complex internal dynamics and external disturbances arising from divergent working conditions remain the main obstacles hindering the development of high-performance controllers.
To realize control objectives of robots in real-life missions, simple proportional-integral-derivative (PID) controllers are priority options (Bledt et al., 2018), (Wensing et al., 2017) owing to their simple design. If proper control gains are found, high control performance can be obtained (Park et al., 2015), (Ba and Bae, 2020). Much research has since been devoted to improving the performance of PID controllers using intelligent approaches such as evolutionary optimization and fuzzy logic (Astrom and Hagglund, 1995). These methods exhibited promising control results thanks to combining online and offline stages (Tan et al., 2004). The offline stage flexibly selects proper PID parameters based on the system overshoot, settling time, and steady-state error, while the online stage uses the operating control errors to adjust the fuzzy-logic parameters and re-optimize the system, improving its quality significantly. However, the tuning methodology of fuzzy-logic controllers is mostly based on the experience of operators (Juang and Chang, 2011). Another branch of the intelligent-control category is based on the biological behavior of animals, in which a genetic algorithm was combined with a bacterial-foraging method to simulate natural optimization processes such as hybridization, reproduction, mutation, and natural selection (Cucientes et al., 2007). Such evolution can deliver near-optimal solutions, but the solving process requires a large number of samples and a long running time, which limits its application. Recently, tuning PID control parameters using neural networks has become an effective approach with many contributions (Kim and Cho, 2006), (Neath et al., 2014). A conventional PID controller is itself robust (Thanh and Ahn, 2006), and the learning ability integrated into such controllers makes them flexible to the working environment (Ye, 2008).
However, a lack of careful treatment of the learning rules in the steady state can make the system unstable over long periods of use (Ba et al., 2019), (Ye, 2008), (Rocco, 1996).
To further improve the control performance, the internal and external dynamics of robots need to be compensated during the working process. To this end, classical methods can be employed based on accurate mathematical models of the robots (Craig, 2018), (Zhu, 2010). Good control results have been exhibited with such conventional approaches, but it is not easy to extend them to complicated robot structures. Intelligent modeling methods can be adopted to increase the applicability of the controllers to various robots in different working environments (Karayiannidis et al., 2016), (Gao et al., 2022). Excellent control performance has been accomplished with intelligent control approaches; however, the convergence of the learning process is still not explicitly proven (He et al., 2020), (Wang et al., 2020). To address this theoretical drawback, linear leakage functions have been integrated into the estimation phases of the network operation. However, this term can slow down the overall learning performance. Hence, advanced learning behaviors for the network need to be studied further.
In this paper, an intelligent direct PID controller is proposed for position-tracking control in the task space of robotic manipulators. Without using inverse kinematics, the operator only needs to input the desired position value, and the controller calculates the required control action by itself (Craig, 2005; Ba and Bae, 2021). This is of great help since, in practice, quite a few robots have complex hardware structures that make the inverse-kinematics calculation difficult. The more degrees of freedom a robot has, the more difficult the calculation process, requiring more time and effort. The proposed controller is built on a conventional PID framework. A non-linear neural network is then employed to eliminate internal/external disturbances during the working process. To increase the adaptive robustness of the controller, a new gain-learning rule is integrated to flexibly tune the PID gains for different working conditions.
The remainder of the paper is structured as follows. Section 2 discusses the system modeling and problem statements. Section 3 presents the design of the proposed controller. Section 4 analyzes the verification results. Section 5 concludes the paper.

System modelling and problem statements
The behavior of a general robotic manipulator can be presented in the following form (Craig, 2018), (He et al., 2020):

M(q)q̈ + C(q, q̇) + G(q) + τ_f + τ_d = τ    (1)

where q, q̇, q̈ are the vectors of joint position, velocity, and acceleration, respectively, M(q) is the mass matrix, C(q, q̇) is the centrifugal-Coriolis moment, G(q) is the gravitational moment, τ_f is the frictional moment, τ_d stands for external disturbances, and τ is the actuator moment, i.e., the control signal.
Remark 1: The control objective of this paper is to find a proper control signal (τ) that drives the position of the end-effector of the robot along a desired profile. To accomplish this task, one could use inverse kinematics (IK) to compute desired joint positions from the end-effector reference signals. However, finding such solutions is not trivial for complicated robots. To avoid this shortcoming, we apply a direct control algorithm that bypasses the IK problem. Hence, one needs to consider the dynamic model (1) in the task space as follows (Craig, 2018): where x is the end-effector position of the robot, J(q) is the Jacobian matrix, M̄(q) is the nominal value of the mass matrix M(q), and d is the lumped disturbance.
Remark 2: It is very difficult to determine accurate parameters of model (1), (2), or (3). Furthermore, the parameters sometimes vary during the working process. To treat this drawback, the proposed controller is required to be model-free, robust, and flexible.
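Since the task-space formulation hinges on the Jacobian J(q), the mapping from joint space to task space can be sketched numerically. Below is a minimal illustration for a planar 2-link arm; the link lengths and function names are illustrative, not taken from the paper:

```python
import numpy as np

def forward_kinematics(q, l1=0.2, l2=0.3):
    """End-effector position x of a planar 2-link arm (illustrative lengths)."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q, l1=0.2, l2=0.3):
    """Analytical Jacobian J(q) = dx/dq of the same arm."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])
```

With these two maps, the task-space velocity follows as ẋ = J(q)q̇, so the reference can be tracked directly in x without solving the IK problem.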

Neural flexible PID controller
In this section, the proposed controller is designed with new features to realize the stated control mission. The theoretical effectiveness of the closed-loop system is then analyzed using Lyapunov constraints.

A flexible PID control framework
The controller is developed based on a conventional PID structure (Tan et al., 2004), as in Eq. 4.
where e = x − x_d is the control error, x_d is the desired trajectory, J⁺ is the pseudo-inverse of the Jacobian J, and K_p, K_d, K_i are control gains. We assume that the desired trajectory x_d lies inside the workspace of the robot and that the end-effector position x can reach the selected desired position. Advanced path-planning and obstacle-avoidance algorithms (Mujumdar and Padhi, 2011; Shiller, 2015; Huang et al., 2019) could be employed to generate appropriate desired profiles for the robot.
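A discrete-time sketch of such a Jacobian-based PID loop is given below. This is a hypothetical implementation, not the paper's exact Eq. 4: the sign convention for e = x − x_d and the use of the Moore-Penrose pseudo-inverse to map the task-space signal to the joints are assumptions:

```python
import numpy as np

class TaskSpacePID:
    """Sketch of a Jacobian-based task-space PID loop (illustrative form)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.e_int = 0.0       # running integral of the error
        self.e_prev = None     # previous error, for the derivative term

    def update(self, x, x_d, J):
        e = x - x_d                                         # control error e = x - x_d
        self.e_int = self.e_int + e * self.dt               # integral term
        e_dot = 0 * e if self.e_prev is None else (e - self.e_prev) / self.dt
        self.e_prev = e
        u = -(self.kp * e + self.ki * self.e_int + self.kd * e_dot)  # task-space PID signal
        return np.linalg.pinv(J) @ u                        # map to joint space via J^+
```

In real-time use, the same object would be stepped once per control period with the measured end-effector position and the current Jacobian.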
In real-time control, one can tune the control gains (K_p, K_d, K_i) for acceptable control performance. However, fixed gains might not ensure good control errors under various working conditions (Thanh and Ahn, 2006), (Rocco, 1996). To cope with this problem, we propose an automatic tuning law for the PID gains, as follows: where K_1, K_2 are positive core gains, α_0, β_0 are learning rates, and K_0 = diag(k_0) is the activation gain.
Remark 3: As seen in Eq. 5, the PID gains are composed of static and dynamic gains, which respectively yield the robustness and adaptation of the closed-loop system. The control gains vary in a non-linear manner to drive the control error into the desired region regardless of unknown environments. For faster control results, the disturbance term d needs to be effectively compensated by a proper control signal.

Additional neural network control signal
First of all, the disturbance d is modeled using the following Radial Basis Function (RBF) network:

d = W^T ξ(q, q̇) + δ    (6)

where W is the optimal weight vector, ξ(q, q̇) is the regression vector, and δ is the modeling error.
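A common concrete choice for the regression vector ξ is a set of Gaussian basis functions evaluated over (q, q̇). The sketch below is illustrative; the centers, width, and function names are assumptions, not the paper's configuration:

```python
import numpy as np

def rbf_features(z, centers, width=1.0):
    """Gaussian regression vector xi(z): one feature per center."""
    d2 = np.sum((centers - z) ** 2, axis=1)        # squared distances to centers
    return np.exp(-d2 / (2.0 * width ** 2))        # values in (0, 1]

def rbf_disturbance(z, W, centers, width=1.0):
    """Disturbance estimate d_hat = W^T xi(z) for a weight matrix W."""
    return W.T @ rbf_features(z, centers, width)
```

Here z would stack (q, q̇); each row of `centers` is one hidden neuron's center, and each column of W corresponds to one output dimension of d.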
Based on the neural-network model (6), the control signal (4) is modified by adding an intelligent control term, as follows:

τ = u_PID + u_NN    (7)

where u_PID and u_NN stand for the control terms generated by the PID and neural-network structures, respectively, and Ŵ is the estimate of the weight vector W. The estimate Ŵ is updated by the non-linear mechanism of Eq. 8, where α_w and β_w are learning rates.
Remark 4: The mechanism (8) uses rich information, including the time-derivative, linear, and integral functions of the control error, to activate the learning process. The weight matrix of the neural network is automatically updated to ensure the minimum control error.
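The exact form of the non-linear update (8) is not reproduced in the text. Purely as an illustration, a generic gradient-plus-leakage update of Ŵ driven by an error signal could look like the following; the leakage form, the error signal, and all rate values are assumptions, not the paper's law:

```python
import numpy as np

def update_weights(W_hat, xi, e_v, alpha_w, beta_w, dt):
    """One Euler step of a generic adaptive law: a learning term driven by
    the error e_v plus a leakage term that keeps the weights bounded."""
    W_dot = alpha_w * np.outer(xi, e_v) - beta_w * W_hat
    return W_hat + dt * W_dot
```

With zero error the leakage term alone acts, shrinking the weights; with a persistent error the outer-product term grows the weights along the active features.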

Stability analysis
In this section, we discuss the stability of the closed-loop system to ensure reliability of the proposed controller for the robotic system (3). From the above design, we have the following statements.
Theorem 1: Given the task-space model (3) of a robotic manipulator, if the neural PID control signal (7) supported by the adaptive rules (5) and (8) is employed, the following properties hold: 1) the control error e, the activation gain k_0, and the neural weight vectors are bounded; 2) in the stationary phase, the control error e converges to zero.

Proof:
We first synthesize a virtual control error (e_v) as in Eq. 9. The time derivative of the new error (e_v) under the dynamics (3) and the model (6) then yields Eq. 10. By substituting the control signal (7) and the gain structure (5) into the dynamics (10), we obtain the simpler form (11), where W̃ = Ŵ − W is the estimation error of the neural weight matrix W. We now consider a new Lyapunov function (12). Differentiating the function (12) with respect to time and noting the dynamics (11) leads to Eq. 13. Applying the Cauchy-Schwarz inequality, we obtain the result (14), where Δ is a lumped term defined in Eq. 15. Since w_i and δ are bounded, Δ is bounded as well. This discussion proves the first statement of Theorem 1.
In the stationary phase, the time derivative of the virtual control error e_v converges to zero. By differentiating Eq. 9 with respect to time and applying the Hurwitz criterion to the result, we achieve the second statement of Theorem 1.
Remark 5: As observed from the definition (15), one could select K_2 and β_wi large enough to reduce the disturbance bound Δ. However, these are still fixed values. From Eq. 14, it can be seen that the control performance can be further enhanced by the learning gain k_0 in various working cases. The control idea is graphically summarized in Figure 1A. The following implementation procedure can be used to deploy the proposed control algorithm in simulation or real-time testing.
1) In the first step, all of the learning rates (α_0, β_0, α_w, and β_w) are set to zero. The positive core gains (K_1, K_2) are manually tuned for acceptable control performance. The gain K_2 is recommended to be greater than the gain K_1.
2) In the second step, the learning rates (α_0 and β_0) of the activation gain (K_0) are adjusted to further enhance the control performance. In this step, the core gains (K_1, K_2) may be retuned in some cases for higher control precision.
3) In the third step, the regression vector ξ(q, q̇) is built, and the learning rates (α_w and β_w) of the neural network are manually selected to bring the control accuracy to a higher level.
The whole tuning procedure can be applied several times to seek an excellent control outcome. Note that, from the second pass onward, the learning rates (α_0, β_0, α_w, and β_w) no longer need to be reset to zero.

Validation results
This section presents validation results of the proposed controller in simulations. The control algorithm was applied to a 2-degree-of-freedom (DOF) robot, as sketched in Figure 1B. The manipulator was modeled as two rigid links with lengths l_1 and l_2, with the mass of each link (m_1, m_2) concentrated at its end. The robot works in a vertical plane with downward gravitational acceleration, and viscous friction (a_1, a_2) was modeled at the joints. Although this robot is quite simple, it contains all the essential components of a general multi-degree-of-freedom manipulator, including inertia, centrifugal and Coriolis terms, gravity, and friction effects.
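Under the stated modeling assumptions (point masses at the link ends, viscous joint friction, gravity acting in the plane), the dynamics terms of such a 2-link arm take a textbook form. The sketch below is for reference only and assumes g = 9.81 m/s²:

```python
import numpy as np

G_ACC = 9.81  # gravitational acceleration (assumed value)

def dynamics(q, qd, l1, l2, m1, m2, a1, a2):
    """Textbook M, C, G and viscous-friction terms of a planar 2-link arm
    with point masses at the link ends (illustrative, not the paper's code)."""
    c2, s2 = np.cos(q[1]), np.sin(q[1])
    # mass matrix M(q)
    M = np.array([[(m1 + m2) * l1**2 + m2 * l2**2 + 2 * m2 * l1 * l2 * c2,
                   m2 * l2**2 + m2 * l1 * l2 * c2],
                  [m2 * l2**2 + m2 * l1 * l2 * c2,
                   m2 * l2**2]])
    # centrifugal-Coriolis moment C(q, qd)
    h = m2 * l1 * l2 * s2
    C = np.array([-h * qd[1]**2 - 2 * h * qd[0] * qd[1],
                   h * qd[0]**2])
    # gravitational moment G(q)
    G = np.array([(m1 + m2) * G_ACC * l1 * np.cos(q[0]) + m2 * G_ACC * l2 * np.cos(q[0] + q[1]),
                  m2 * G_ACC * l2 * np.cos(q[0] + q[1])])
    # viscous friction tau_f
    tau_f = np.array([a1 * qd[0], a2 * qd[1]])
    return M, C, G, tau_f
```

A simulator would integrate q̈ = M(q)⁻¹(τ − C − G − τ_f) forward in time from these terms.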
To estimate the disturbance d, we used an RBF neural network with 4 input neurons, 256 hidden neurons, and 2 output neurons. The actual values of the link lengths, masses, and viscous-friction coefficients were chosen as follows: l_1 = 0.2; l_2 = 0.3; m_1 = 7; m_2 = 3.5; a_1 = 3; a_2 = 10. To evaluate the adaptability and robustness of the controller under divergent working conditions, we compared the proposed controller (called anPID) with a conventional PID controller (referred to as cPID) and an adaptive PID controller using only the automatic gain-tuning law (referred to as aPID). The parameters of the controllers were chosen as: To carefully assess the performance of the proposed controller, the robotic manipulator was simulated in three cases. In the first simulation, the robot was controlled to track desired trajectories of smooth multi-step signals. Furthermore, process disturbances in the form of white noise, as shown in Figure 2C, were added to the output torques of the actuators. Simulation results of the conventional and intelligent PID controllers for the tracking-control mission are shown in Figure 2. Figures 2A,B show that the proposed controller maintained good control errors even though the end-effector of the manipulator passed through a singularity point at (0.1; 0) (m). Figure 2D exhibits the control signals of the smart PID controller, which had large values at the initial and singularity points in order to decrease the control errors as quickly as possible. This superior property was an achievement of the learning laws (5) and (8), as demonstrated by the gain and weight variations depicted in Figures 2E, F, respectively. These terms started from zero and then overshot to bring the system to the steady state rapidly.
It can be seen that the system adapted to a reasonable approximation of the disturbances, bringing the control error to the smallest possible value. Therefore, the learning ability of the system in the presence of uncertain non-linearities and perturbations has been confirmed through this simulation. In the second simulation, the manipulator was employed to draw a circle with a radius of 0.15 m, centered at the point (0.3; 0) (m), at a frequency of 1 Hz. The reference input used is shown in Figure 3A.
With the application of the neural flexible PID controller in unknown environments, using the adaptive control signal (7), the control results obtained are presented in Figure 3B. From the data in this figure, although the disturbances were not known in advance, the control quality was good in both the transient and steady-state phases. These results were achieved thanks to the learning characteristics of the PID gains and the designed RBF neural network. There was a small overshoot in the y-direction error due to the large learning rate selected, but this overshoot helped the system quickly reach the steady state. From the comparison of the control data in Figure 3B, it can be seen that the quality of the proposed controller (anPID) was better than that of the aPID controller, which employed only the learning law (5). This is because the more adaptive terms the controller has, the better it approximates the disturbances.
In the third simulation, the end-effector of the robot manipulator was controlled to move from the point (0.35; 0.25) (m) to the point (0.15; 0.05) (m). After applying the three controllers to this mission in a free condition, in which the desired trajectory was planned as a straight line, their control outcomes, including the actual outputs and the control errors, are illustrated in Figures 4A, B, respectively. In these figures, although the proposed controller (anPID) oscillated more in the transient state to find the adaptive terms quickly, it had the smallest overshoot and steady-state error compared with the cPID and aPID controllers.
To further challenge the controllers with a more difficult working condition, an obstacle was placed on the moving trajectory of the robot in the task space. By applying the trajectory-planning method and the referenced collision-avoidance method (Borenstein and Koren, 1991), (Craig, 2005), the desired trajectory was generated as a curve using two third-order polynomial segments for the position, velocity, and acceleration of the end-effector. The control data in this case are shown in Figures 4C, D. From the comparison of the data in these figures, it can be seen that the control quality of the proposed controller (anPID) was better than that of the others (aPID and cPID), even with the non-linear trajectory generated. Table 1 reports the maximum-absolute (MA) and root-mean-square (RMS) values of the control errors over a specified period of operation (20 s-25 s). The proposed controller always provided the best MA and RMS errors in all cases. These results show that the proposed control technique compensated efficiently for the non-linear uncertainties and unknown disturbances, confirming the advantages of the proposed controller. Therefore, the simulation results prove that the studied control method outperforms the previous ones.
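The MA and RMS figures over the 20 s-25 s window can be computed directly from the logged error signal; a small sketch with illustrative array names:

```python
import numpy as np

def ma_rms(t, err, t0=20.0, t1=25.0):
    """Maximum-absolute and root-mean-square error over the window [t0, t1],
    matching the 20 s-25 s interval reported in Table 1."""
    mask = (t >= t0) & (t <= t1)   # select samples inside the window
    e = err[mask]
    return np.max(np.abs(e)), np.sqrt(np.mean(e ** 2))
```

Applied per axis (x and y errors separately), this yields one MA/RMS pair per controller and per case.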

Conclusion
In this paper, an intelligent controller is proposed to optimize the position-control performance of a 2-DOF robotic manipulator. The controller is developed based on a conventional PID structure. New advanced features designed for disturbance learning and gain adaptation are then integrated into the ordinary control signal to improve its robustness and achieve high control accuracy. The control efficiency of the proposed approach was then successfully verified by theoretical proof and comparative simulations. It can be confirmed that the controller is model-free, simple, robust, and flexible. In the near future, the proposed control algorithm will be integrated with an additional control term that could yield asymptotic control performance for dynamic trajectories. Furthermore, advanced path-planning and obstacle-avoidance algorithms will be combined with the controller to increase its flexibility when the system works in complex environments.

Data availability statement
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.