Adaptive and Energy-Efficient Optimal Control in CPGs Through Tegotae-Based Feedback

To obtain biologically inspired robotic control, the architecture of central pattern generators (CPGs) has been extensively adopted to generate periodic patterns for locomotor control, owing to the interesting properties of nonlinear oscillators. Although sensory feedback is not necessary for a CPG to generate patterns, it plays a central role in guaranteeing adaptivity to environmental conditions. Nonetheless, its inclusion significantly modifies the dynamics of the CPG architecture, often leading to bifurcations. For instance, force feedback can be exploited to derive information regarding the state of the system. In particular, the Tegotae approach can be adopted by coupling proprioceptive information with the state of the oscillation itself in the CPG model. This paper compares this policy with other types of feedback and shows that it provides higher adaptivity and optimal energy efficiency for reflex-like actuation. We believe this is the first attempt to analyse the optimal energy efficiency along with the adaptivity of the Tegotae approach.


Single Shooting Method
The single shooting (SS) method starts by discretizing the controls. We choose grid points on the unit interval, 0 = τ_0 < τ_1 < · · · < τ_N = 1, and re-scale them to the possibly variable time horizon [0, T] of the optimal control problem by defining t_i = T τ_i for i = 0, . . . , N. On this grid, the controls u(t) are discretized so that u(t) depends only on finitely many control parameters q = (q_0, q_1, . . . , q_{N−1}, T), and it can be denoted by u(t, q). If the problem has a fixed horizon length T, the last component of q disappears, since it is not an optimization variable. A numerical simulation routine can then be used to solve the initial value problem.
We can now regard the states x(t) on [0, T] as dependent variables and denote them by x(t, q). The choice of simulation routine is crucial to the success of any shooting method and depends on the type of ODE model. We also discretize the nonlinear constraints to avoid a semi-infinite problem, for example by requiring h(x(t), u(t)) ≥ 0 only at the grid points t_i; a finer grid can also be chosen without any problem. Thus, we obtain the following finite-dimensional NLP.
This problem is solved by a finite-dimensional optimization solver, e.g., sequential quadratic programming, which is extensively described in Fagiano (2019). The main strong points of the SS method are as follows.
1. It can use fully adaptive, error-controlled, state-of-the-art ODE or DAE solvers.
2. It has only a few optimization degrees of freedom, even for large ODE or DAE systems.
3. Only the initial guesses for the control degrees of freedom are needed.
On the other hand, the weak points are as follows.
1. It is not possible to use knowledge of the state trajectory x in the initialization (e.g., in tracking problems).
2. The ODE solution x(t, q) depends very non-linearly on q.
3. Unstable systems are difficult to treat; in some cases, numerical issues may arise.
This can be partially remedied by including a penalty term on the input variations in the cost function, as suggested in Fagiano (2019).
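As a minimal illustration of this transcription, the SS method can be sketched with off-the-shelf SciPy routines. The scalar ODE ẋ = −x + u, the fixed horizon, and the weights below are toy assumptions for illustration only, not the model of the main text; the inequality constraints h ≥ 0 are omitted for brevity.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Toy single-shooting sketch: scalar ODE x' = -x + u with a fixed
# horizon T and N piecewise-constant controls q = (q_0, ..., q_{N-1}).
T, N, x0 = 1.0, 10, 1.0
t_grid = np.linspace(0.0, T, N + 1)      # t_i = T * tau_i

def simulate(q):
    """Adaptive, error-controlled integration, interval by interval."""
    x, xs = x0, [x0]
    for i in range(N):
        sol = solve_ivp(lambda t, y: -y + q[i],
                        (t_grid[i], t_grid[i + 1]), [x],
                        rtol=1e-8, atol=1e-10)
        x = sol.y[0, -1]
        xs.append(x)
    return np.array(xs)                  # states at the grid points only

def cost(q):
    # Tracking-type cost on the grid states, plus the penalty on the
    # input variations suggested above to mitigate numerical issues.
    return np.sum(simulate(q) ** 2) + 1e-2 * np.sum(np.diff(q) ** 2)

res = minimize(cost, np.zeros(N), method="SLSQP")  # SQP-type NLP solver
```

Note that only the initial guess for the control parameters q is needed, consistent with the strong points listed above.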

Multiple Shooting
The direct multiple shooting (MS) method tries to combine the advantages of parallel computing with the major advantage of the single shooting (SS) method, namely the possibility to use adaptive, error-controlled ODE solvers. In the MS method, the controls are first discretized piecewise on a coarse grid as follows.
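In the piecewise-constant parameterization consistent with the notation of this section, this discretization reads:

```latex
u(t) = q_i \quad \text{for } t \in [t_i, t_{i+1}], \qquad i = 0, \ldots, N-1,
```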
where the intervals can be as large as in the SS method. Second, the ODE is solved on each interval [t_i, t_{i+1}] independently, starting from an artificial initial value s_i.
By numerically simulating these initial value problems, the trajectory pieces x i (t, s i , q i ) are obtained. The extra arguments are introduced to denote the dependence on the interval's initial values and controls.
Simultaneously with the decoupled ODE solution, the following integral can also be computed.
In order to constrain the artificial degrees of freedom s_i to physically meaningful values, the following continuity conditions are imposed.
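With the trajectory pieces x_i(t, s_i, q_i) defined above, these conditions take the form (a reconstruction consistent with the surrounding notation):

```latex
s_0 = x_0, \qquad s_{i+1} = x_i(t_{i+1}, s_i, q_i), \quad i = 0, \ldots, N-1.
```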
This yields an NLP formulation that is equivalent to the SS NLP, but it contains the extra variables s_i and has a block-sparse structure. Here, E(s_N) is a cost function on the final state s_N, which differs from the one used for the internal states.
If we collect all of the variables as w ≜ (s_0, q_0, s_1, q_1, . . . , s_N), we obtain a finite-dimensional NLP. Among its advantages, knowledge of the state trajectory can be used in the initialization, and it robustly handles unstable systems as well as path, state, and terminal constraints.
In particular, it is interesting to note that the terminal constraint is already satisfied in the first iteration, which is attributed to its linearity. In the SS method, on the other hand, it needs to be tackled explicitly, for instance by including penalty terms for the terminal conditions in the cost function in order to obtain a sensible trajectory, as suggested in Fagiano (2019). The nonlinear effects of the continuity conditions are distributed over the entire horizon, as can be observed in the discontinuities. This is in contrast to the SS method, where the nonlinearity of the system is accumulated until the end of the horizon and, in addition, the terminal constraint becomes more nonlinear than necessary. As stated above, the MS method combines adaptivity with fixed NLP dimensions by applying adaptive ODE/DAE solvers. Within each sequential quadratic programming iteration, the ODE solution is often the most costly part, and it is easy to parallelize. The possibility of using efficient state-of-the-art ODE/DAE solvers and their inbuilt adaptivity makes the MS method a competitive option. From a practical point of view, it offers the advantage that the user does not have to decide on the grid of the ODE discretization, but only on the control grid.
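A minimal sketch of this transcription is given below. The scalar ODE ẋ = −x + u, the horizon, and the weights are toy assumptions for illustration only; the decision vector stacks the artificial node states s_i together with the controls q_i, and the continuity conditions enter as equality constraints.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Toy multiple-shooting sketch: scalar ODE x' = -x + u. The decision
# vector w stacks the node states s_0..s_N and the controls q_0..q_{N-1}.
T, N, x0 = 1.0, 5, 1.0
t_grid = np.linspace(0.0, T, N + 1)

def split(w):
    return w[:N + 1], w[N + 1:]          # node states s, controls q

def continuity(w):
    """s_0 = x_0 plus the matching conditions between intervals,
    each integrated independently from its artificial initial value."""
    s, q = split(w)
    res = [s[0] - x0]
    for i in range(N):
        sol = solve_ivp(lambda t, y: -y + q[i],
                        (t_grid[i], t_grid[i + 1]), [s[i]], rtol=1e-8)
        res.append(sol.y[0, -1] - s[i + 1])
    return np.array(res)

def cost(w):
    s, q = split(w)
    return np.sum(s ** 2) + 1e-2 * np.sum(q ** 2)

# Initialization can exploit knowledge of the state trajectory: here
# the node states are simply seeded at the initial condition.
w0 = np.concatenate([np.full(N + 1, x0), np.zeros(N)])
res = minimize(cost, w0, method="SLSQP",
               constraints={"type": "eq", "fun": continuity})
```

The per-interval integrations inside `continuity` are mutually independent and could be parallelized, which is the practical advantage of the MS method emphasized above.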

Design of Optimal Controllers
The construction of an optimal controller based on the previous methods is non-trivial when a switching system is taken into account. The main issue is passing information about the switching dynamics and the switching controller to the minimizer of the cost function. As a result, the following scheme has been used. First, the simulation with the Tegotae controller is run for a sufficiently long time, until a steady state is reached. Second, the peak values of the trajectory over the very last jump are taken. If these correspond to the flight phase, the dynamics is further cut down to the stance phase only; these reference dynamics define the behaviour that the optimal controller to be constructed must reproduce. The flight phase then does not need to be tackled by the optimal controller, because the control is not effective in this section. If the flight phase is absent, the peak point is directly taken as the reference point. Subsequently, the values of the position and velocity are taken, and the finite horizon optimal control (FHOC) problem is solved using the MS method with respect to these initial and final conditions, which are automatically added by the extended formulation. The cost function is constructed by taking into account Eq. 29 in the main text; the energy stored in the spring system may play a leading role. On the other hand, the maximum height of the vertical excursion is substituted by the length of the body, owing to the previous considerations about the relevance of the flight phase. The FHOC problem for the MS method is formulated using the norm notation ∥x∥². The second and third terms in Eq. 27 are the energy stored in the spring system and the vertical excursion in the stance phase, respectively. The weights Q_1, R_1, and L_1 were set empirically via trial and error.
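The reference-extraction step of this scheme can be sketched as follows. The dynamics below are a generic damped, periodically forced oscillator standing in for the Tegotae-controlled model; the forcing, damping, and simulation horizon are placeholder assumptions, not the system of the main text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder dynamics: damped oscillator with periodic forcing,
# standing in for the Tegotae-controlled model of the main text.
def dyn(t, z):
    y, v = z
    return [v, -y - 0.1 * v + np.cos(t)]

# Step 1: simulate long enough for transients to decay (steady state).
t = np.linspace(0.0, 100.0, 10001)
sol = solve_ivp(dyn, (t[0], t[-1]), [0.0, 0.0], t_eval=t, rtol=1e-8)
y = sol.y[0]

# Step 2: restrict to the tail of the simulation and take the peak of
# the last cycle as the reference point for the FHOC problem.
tail = y[-2000:]                                  # last 20 s
is_peak = (tail[1:-1] > tail[:-2]) & (tail[1:-1] > tail[2:])
ref_height = tail[np.where(is_peak)[0][-1] + 1]   # last local maximum
```

In the actual scheme, the position and velocity at this peak would then be passed as initial and final conditions to the MS-based FHOC solver.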
Figure S1. The results of the multiple shooting method. Case MS2 in Table 2: m = 0.1. The dotted blue and solid red lines represent the designed optimal controller (MS method) and the Tegotae controller, respectively.

Figure S5. Results of the multiple shooting-single shooting method. Case MS-SS4 in Table 3: m = 0.6.