Safe Robot Trajectory Control Using Probabilistic Movement Primitives and Control Barrier Functions

In this paper, we present a novel means of control design for probabilistic movement primitives (ProMPs). Our proposed approach makes use of control barrier functions and control Lyapunov functions defined by a ProMP distribution. Thus, a robot may move along a trajectory within the distribution while guaranteeing that the system state never leaves more than a desired distance from the distribution mean. The control employs feedback linearization to handle nonlinearities in the system dynamics and real-time quadratic programming to ensure a solution exists that satisfies all safety constraints while minimizing control effort. Furthermore, we highlight how the proposed method may allow a designer to emphasize certain safety objectives that are more important than the others. A series of simulations and experiments demonstrate the efficacy of our approach and show it can run in real time.


INTRODUCTION
The idea of proximity between robots and humans performing useful tasks, in a shared work space, inevitably brings up the issue of safety. In fact, safety is a limiting factor for the development of the autonomous robotic partners (Vicentini, 2021). Furthermore, strict safety requirements pose a major challenge for system integrators and robotics applications designers. When humans and robots share a physical work environment, robots must have control laws that make them verifiably safe around humans. In particular, robot systems need to be capable of detecting task variations, and their motion planning and control algorithms must be flexible enough to allow for variation while guaranteeing safety (Kragic et al., 2018).
Robot motion planning is a rich field of study providing myriad approaches to determine robot trajectories, including in the presence of obstacles (Mohanan and Salgoankar, 2018). Many approaches, such as rapidly exploring random trees (LaValle, 1998;LaValle et al., 2001), probabilistic roadmaps (Hsu et al., 1998;Geraerts and Overmars, 2004), and artificial potential fields (Warren, 1989;Vadakkepat et al., 2000) require predefined static maps for peak performance. In addition, robot motion planning algorithms often require expert design of cost functions, potential functions, and random sampling that are outside the expertise of day-to-day users. The learning from demonstration paradigm can address these shortcomings (Argall et al., 2009;Chernova and Thomaz, 2014) by leveraging the inherent expertise of a human teacher.
Movement primitives (MPs) are a popular approach to encode and generalize human demonstrations for training robots. MPs are modeled through a compact representation of the implicitly continuous and high-dimensional trajectories. For example, dynamic movement primitives (DMPs) use demonstrations to learn a model of the control effort necessary to produce a desired trajectory for a stabilized dynamic system (Ijspeert et al., 2013;Dahlin and Karayiannidis, 2019). Capturing the natural variation in human demonstration of a task can help a robot overcome uncertainty and deviation between the training regime and actual task execution (Gomez-Gonzalez et al., 2020). However, DMPs only encapsulate a single trajectory demonstration and thus lack this flexibility.
Probabilistic movement primitives (ProMPs) are a concept in which a distribution of trajectories is learned from multiple demonstrations. There are several works that have focused on the construction of ProMP controllers. In (Paraschos et al., 2018a), the design of a stochastic ProMP feedback controller was studied by exploiting the property of the covariance derivatives which can be explicitly computed. A model-free ProMP controller that adapts movement to force-torque input was designed in (Paraschos et al., 2018b). In (Calinon and Lee, 2017), the authors designed a model predictive control-based ProMP controller for a linear discrete time system model.
While they have prominent advantages, ProMP methods still present notable shortcomings. For example, prior ProMP approaches require a linearized model of the system in the controller design. This makes the controller less relevant for nonlinear systems such as robotics and autonomous vehicles. Additionally, while ProMPs themselves are fairly simple to generate, their controllers are difficult to implement, vulnerable to noise, sensitive to design parameters, and are variable to initial conditions. These factors limit the ability of non-experts to employ or tune such controllers. Finally, ProMPs by definition are stochastic and distributions of trajectories defined by Gaussian functions have a large support. Thus, the resulting trajectories can deviate far from the mean of the training set.
In recent years, real-time safety-critical control of dynamic systems has received notable consideration (Wieland and Allgöwer, 2007). One important approach is the use of barrier certificates/functions, which leverage off-line iterative optimization algorithms to verify safety for a given dynamical system (Sloth et al., 2012). The notion has been extended to synthesizing safe control laws in real-time using quadratic programming (QP) to find control inputs that satisfy control barrier functions (CBFs) (Ames et al., 2016).
A powerful property of CBFs is that they are easily combined with control Lyapunov functions (CLFs) in the same QP such that the resulting controller guarantees stability while respecting limits and safe regions of the state space (Ames et al., 2019;Cortez et al., 2019;Lopez et al., 2020). Additionally, the QP solved to find a safe, stable control input during run-time can include other optimization terms such as minimizing control effort. Other tasks formulated as cost functions or constraints can be included as well. CBF and CLF based controls have their own downsides, most notably the advanced knowledge necessary to define the barriers and trajectories. Efforts to automate the definition of CBFs and CLFs include mapping temporal logic statements to barriers and trajectories (Srinivasan and Coogan, 2020) and training piece-wise barrier functions for obstacles in the workspace (Saveriano and Lee, 2019).
This work addresses the aforementioned weaknesses of ProMPs and CBFs/CLFs. In our presented approach, the trajectory distribution provided by a ProMP is used to define a CLF and one or more CBFs. Specifically, the ProMP mean is used to define a CLF, and barriers for the CBFs are defined using the standard deviation of the distribution. Thus, the CLF and CBFs are established strictly through human demonstration eschewing the need for advanced control knowledge. The system will roughly track the mean trajectory, with a modicum of freedom to optimize the control effort or other requirements, while guaranteeing that the system never leaves a known neighborhood of the mean.
Since CLF and CBF controllers are intrinsically based on the nonlinear model of the system, our approach overcomes the inherent linearity of ProMP controllers. We demonstrate the effectiveness and computational efficiency of our approach through simulations and experiments with a two-link and a six-link robot. Examples of generated control trajectories by our method for a Universal Robots UR5 are shown in Figure 1. In summary, the salient contributions of this paper are the following.
• We develop a novel means of automating the design of CLFs and CBFs from the distribution delivered by a ProMP. • We introduce a new control design for ProMPs by combining CLFs and CBFs. • We demonstrate the practical applicability of the proposed method through experimental validation on a Universal Robots UR5e.
The remainder of this paper is structured as follows. In Section 2, we review ProMPs, CBFs, CLFs, and robot system dynamics. We detail our approach for a ProMP to define the CLFs and CBFs for optimal control in Section 3. Simulation and experiments are presented in Section 4. Lastly, we conclude with a discussion on future work in Section 5.

BACKGROUND
This section presents necessary background information on ProMPs, CBFs, and CLFs.
Notation: Given a matrix A, we denote its transpose by A ⊤ . Let the identity and zero matrices, with appropriate dimensions, be denoted by I and 0, respectively. We denote + as the symmetric entries of a matrix. For a vector field f i (x) and vector of vector fields F(x) = [f 1 (x), . . ., f n (x)], let L f i and L F denote, respectively, the Lie derivative along f i (x) and the vector of Lie derivatives in the directions f i (x): L F [L f1 , . . . , L fn ]. A continuous function β 1 : [0, a) → [0, ∞), for some a > 0, is said to belong to class K if it is strictly increasing and β 1 (0) = 0. The number of joints for a robot arm is represented by n, and a zero-mean i. i.d. Gaussian distribution with mean m and (co)variance Σ is denoted N (m, Σ).

Probabilistic Movement Primitives
ProMPs provide a parametric representation of trajectories which can be executed in multiple ways through the use of a probability distribution. A set of basis functions are used to reduce the model parameters and aid learning over the demonstrated trajectories. The trajectory distribution can be defined and generated in any space that accommodates the system (e.g., joint space or task space) (Paraschos et al., 2018a). In this work we consider joint space trajectories and assume the demonstrations to be normally distributed.
Let q i (t) ∈ R be the ith state variable. Then q i (k) ∈ R is q i (t) sampled at time k, where k ∈ {t 1 , . . . , t K } is a discrete set of sampling times. Within a ProMP, the execution of a trajectory is modeled as the set of robot positions, ζ i = {q i (k)}. Let w i ∈ R 1×L be a weight matrix with L terms. A linear basis function model is then given by is the time-dependent basis function matrix and L is the number of basis functions. Gaussian noise is described by ξ x i~N (0, Σ xi ). Thus, the ProMP trajectory is represented by a Gaussian distribution over the weight vector w i and the parameter vector θ i {μ w i , Σ wi }, which simplifies the estimation of the parameters.
We marginalize out w i to create the trajectory distribution Here, the distribution p (ζ i , θ i ) defines a hierarchical Bayesian model over the trajectories ζ i (Paraschos et al., 2018a) and In an MP representation, the parameters of a single primitive must be easy to obtain from demonstrations. The distribution of the state p (x i (k); θ i ) is The trajectory can be generated from the ProMP distribution using w i , the basis function Φ(k), and (2). The basis function is chosen based on the type of robot movement which can be either rhythmic or stroke-based. From (Eq. 2), the meanμ i (k) ∈ R 2 of the ProMP trajectory at k is Φ(k)μ wi and the covariance Multiple demonstrations are needed to learn a distribution over w i . To train a ProMP we use a combination of radial basis and polynomial basis functions. From the demonstrations, the parameters θ i can be estimated using maximum likelihood estimation (Lazaric and Ghavamzadeh, 2010). However, when there are insufficient demonstrations this may result in unstable estimates of the ProMP parameters. Therefore, similar to (Gomez-Gonzalez et al., 2020), our method uses a regularization to estimate the ProMP distribution. We maximize θ i for the posterior distribution over the ProMP using expectation maximization, In addition, we make use of Normal-Inverse-Wishart as a prior distribution p (θ i ) to increase stability when training the ProMP parameters (Gomez-Gonzalez et al., 2020).

System Modeling
Consider the following control affine nonlinear system where x ∈ R n denotes the state, u ∈ R m is the control input, G [g 1 , . . . , g m ], and f: R n → R n and g i : R n → R n are locally Lipschitz vector fields. It is assumed that the system in (Eq. 4) is controllable.
The model (4) encompasses the dynamic model of robotic manipulators. We consider the following description of robot motion given by the general form by the Euler-Lagrange equations, where q ∈ R n are generalized coordinates of the robot, is a vector containing the Coriolis and gravity terms, and E ∈ R n×p is the actuation matrix that determines the way in which the inputs u actuate the system. In this work, we consider the system to be fully actuated (i.e., p = n), which is typical for robot manipulators. Then, the system description in (Eq. 5) may be converted to an ODE of the form in

Control Barrier Functions
Let C be a set for which we wish to verify that x(t) ∈ C, ∀t. Then, C defines a safe set. A smooth function h(x): R n → R is defined to encode a constraint on the state x of system. The constraint is satisfied if h(x) ≥ 0 and violated if h(x) < 0. Concretely, C is defined as Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 where Int(C) and zC denote the interior and boundary of C, respectively. Existing approaches to define CBFs include exponential CBFs, reciprocal CBFs, and zeroing CBFs (Ames et al., 2016;Nguyen and Sreenath, 2016). Yet, these methods have trade-offs with respect to ease of definition, boundedness of velocities, speed of convergence, etc. In this work we investigate the use of a reciprocal CBF. This type of CBF has a small value when the states are far from the constraints and it becomes unbounded when the states approach the constraints. Definition 1.  Given C and h, a function B: C → R is a CBF if there exists class K functions α 1 , α 2 , and α 3 , and a constant scalar γ > 0 such that Remark 1. It is worth noting that based on the definition of the safe set (7), if the initial state of the system is inside the safe set (i.e., h(x 0 ) > 0 when the system's trajectory gets close to the safety boundary) then the CBF condition forces the systems' trajectories to go back inside the safe set. This is due to the fact that the derivative of h(x(t)) is negative on the boundary which leads the value of B(x) (h(x)) to start decreasing (increasing). Moreover, the constant value γ determines how fast the states of the system can reach the safety boundary.

Control Lyapunov Functions
CLFs can be used to model and design dynamical control system inputs to ensure objectives such as stability, convergence to the origin (or other set point), or convergence to a desired trajectory. In order to have a construction similar to CBFs, we consider exponentially stabilizing CLFs .
Definition 2. In a domain X ⊂ R n , a continuously differentiable function V: X → R is an exponentially stabilizing CLF (ES-CLF) if ∀x ∈ X there exists positive scalar constants c 1 , c 2 , c 3 > 0 such that Having established a CBF to accomplish safety and a CLF to achieve control performance objectives, the two may be unified through a QP. As a result, safe control laws can be computed using the QP to solve the constrained optimization problems at each point in time (Ames et al., 2016).

CONTROL DEVELOPMENT
Our main goal is to design a controller such that the system output tracks a trajectory within the distribution generated by a ProMP. To this end, we first construct a nonlinear inner-loop control law based on the feedback linearization of (Eq. 5). Then, an outer-loop controller established by a CLF-CBF is designed using the distribution parametersμ i and Σ i . We summarize this process as the following problem objectives.
1. Use the demonstrated trajectories of a robot to train and estimate a ProMP distribution. The ProMP provides the timevarying mean and variance of a trajectory. 2. Develop a feedback linearization controller to obtain a linear and decoupled input-output closed-loop relationship for the error signal. 3. Design a CLF to stabilize the system such that ∀i, The general structure of our proposed system is shown in Figure 2.

Feedback Linearization Controller
First, we define the trajectory and error vectors as μ [μ 1 , . . . , μ n ] ⊤ and e [e 1 , . . . , e n ] ⊤ , respectively. Using (Eq. 4) and taking the derivative two times along f(x) andG(x), we obtain Next, assume that the decoupling matrix Γ is well-defined and has full rank (Hsu et al., 2015) 1 . This implies that the system in (Eq. 4) is feedback linearizable and we can prescribe the following control law, FIGURE 2 | The overall structure of the proposed system.
1 A control designer would confirm that the system has the same number of inputs as outputs and verify that e satisfies a vector relative degree (Kolavennu et al., 2001) condition, typically vector relative degree 2. This implies that the decoupling matrix Γ is well-defined and nonsingular. where v is an auxiliary feedback control value. This yields the second order linear system from input v to output e, € e v.
By defining η [e, _ e] ⊤ , (Eq. 12) can be written as a linear time invariant system From there, n decoupled systems can be obtained from (Eq. 13), where It should be noted that the derivative of the tracking error in (Eq. 14) can be obtained from _ q and _ μ. Indeed, we can measure _ q and we know μ from the ProMP. Since μ is a stored trajectory there is no noise in this signal. Therefore, _ μ can be calculated via backwards difference and it can be made as smooth as necessary (i.e., by creating spline curve from fit points in μ). Moreover, the backward difference method is again used for approximating € μ, in (Eq. 10) and (Eq. 11).
To accomplish problem objective 3, it is sufficient to ensure e i → 0. This is accomplished by designing an appropriate CLF. To satisfy problem objective 4, it is adequate to make |e i | < σ i . This objective is satisfied by defining suitable CBFs. In the following subsections, the CLFs and CBFs are defined for the system in (Eq. 14). Moreover, the ith controller for each system in (Eq. 14) is designed by combining the corresponding CLFs and CBFs via a QP problem.

Control Lyapunov Function
Consider the following rapidly exponentially stabilizing-CLF (RES-CLF) (Zhao et al., 2014), where ϵ i is a positive scalar and P ∈ R 2×2 is a symmetric positive definite matrix that can be obtained by solving the continuous time algebraic Riccati equation In order to exponentially stabilize the system, we want to find v i such that where c 3i is a positive constant value. To guarantee a feasible solution for the QP, the CLF constraint can be relaxed by δ i > 0  resulting in This relaxation parameter will be minimized in the QP cost function. It is worth mentioning that by providing a weighting factor on the relaxation parameter δ i , the QP can prioritize how close the system should track of a specific trajectory while ensuring that safety is always satisfied.

Control Barrier Functions
We propose two safety constraints for each system in (Eq. 14).
More specifically, each system should satisfy − σ i < e i < σ i . Consequently, we have the following two safety constraints, In this work, we assume that the initial conditions satisfy the safety constraints, i.e., the initial tracking errors are in the safe region. From (Eq. 19), it is clear that we have multiple time-varying constraints that should be satisfied simultaneously. Moreover, it is trivial to verify that L G e i = 0 and L F L G e i ≠ 0, thus the safety constraint has a relative degree of 2. For relative degree-two constraints, the reciprocal CBF is defined as (Hsu et al., 2015), where j ∈ {1, 2} and a Eij , b Eij are positive scalars. The following control barrier condition should be satisfied for time varying constraints which leads to time varying CBFs, Remark 2. Note that by choosing small values for a Eij and b Eij , the system will stop far from the constraint surfaces. On the other hand, by choosing large parameters the system will stop close to the constraints. In some cases, especially in the presence of uncertainties, choosing a Eij and b Eij to be too large may cause constraint violations (i.e., no solution exists to the QP problem). As a result, based on the given application, a compromise must be considered for choosing these parameters.

Quadratic Program
As shown in (Eq. 19), two safety constraints need to be satisfied simultaneously for each linearized, decoupled system. Due to this fact, a single controller can be obtained in such a way that guarantees adherence to both constraints (Rauscher et al., 2016). In this subsection, n QPs are proposed to unify RES-CLF and CBFs for each system in (Eq. 14) into a single controller. The n QPs for i ∈ {1, . . . , n} are defined as where H i 1 0 0 p sci and p sci ∈ R + is a variable that can be chosen based on the designer's assessment of weighting the control inputs. Based on the QP problem, if the system states Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 η i are far away from the boundary of the safe set, then the control objective that is represented by RES-CLF will be satisfied. However, as the states get close to the boundary the control performance will be violated by the CBF.
Remark 3. If feedback linearization is not feasible, or in the case that feedback linearization does not result in independent systems, we should encode the coupling between the joints to train a single ProMP for the multidimensional system. Our proposed approach can be extended to address such cases by defining the safety constraints based on the entire ProMP covariance matrix, not just the diagonal elements. This is an avenue of future exploration.

SIMULATIONS AND EXPERIMENTS
In this section, we demonstrate different aspects and capabilities of our methodology for the ProMP control of a robotic system via simulations and experiments. The system models and proposed realtime controller were implemented in a MATLAB 2019a environment. All computations were run on a Dell OptiPlex 7050 machine with an Intel Core i7-7700X CPU and 8 GB of memory.

Case Study 1: Two-Link Robot
We consider a rigid, two-link robot with the dynamic model of (Eq. 5) and the following parameters (Kolhe et al., 2013), D q m 1 l 2 1 + m 2 l 2 1 + l 2 2 + 2l 1 l 2 cos q 2 + m 2 l 2 2 + l 1 l 2 cos q 2 m 2 l 2 2 , C q, _ q −m 2 l 1 l 2 sin q 2 _ q 2 2 _ q 1 + _ q 2 m 2 l 1 l 2 _ q 2 1 sin q 2 , K q m 1 + m 2 ( ) gl 1 sin q 1 + m 2 gl 2 sin q 1 + q 2 m 2 gl 2 sin q 1 + q 2 , where m 1 and m 2 are the link masses, l 1 and l 2 are the lengths of the links, and g is the gravitational acceleration. For the simulations, the values of these variables are selected as m 1 = 1, m 2 = 1, l 1 = 1, l 2 = 1, and g = 9.8. We generated 50 trajectories that achieve a goal position from various starting positions while avoiding three obstacles. Using this dataset, we trained a ProMP with Algorithm 1 from (Gomez-Gonzalez et al., 2020). We used L = 2 basis functions consisting of five radial basis parameters. The results of the ProMP training are presented in Figure 3, where the 50 input trajectories are shown in red. The ProMP mean joint trajectories, μ i , are shown in dark green, and in a light-green fill we show μ i ± σ i . A visualization of the ProMP in the workspace, based on (Sakai et al., 2018), is displayed in Figure 4, where the black circles indicate the location of obstacles. The robot link positions over time are highlighted in red, with different colors representing key end-effector trajectories from the ProMP. The green trajectory is the mean of the ProMP distribution. The other four trajectories result from combinations of μ i ± σ i , i ∈ {1, 2}.
Three sets of simulations were conducted. In each simulation, the CLF parameters were selected as ϵ i = 0.1 and c 3i = 0.5. In scenario 1, greater priority was given to the CLF than CBF by choosing a high weight, i.e., p sc1 = p sc2 = 200. Moreover, the CBF design parameters were set to a E11 = a E12 = 20.1, a E21 = a E22 = 20, b E11 = b E12 = 1, b E21 = b E22 = 0.9, γ 1 = 10.1, and γ 2 = 9. In scenario 2, p sci was chosen as p sc1 = p sc2 = 0.02 which implies that less priority was given to the CLF in comparison with the CBF. Moreover, the CBF parameters in this scenario are the same as the first scenario. To show the effects of changing the CBF parameters a Eij , b Eij , and γ i , we consider another scenario. In scenario 3, a E11 = a E12 = 1.1, a E21 = a E22 = 1.1, b E11 = b E12 = 0.4, b E21 = b E22 = 0.5, γ 1 = 1.3, and γ 2 = 1.51, with p sci = 0.02 as in the second scenario. Consequently, the effects of changing the CBF parameters can be analyzed by comparing the second and third scenarios.
The simulation results are exhibited in Figure 5. In scenario 1, by choosing a large value for p sci (more priority given to the CLF than CBF), the system output remains close to the mean trajectory. However, in scenario 2, by considering a small value for p sci (more priority given to the CBF than CLF), the system remains safely inside the distribution but does not necessarily stay close to the mean. In scenario 3, it can be seen that by choosing smaller values for a Eij , b Eij , and γ i , the system output will maintain more distance from the constraint surfaces, resulting in remaining closer to the mean trajectory. In short, our proposed method provides a valuable option to the system designer by permitting fine-grained administration of the trajectories while ensuring safety.
The primary computational cost of our controller, with respect to time, comes from the fact that it must solve a set of QPs at every time step. In the evaluation simulations, two QP problems are solved in real time (one for each link). The average required time (T ave ), maximum time (T max ), and the standard deviation (std) for solving the QP problems are T ave = {0.0 015s, 0.0 011s}, T max = {0.1 148s, 0.0 119s}, std = {0.0 053s, 0.0 006s}. From these results, it is clear that the expected execution time of the QP problems is very small (e.g., in the range of 1 ms). The large maximum times correspond to instances of a single outlier. Hence, the controller is applicable for real-time implementation.

Comparison With Conventional ProMP Control
One specific aspect of the CLF/CBF-based ProMP controller developed in this work is minimizing the control effort in the optimization problem (Eq. 22) at each time step. This leads to a lower control effort in comparison with a traditional ProMP controller. Another set of simulations were conducted to compare our proposed methodology with the results of (Paraschos et al., 2018a;Mathew, 2018), which is representative as one of the primary works in this field. To provide a fair comparison with our method, the ProMP controller is also applied to the feedback linearizable model of the two-link robot.
We conducted three sets of simulations (designated as the fourth, fifth and sixth scenarios). In scenario 4, we implemented the CLF/CBF-based controller with more priority accorded to the CLF rather than the CBF (similar to scenario 1). In scenario 5, we considered the CLF/CBF-based controller with more priority bestowed to the CBF instead of the CLF (similar to scenario 2). The ProMPs in scenarios 4 and 5 were trained using the library Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 6 in (Gomez-Gonzalez, 2019). However, we were not able to successfully implement the original controller in (Gomez-Gonzalez, 2019). Therefore, in scenario 6, we trained and implemented the traditional ProMP controller presented in (Paraschos et al., 2018a) using the library presented in (Mathew, 2018). While the ProMPs for our CLF/CBF-based ProMP controller and traditional ProMP controller used the same training set, the resulting ProMP mean and covariance are slightly different. For each scenario, the simulations were run for 100 different initial conditions that were randomly selected within the bound [μ 1 ± 0.12, μ 2 ± 0.12], where μ 1 (0) = − 1.292 5, and μ 2 (0) = 0.6.
The simulation results corresponding to these three scenarios are, respectively, shown in Figures 6-8, where the mean of ProMP is highlighted by a dashed orange line, and the mean ± variance bounds are shown with dotted black lines. These results show that all the controllers are robust against uncertainties in the initial conditions. From Figure 6, it can be concluded that by using the CLF/CBF-based controller with high p sci , the system quickly converges and tracks the mean trajectory. The system deviates from the mean to take a shorter path in Figure 7, but it is clear that by considering small values for p sci the system remains safely inside the distribution. Based on Figure 8, it can be seen that when using a traditional ProMP controller, the system tracks the mean with an error larger than scenario 4. Moreover, the second joint does not remain inside the distribution during certain periods. We posit that the ProMP controller has some lag, akin to a PID controller with proportional gains that are too low. The feedback and feedforward gains of the ProMP controller are determined as functions of the system parameters and ProMP parameters, and they cannot be tuned to reduce tracking error. The ProMP can be "tuned" through the use of via points. To this end, we have added a via point as the last element of the mean trajectory to ensure convergence. These results show that the CLF/CBF-based ProMP controller has better tracking performance when compared to a traditional ProMP controller. Figure 9 and Figure 10 summarize the root mean square (RMS) values of the control variables for scenarios 4, 5, and 6 in the form of a boxplot. In these figures, 50% of all the RMS values are placed in the boxes and the median is shown by a red line that divides the box into two parts. The black bars indicate the maximum and minimum values, and the dashed "whiskers" indicate 25% of the values between the box and max/min. Outliers, if any, are indicated by red crosses. From Figure 9, it can be concluded that scenario 4 has larger control values in comparison to scenario 5, i.e., more control effort is needed to  Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 be able to effectively track the ProMP mean. This indicates one strength of leveraging trajectory distributions over a single trajectory; the system is given more freedom to reduce the control effort while maintaining safety. Moreover, in contrast with a traditional ProMP controller our methodology has a remarkably lower control effort as shown in Figure 9 and Figure 10.

Case Study 2: Universal Robots UR5 Six-Link Robot
The equation of motion of the UR5 robot can be written in the form of (Eq. 5), with the following parameters (Spong and Vidyasagar, 2008), where m i ∈ R is the mass of ith link, and J vi ∈ R 3×6 and J wi ∈ R 3×6 are the linear and angular part of the Jacobian matrix J i , respectively. R i ∈ R 3×3 is the rotation matrix and I m i ∈ R 3×3 is the inertia tensor.
The elements of C(q, _ q) are obtained from the inertia matrix as where m ij are the entries of the inertia matrix. Moreover, the elements of the gravity vector are obtained from where P is the total potential energy of the robot. Additional information on these equations can be found in (Katharina, 2014). FIGURE 6 | The results of the CLF/CBF-based ProMP controller with more priority given to the CLF than the CBF for the first joint (left) and second joint (right).
Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 We generated 90 joint-space trajectories with defined goals, obstacles, and starting positions. The 90 UR5 trajectories were then used to train a joint-space ProMP using the same parameters as in the two-link robot case study. The following set of CLF and CBF parameters were chosen: ϵ 1 = ϵ 2 = ϵ 3 = ϵ 4 = ϵ 5 = 0.1, ϵ 6 = 0.01, and c 3i = 1.1, a Eij = 20.1, b Eij = 1, and γ i = 10.1, i ∈ {1, . . . , 6}, j ∈ {1, 2}. The simulation environment is depicted in Figure 1 and Figure 11. We consider two different scenarios. In scenario 1, p sci = 200, which gives higher importance to the CLF. For scenario 2, p sci = 0.001, which implies that the design interest and priority is on the CBF. As is clear from Figure 11, in both scenarios the robot can effectively track the mean of ProMP and simultaneously avoid colliding with environmental obstacles.

Case Study 3: Universal Robots UR5e Six-Link Robot
To conclude this section, we evaluated our CLF/CBF-based ProMP controller using a physical robot with static obstacles. The robot used was a Universal Robots UR5e six-link manipulator with a Robotiq two-finger gripper. In addition to the robot, our environmental setup included the following three static obstacles: a robotic arm, box, and the table that the robot was mounted on.
We tasked a human teacher with demonstrating a pick-andplace procedure to the robot. In our setup, the robot must move from a starting bin to a goal bin located on the other side of a box. We installed a gravity compensation controller on the robot enabling the teacher to directly affect the robot's joints via kinesthetic teaching. Using this directed learning from FIGURE 7 | The results of the CLF/CBF-based ProMP controller with more priority given to the CBF than the CLF for the first joint (left) and second joint (right). Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 demonstration approach allows us to bypass the correspondence problem (Argall et al., 2009). Additionally, kinesthetic teaching retains parity between the demonstration, learning, and execution space of the trajectories. In all, seven demonstrations of the task were conducted. From these demonstrations, we collected the joint angles, velocities, and the associated time steps. It is worth mentioning that care must be taken to collect data that is roughly Gaussian, or at least unimodal. Methods to handle non-Gaussian or multimodal data is an issue to be addressed in future work. Using this demonstration data, we trained a joint-space ProMP using the same parameters as described in the previous sections. In Figure 12, we see the demonstration data (in green), the respective ProMP mean (red), and one standard deviation of the ProMP (light red). Note that q 6 , which corresponds to the final wrist joint, was not purposefully actuated by the teacher and is largely static during training. We implemented the CLF/CBF-based ProMP controller with p sci = 0.5. The other parameters were selected to be the same as the previous simulations in Section 4.3. Figure 13 depicts the FIGURE 10 | The RMS values of control inputs corresponding to a traditional ProMP controller for the first joint (left) and second joint (right).  Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 FIGURE 12 | The UR5e robot joint trajectories recorded from the human demonstration of the pick-and-place task are shown in green, while the ProMP mean trajectory and one standard deviation of the ProMP are colored red and light red, respectively.
FIGURE 13 | A human teacher demonstrating the pick-and-place task to a UR5e robot.
Frontiers in Robotics and AI | www.frontiersin.org March 2022 | Volume 9 | Article 772228 11 demonstration operation as the human teacher moves the robot arm during the task. The trajectory from the CLF/CBF-based ProMP controller on the UR5e robot is shown in Figure 14.

CONCLUSION AND FUTURE WORK
In this work we solved a ProMP robot guidance problem using a CLF/CBF-based controller. Our approach stabilizes the robot and guarantees that the system output is always inside the distribution generated by a ProMP. The timevarying nature of the ProMP ensures the robot is guided along the distribution at the desired rate. Moreover, our technique allows for prioritizing between strict tracking of ProMP mean and loose but safe tracking of mean trajectory, which is not possible in the native ProMP control design. It was shown, in Section 4.2, that this can reduce control effort at the risk of getting closer to barriers. Simulation and experimental studies on a two-link and six-link robot confirm the viability of our method for designing the controller.
As part of ongoing work, we are investigating the trade-offs of various different CBFs (e.g., zeroing versus reciprocal), other choices of cost functions, and constraints in the QP. Additionally, we are seeking novel methods that automatically define additional barriers to ensure the safe movement of a corobot around dynamic obstacles (e.g., humans and other robots). This includes the exploration of an active learning framework whereby a co-robot has the capability to select informative trajectories from the ProMP distribution that can then be used for autonomously retraining itself.

DATA AVAILABILITY STATEMENT
The data used in this research includes simulation and experimental results. The authors will provide it upon reasonable request.

AUTHOR CONTRIBUTIONS
MD lead theoretical development and implementation of CBF/ CLF controller. AI lead implementation of ProMP controllers. JC implemented ProMP algorithms and conducted simulations and experiments. WB lead theoretical developments of ProMP and contributed to CLF and CBF design. NG directed the research overall and lead theoretical design of nonlinear control theory. All authors contributed to writing and editing the manuscript and contributed to the theoretical development and analysis of experimental and simulation results.