Prediction of Intention during Interaction with iCub with Probabilistic Movement Primitives

This paper describes our open-source software for predicting the intention of a user physically interacting with the humanoid robot iCub. Our goal is to allow the robot to infer the intention of the human partner during collaboration, by predicting the future intended trajectory: this capability is critical to design anticipatory behaviors that are crucial in human-robot collaborative scenarios, such as in co-manipulation, cooperative assembly or transportation. We propose an approach to endow the iCub with basic capabilities of intention recognition, based on Probabilistic Movement Primitives (ProMPs), a versatile method for representing, generalizing, and reproducing complex motor skills. The robot learns a set of motion primitives from several demonstrations, provided by the human via physical interaction. During training, we model the collaborative scenario using human demonstrations. During the reproduction of the collaborative task, we use the acquired knowledge to recognize the intention of the human partner. Using a few early observations of the state of the robot, we can not only infer the intention of the partner, but also complete the movement, even if the user breaks the physical interaction with the robot. We evaluate our approach in simulation and on the real iCub. In simulation, the iCub is driven by the user using the Geomagic Touch haptic device. In the real robot experiment, we directly interact with the iCub by grabbing and manually guiding the robot's arm. We realize two experiments on the real robot: one with simple reaching trajectories, and one inspired by collaborative object sorting. The software implementing our approach is open-source and available on the GitHub platform. Additionally, we provide tutorials and videos.


INTRODUCTION
A critical ability for robots to collaborate with humans is to predict the intention of the partner.
For example, a robot could help a human fold sheets, move furniture in a room, lift heavy objects, or place windshields on a car frame. In all these cases, the human could begin the collaborative movement by guiding the robot, or by leading the movement when both human and robot hold the object. The task would be performed better if the robot could infer the intention of the human as soon as possible, and collaborate to complete the task without requiring any further assistance. This scenario is particularly relevant for manufacturing [1], where robots could help human partners carry a heavy or unwieldy object, while humans could effortlessly guide the robot along the correct trajectory for positioning the object at the right location.¹ For example, the human could start moving the robot's end-effector towards the goal location, and release the grasp on the robot when the robot shows that it is capable of reaching the desired goal location without human intervention. Service and manufacturing scenarios offer a wide set of examples where collaborative actions can be initiated by the human and finished by the robot: assembling object parts, sorting items in the correct bins or trays, welding, moving objects together, etc. In all these cases, the robot should be able to predict the goal of each action and the trajectory that the human partner wants to follow for each action. To make this prediction, the robot should use all available information coming from sensor readings, past experiences (prior), human imitation and previous teaching sessions or collaborations. Understanding and modeling the human behavior, exploiting all the available information, is the key to tackling this problem [3].
To predict the human intention, the robot must identify the current task, predict the user's goal and predict the trajectory to achieve this goal. In the human-robot interaction literature, many keywords are associated with this prediction ability: inference, goal estimation, legibility, intention recognition, anticipation.
Anticipation is the ability of the robot to choose the right thing to do in a current situation [4].
To achieve this goal, the robot must predict the effect of its actions, as studied with the concept of affordances [5,6,7]. It must also predict the human intention, which means estimating the partner's goal [8,9]. Finally, it must be able to predict future events or states, e.g., by simulating the evolution of the coupled human-robot system, as is frequently done in model predictive control [10,11] or in human-aware planning [12,13].
It has been posited that having legible motions [14,15] helps the interacting partners in increasing the mutual estimation of the partner's intention, increasing the efficiency of the collaboration.
Anticipation thus requires the ability to visualize or predict the future desired state, e.g., where the human intends to go. Predicting the user intention is often formulated as predicting the target of the human action, meaning that the robot must be able to predict at least the goal of the human when the two partners engage in a joint reaching action. To make such a prediction, a common approach is to consider each movement as an instance of a particular skill or goal-directed movement primitive.
In the past decade, several frameworks have been proposed to represent movement primitives, frequently called skills, the most notable being Gaussian Mixture Models (GMM) [16,17], Dynamic Movement Primitives (DMP) [18], Probabilistic Dynamic Movement Primitives (PDMP) [19] and Probabilistic Movement Primitives (ProMP) [20]. For a thorough review of the literature we refer the interested reader to [21]. Skill learning techniques have been applied to several learning scenarios, such as playing table tennis, writing digits, avoiding obstacles during pick & place motions, etc. In all these scenarios, humans classically provide the demonstrations (i.e., realizations of the task trajectories) either by manually driving the robot or through tele-operation, following the classical paradigm of imitation learning. Some of these techniques have also been applied to the iCub humanoid robot: for example, [22] used DMPs to adapt a reaching motion online to the variable obstacles encountered by the robot arm, while [23] used ProMPs to learn how to tilt a grate including torque information.
Among the aforementioned techniques, ProMPs stand out as one of the most promising techniques for realizing intention recognition and anticipatory movements for human-robot collaboration.
They have the advantage, with respect to the other methods, of capturing by design the variability of the human demonstrations. They also have useful structural properties, as described by [20], such as co-activation, coupling and temporal scaling. ProMPs have already been used in human-robot coordination for generating appropriate robot trajectories in response to initiated human trajectories [24]. Unlike DMPs, ProMPs do not need the information about the final goal of the trajectory, which DMPs use to set an attractor that guarantees convergence to the final goal.² They also perform better in the presence of noisy or sparse measurements, as discussed in [25].³ In a recent paper, [19] proposed a method called PDMP (Probabilistic Dynamic Movement Primitive). This method extends DMPs with probabilistic properties to measure the likelihood that the movement primitive is executed correctly and to perform inference on sensor measurements. However, PDMPs do not have a data-driven generalization and can deviate arbitrarily from the demonstrations. These last differences can be critical for our humanoid robot (for example, if it collides with something during the movement, or if it holds something that can fall because of a bad trajectory). Thus, the ProMP method is more suitable for our applications.
In this paper, we present our approach to the problem of predicting the intention during human-robot physical interaction and collaboration, based on Probabilistic Movement Primitives (ProMPs) [20], and we present the associated open-source software code that implements the method for the iCub.
To illustrate the technique, the exemplifying problem we tackle in this paper is to allow the robot to finish a movement initiated by the user that physically guides the robot arm. From the first observations of the joint movement, supposedly belonging to a movement primitive of some task, the robot must recognize which kind of task the human is doing, predict the "future" trajectory and complete the movement autonomously when the human releases the grasp on the robot.⁴ To achieve this goal, the robot first learns the movement primitives associated with the different actions/tasks. We choose to describe these primitives with ProMPs, as they are able to capture the distribution of demonstrations in a probabilistic model, rather than with a unique "average" trajectory. During interaction, the human starts physically driving the robot to perform the desired task. At the same time, the robot collects observations of the task. It then uses the prior information from the ProMP to compute a prediction of the desired goal together with the "future" trajectory that allows it to reach the goal.
² There may be applications where converging to a unique and precise goal could be a desirable property of the robot's movement. However, it is an assumption that prevents us from generalizing the method to different actions, and this is another reason why we prefer ProMPs.
³ We refer the interested reader to [25] for a thorough comparison between DMPs and ProMPs to be used for interaction primitives and prediction.
⁴ To avoid ambiguity: in our method, tasks are encoded by primitives that are made of trajectories. This is a very classical approach for robot learning techniques and, in general, for techniques based on primitives. Of course this is a simplification, but it allows representing a number of different tasks: pointing, reaching, grasping, gazing, etc.
A conceptual representation of the problem is shown in Figure 1. In the upper part of this figure, we represent the training step for one movement primitive: the robot is guided by the human partner to perform a certain task, and several entire demonstrations of the movement that realizes the task are collected. Both kinematic (e.g., Cartesian positions) and dynamic (e.g., wrenches) information is collected. The N trajectories constitute the base for learning the primitive, that is, learning the parameters ω of the trajectory distribution. We call this learned distribution the prior distribution. If multiple tasks are to be considered, then the process is replicated such that we have one ProMP for every task. The bottom of the figure represents the inference step. From the early observations⁵ of a movement initiated by the human partner, the robot first recognizes which ProMP best matches the early observations (i.e., it recognizes the primitive that the human is executing, among the set of known primitives). Then, it estimates the future trajectory, given the early observations (e.g., the first portion of a movement) and the prior distribution, computing the parameters ω* of the posterior distribution. The corresponding trajectory can be used by the robot to autonomously finish the movement, without relying on the human.
Figure 1: Conceptual use of the ProMP for predicting the desired trajectory to be performed by the robot in a collaborative task. Top: training phase, where ProMPs are learned from several human demonstrations. Bottom: inference phase (online), where from early observations the robot recognizes the current (among the known) ProMP and predicts the human intention, i.e., the future evolution of the initiated trajectory.
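The recognition step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a 1-DOF trajectory, assumes that the observed and learned movements share the same time scale, and uses the average absolute distance to each prior's mean trajectory as the matching criterion. The function names and the two example primitives (`reach_up`, `reach_down`) are hypothetical.

```python
from typing import Dict, List


def mean_abs_distance(observed: List[float], reference: List[float]) -> float:
    """Average absolute gap between the early observations and the
    corresponding first samples of a reference (mean) trajectory."""
    n = len(observed)
    return sum(abs(o - r) for o, r in zip(observed, reference[:n])) / n


def recognize_primitive(observed: List[float],
                        prior_means: Dict[str, List[float]]) -> str:
    """Return the label of the prior whose mean trajectory is closest
    to the early observations (assumed criterion, for illustration)."""
    return min(prior_means,
               key=lambda label: mean_abs_distance(observed, prior_means[label]))


# Hypothetical mean trajectories of two learned primitives (1-DOF, 6 samples).
prior_means = {
    "reach_up": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "reach_down": [0.0, -0.1, -0.2, -0.3, -0.4, -0.5],
}

# First three samples of a movement initiated by the human partner.
early_observations = [0.0, 0.08, 0.22]
best = recognize_primitive(early_observations, prior_means)
```

In this toy case the early observations drift upwards, so the upward-reaching prior is selected; the selected prior would then be conditioned on the observations to predict the rest of the movement.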
In the paper, we describe both the theoretical framework and the software that is used to perform this prediction. The software is currently implemented in Matlab and C++; it is open-source, available on GitHub: https://github.com/inria-larsen/icubLearningTrajectories and it has been tested both with a simulated iCub in Gazebo and the real iCub. In simulation, physical guidance is provided by the Geomagic Touch⁶; on the real robot, the human operator simply grabs the robot's forearm.
⁵ In the paper, we denote by early observations the first portion of a movement observed by the robot, i.e., from t = 0 to a current t.
⁶ The Geomagic Touch is a haptic device, capable of providing force feedback from the simulation to the operator.
We also provide a practical example of the software that realizes the exemplifying problems. In the example, the recorded trajectory is composed of both the Cartesian position and the forces at the end-effector. Notably, in previous studies [23], ProMPs were used to learn movement primitives using joint positions. Here, we use Cartesian positions instead of joint positions, to exploit the redundancy of the robotic arm in performing the desired task in the 3D space. At the control level of the iCub, this choice requires the iCub to control its lower-level (joint torque) movement with the Cartesian controller [26] instead of using the direct control at joint level. As for the forces, we rely on a model-based dynamics estimation that exploits the 6-axis force/torque sensors [27,28].
All details for the experiments are presented in the paper and the software tutorial.
To summarize, the contributions of this paper are: • the description of a theoretical framework based on ProMPs for predicting the human desired trajectory and goal during physical human-robot interaction, providing the following features: recognition of the current task, estimation of the task duration, prediction of the future trajectory; • an experimental study about how multimodal information can be used to improve the estimation of the duration/speed of an initiated trajectory; • the open-source software to realize an intention recognition application with the iCub robot, both in simulation and on the real robot.
The paper is organized as follows. In Section 2 we review the literature about intentions in Human-Robot Interaction (HRI), probabilistic models for motion primitives and their related software. In Section 3 we describe the theoretical tools that we use to formalize the problem of predicting the intention of the human during interaction. Particularly, we describe the ProMPs and their use for predicting the evolution of a trajectory given early observations. In Section 4 we overview the software organization and the interconnection between our software and the iCub's main software, both for the real and simulated robot. The following sections are devoted to presenting our software and its use for predicting intention. We choose to present three examples of increasing complexity, with the simulated and real robot. We provide and explain in detail a software example for a 1-DOF trajectory in Section 5. In Sections 6 and 7 we present the intention recognition application with the simulated and real iCub, respectively. In the first examples with the robot, the "tasks" are exemplified by simple reaching movements, to provide simple and clear trajectories that help the reader understand the method, whereas the last experiment with the robot is a collaborative object sorting task. Section 8 provides the links to the videos showing how to use the software in simulation and on the iCub. Finally, in Section 10 we discuss our approach, its limitations and outline our future developments.

Related Work
In this paper we propose a method to recognize the intention of the human partner collaborating with the robot, formalized as the target and the "future" trajectory associated with a skill, modeled by a goal-directed Probabilistic Movement Primitive. In this section, we briefly overview the literature about intention recognition in human-robot interaction and motion primitives for learning of goal-directed robotic skills.
(Footnote 6, continued) In our experiments with the simulated iCub we did not use this feature. We used the Geomagic Touch to steer the arm of the simulated robot. In that sense, we used it more as a joystick for moving the left arm.

Intention during human-robot interaction
When humans and robots collaborate, mutual understanding is paramount for the success of any shared task. Mutual understanding means that the human is aware of the robot's current task, status, goal, available information, that he/she can reasonably predict or expect what it will do next, and vice versa. Recognizing the intention is only one piece of the problem, but still plays a crucial part for providing anticipatory capabilities.
Formalizing intention can be a daunting task, as one may find it difficult to provide a unique representation that explains the intention for very low-level goal-directed tasks (e.g., reaching a target object and grasping it) and for very high-level, complex, abstract or cognitive tasks (e.g., changing a light bulb on the ceiling by building a stair composed of many parts, climbing it and reaching the light bulb). [29] review different approaches to action recognition and intention prediction.
From the human's point of view, understanding the robot's intention means that the human should find every goal-directed robot movement or action intuitive and unambiguous, and it should be clear what the robot is doing or going to do [30]. [31] formalized the difference between predictability and legibility: a motion is legible if an observer can quickly infer its goal, while a motion is predictable when it matches the expectations of the observer given its goal.
The problem of generating legible motions for robots has been addressed in many recent works.
For example, [31] use optimization techniques to generate movements that are predictable and legible. [32] apply an Inverse Reinforcement Learning method on autonomous cars to select the robot movements that are maximally informative for the humans and that will facilitate their inference of the robot's objectives.
From the robot's point of view, understanding the human's intention means that the robot should be able to decipher the ensemble of verbal and non-verbal cues that the human naturally generates with his/her behavior, to identify, for a current task and context, what the human intention is. The more information (e.g., measurable signals from the human and the environment) is used, the better and more complex the estimation can be.
The simplest form of intention recognition is to estimate the goal of the current action, under the implicit assumption that each action is a goal-directed movement. [33] showed that humans implicitly attribute intentions in the form of goals to robot motions, proving that humans exhibit anticipatory gaze towards the intended goal. Gaze was also used by [34] in a human-robot interaction game with iCub, where the robot (human) was tracking the human (robot) gaze to identify the target object. [35] proposed the Bayesian Human Motion Intentionality Prediction algorithm, to geometrically compute the most likely target of the human motion, using Expectation-Maximisation and a simple Bayesian classifier. In [36], a method called Intention-Driven Dynamics model, based on Gaussian Process Dynamical Models (GPDM) [37], is used to infer the intention of the robot's partner during a ping-pong match, represented by the target of the ball, by analyzing the entire human movement before the human hits the ball.
More generally, modeling and descriptive approaches can be used to match predefined labels with measured data [38].
A more complex form of intention recognition is to estimate the future trajectory from the past observations, that is, to estimate $[x_{t+1}, \ldots, x_{t+T_{future}}] = f(x_t, x_{t-1}, \ldots, x_{t-T_{past}})$. This problem, very similar to the estimation of the forward dynamics model of a system, is frequently addressed by researchers in model predictive control, where being able to "play" the system evolving in time is the basis for computing appropriate robot controls. When a trajectory can be predicted by an observer from early observations of it, we can say that the trajectory is not only legible, but predictable. A systematic approach for predicting a trajectory is to reason in terms of movement primitives, in such a way that the sequence of points of the trajectory can be generated by a parametrized time model or a parametrized dynamical system. For example, [39] plan reaching trajectories for object-carrying that are able to convey information about the weight of the transported object. More generally, in generative approaches [40], latent variables are used to learn models for the primitives, both to generate and infer actions. The next subsection will provide more detail about the state-of-the-art techniques for generating movement primitives.
In [41], the robot first learns Interaction Primitives by watching two humans performing an interactive task, using motion capture. The Interaction Primitive encapsulates the dependencies between the two human movements. Then, the robot uses the Interaction Primitive to adapt its behavior to its partner's movement. Their method is based on Dynamic Movement Primitives (DMPs) [18], where a distribution over the DMP's parameters is learned. Notably, in this paper we did not follow the same approach to learn Interaction Primitives, since there is a physical interaction that makes the user's and the robot's movements a single joint movement. Moreover, there is no latency between the partner's early movement and the robot's, because the robot's arm is physically driven by the human until the latter breaks the contact.
Indeed, most examples in the literature focus on kinematic trajectories, corresponding to gestures that are typically used in dyadic interactions characterized by a coordination of actions and reactions. Whenever the human and robot are also interacting physically, collaborating on a task with some exchange of forces, then the problem of intention recognition becomes more complex. Indeed, the kinematics information provided by the "trajectories" cannot be analyzed without taking into account the haptic exchange and the estimation of the "roles" of the partners in leading/following each other.
Estimating the current role of the human (master/slave or leader/follower) is crucial, as the role information is necessary to coherently adapt the robot's compliance and impedance at the level of the exchanged contact forces. Most importantly, adapting the haptic interaction can be used by the robot to communicate when it has understood the human intent and is able to finish the task autonomously, mimicking the same type of implicit nonverbal communication that is typical of humans.
For example, in [42], the robot infers the human intention using the measured human forces and Gaussian Mixture Models. In [43], the arm impedance is adapted by a Gaussian Mixture Model based on measured forces and visual information. Many studies focused on the robot's ability to act only when and how its user wants [44,45] and to not interfere with the partner's forces [46] or actions [47].
In this paper, we describe our approach to the problem of recognizing the human intention during collaboration by providing an estimate of the future intended trajectory to be performed by the robot. In our experiments, the robot does not adapt its role during the physical interaction, but simply switches from follower to leader when the human breaks contact with it.

Movement primitives
Movement Primitives (MPs) are a well-established paradigm for representing complex motor skills.
The best-known method for representing movement primitives is probably the Dynamic Movement Primitives (DMPs) [18,48,19]. DMPs use a stable non-linear attractor in combination with a forcing term to represent the movement. The forcing term makes it possible to follow a specific movement, while the attractor ensures asymptotic stability. In a recent paper, [19] proposed an extension to DMPs, called PDMP (Probabilistic Dynamic Movement Primitive). This method extends DMPs with probabilistic properties to measure the likelihood that the movement primitive is executed correctly and to perform inference on sensor measurements. However, PDMPs do not have a data-driven generalization and can deviate arbitrarily from the demonstrations. This last difference can be critical for our applications with the humanoid robot iCub, since uncertainties are unavoidable and disturbances may happen frequently and destabilize the robot movement (for example, an unexpected collision during the movement). Thus, the ProMP method is more suitable for our software.
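To make the roles of the attractor and of the forcing term concrete, the following is a minimal 1-DOF sketch of a discrete DMP in the commonly used second-order formulation, integrated with explicit Euler steps. It is an illustration under assumptions, not code from the software described in this paper: the gains (`alpha_z`, `beta_z`, `alpha_x`), the step size and the single constant forcing weight are hypothetical choices.

```python
from typing import Callable, List


def integrate_dmp(y0: float, goal: float,
                  forcing_weight: Callable[[float], float] = lambda x: 0.0,
                  tau: float = 1.0, dt: float = 0.001, n_steps: int = 3000,
                  alpha_z: float = 25.0, beta_z: float = 6.25,
                  alpha_x: float = 4.0) -> List[float]:
    """Euler integration of a 1-DOF DMP.

    tau * dz = alpha_z * (beta_z * (goal - y) - z) + f(x)   (attractor + forcing)
    tau * dy = z
    tau * dx = -alpha_x * x                                  (canonical system)
    """
    y, z, x = y0, 0.0, 1.0
    path = [y]
    for _ in range(n_steps):
        # The forcing term is scaled by the phase x, so it vanishes as x -> 0
        # and only the attractor remains active at the end of the movement.
        f = forcing_weight(x) * x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        path.append(y)
    return path


# Pure attractor (no forcing) and a shaped movement (constant forcing weight).
plain = integrate_dmp(y0=0.0, goal=1.0)
shaped = integrate_dmp(y0=0.0, goal=1.0, forcing_weight=lambda x: 50.0)
```

Both trajectories end at the goal: the forcing term changes the shape of the transient only, which is the convergence property that the text attributes to the attractor.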
[49], [50] and [25] compared ProMPs and DMPs for learning primitives and specifically interaction primitives. With the DMP model, at the end of the movement, only a dynamic attractor is active. Thus, it always reaches a stable goal. Both methods allow temporal scaling of the movement, learning from a single demonstration, and generalizing to a new final position. ProMPs additionally offer the ability to do inference (thanks to the distribution), to force the robot to pass through several initial via-points (the early observations), to know the correlation between the inputs of the model, and to co-activate several ProMPs. In our study, we need these features, because the robot must determine a trajectory that passes through the early observations (the beginning of the movement, where the user physically guides the robot).
Hidden Markov Models (HMMs) for movement skills were introduced by [52]. This method is often used to categorize movements, where a category represents a movement primitive. It can also represent the temporal sequence of a movement. In [53], learned Hierarchical Hidden Markov Models (HHMMs) are used to recognize human behaviors efficiently. In [54], the Primitive-based Coupled-HMM (CHMM) approach is presented for recognizing natural, complex human actions. In this approach, each primitive is represented by a Gaussian Mixture Model.
Adapting Gaussian Mixture Models is another method used to learn physical interaction. In [55], GMMs and Gaussian Mixture Regression are used to learn force information in addition to the position (joint information). Using this method, a humanoid robot is able to collaborate in one dimension with its partner for a lifting task. In our paper, we will also use (Cartesian) position and force information to allow our robot to interact physically with its partner.
A sub-problem of movement recognition is that robots need to estimate the duration of the trajectory to align a current trajectory with learned movements. In our case, at the beginning of the physical Human-Robot Interaction (pHRI), the robot observes a partial movement guided by its user. Given this partial movement, the robot must first estimate the current state of the movement in order to understand its partner's intent. Thus, it needs to estimate the partial movement's speed.
Fitts' law models the movement duration for goal-directed movements. This model is based on the assumption that the movement duration is a linear function of the difficulty of reaching a target [56]. In [57], the authors show that modifying the target's width changes the shape of the movement. Thus, it is difficult to apply Fitts' law when the size of the target can change. [57] and [58] confirm this idea by showing that the shape of the movement changes with the accuracy required by the goal position of the movement.
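As a small worked example of the linear relation mentioned above, the sketch below uses Fitts' original formulation, where the index of difficulty is log2(2D/W) for a target at distance D with width W. The coefficients `a` and `b` are hypothetical values that would normally be fitted to measured data; they do not come from this paper.

```python
import math


def fitts_movement_time(distance: float, width: float,
                        a: float, b: float) -> float:
    """Movement time predicted by Fitts' law: MT = a + b * log2(2 * D / W)."""
    index_of_difficulty = math.log2(2.0 * distance / width)
    return a + b * index_of_difficulty


# Hypothetical coefficients (seconds and seconds per bit).
a, b = 0.1, 0.15
wide_target = fitts_movement_time(distance=0.30, width=0.10, a=a, b=b)
narrow_target = fitts_movement_time(distance=0.30, width=0.02, a=a, b=b)
```

With the same distance, the narrower target has a higher index of difficulty and therefore a longer predicted duration. The model says nothing about the shape of the movement, which is the limitation discussed above.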
Dynamic Time Warping (DTW) is a method to measure the similarity between two trajectories that have different durations, in a more robust way than the Euclidean distance. In [41], the DTW algorithm is modified to match a partial movement with a reference movement. Many improvements over this method exist. In [59], a robust method is proposed to improve the indexing. The calculation speed of DTW has been improved using different methods, such as FastDTW, Lucky Time Warping or FTW. An explanation and comparison of these methods is presented in [60], whose authors add their own computation speed improvement, called Pruned Warping Paths, which allows the deletion of unlikely data. However, a drawback of the well-known DTW method is that it does not preserve the global shape of the trajectory.
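For readers unfamiliar with DTW, the following is a minimal sketch of the classical dynamic-programming formulation for 1-DOF trajectories. It is the textbook algorithm, not the modified version of [41] nor any of the accelerated variants cited above.

```python
from typing import List


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Classical DTW cost between two 1-DOF trajectories of any lengths."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # A sample may be matched, repeated, or skipped over in time.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


fast = [0.0, 1.0, 2.0]
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]   # same path, executed twice as slowly
same_shape_cost = dtw_distance(fast, slow)
```

The two trajectories above follow the same path at different speeds, so their DTW cost is zero. The alignment is free to repeat samples, which illustrates why the warped result need not preserve the global shape of the trajectory.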
In [25], which uses probabilistic learning of movement primitives, the duration estimation of movements is improved by using a different time warping method. This method is based on a Gaussian basis model to represent a time warping function and, unlike DTW, it forces a local alignment between the two movements without skipping any index. Thus, the resulting trajectories are more realistic and smoother, and the method preserves the global shape of the trajectories.
For inferring the intention of the robot's partner, we use Probabilistic Movement Primitives (ProMPs) [20]. Specifically, we use the ProMP's conditioning operator to adapt the learned skills according to observations. ProMPs can encode the correlations between forces and positions and allow better prediction of the partner's intention. Further, the phase of the partner's movement can be inferred and therefore the robot can adapt to the partner's velocity changes.
ProMPs yield more accurate predictions in collaborative tasks, as shown in [25], where the root-mean-square error of their predictions is lower than that of DMPs.
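The conditioning operator mentioned above can be sketched as a standard Gaussian update of the weight distribution. The example below is a simplified illustration, not the code of our software: it assumes a 1-DOF trajectory, a Gaussian prior over the weights with mean `mu` and symmetric covariance `sigma`, and one scalar observation at a time, so that no matrix inversion is needed. All names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def condition_on_observation(mu: Vector, sigma: Matrix, phi: Vector,
                             y_obs: float,
                             obs_noise: float) -> Tuple[Vector, Matrix]:
    """Condition a Gaussian weight distribution on one scalar observation.

    Observation model (assumed): y_obs = phi . w + noise, noise variance obs_noise.
    gain  = sigma phi / (obs_noise + phi^T sigma phi)
    mu'   = mu + gain * (y_obs - phi . mu)
    sigma' = sigma - gain (sigma phi)^T      (valid because sigma is symmetric)
    """
    m = len(mu)
    sigma_phi = [sum(sigma[i][j] * phi[j] for j in range(m)) for i in range(m)]
    innovation_var = sum(phi[i] * sigma_phi[i] for i in range(m)) + obs_noise
    gain = [v / innovation_var for v in sigma_phi]
    residual = y_obs - sum(phi[i] * mu[i] for i in range(m))
    new_mu = [mu[i] + gain[i] * residual for i in range(m)]
    new_sigma = [[sigma[i][j] - gain[i] * sigma_phi[j] for j in range(m)]
                 for i in range(m)]
    return new_mu, new_sigma


# Toy prior over two weights, and the basis values at the observed time step.
prior_mu = [0.0, 0.0]
prior_sigma = [[1.0, 0.0], [0.0, 1.0]]
phi_t = [0.5, 0.5]

post_mu, post_sigma = condition_on_observation(prior_mu, prior_sigma,
                                               phi_t, y_obs=1.0, obs_noise=1e-6)
predicted = sum(p * w for p, w in zip(phi_t, post_mu))
```

After the update, the mean trajectory passes (almost) through the observed point and the uncertainty at that time step collapses. Applying the same update to each early observation in turn gives a posterior whose mean can be used as the predicted trajectory.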

Related open-source software
One of the goals of this paper is to introduce open-source software for the iCub (but potentially for any other robot), where the ProMP method is used to recognize human intention during collaboration, so that the robot can execute initiated actions autonomously. This is not the first open-source implementation for representing movement primitives: however, it has a novel application and a rationale that makes it easy to use with the iCub robot.
In Table 1 we report on the main software libraries that one can use to learn movement primitives. Some have also been used to realize learning applications with iCub, e.g., [61,22] or to recognize human intention. However, the software we propose here is different: it provides an implementation of ProMPs used explicitly for intention recognition and prediction of intended trajectories. It is interfaced with iCub, both real and simulated, and addresses the specific case of physical interaction between the human and the robot. In short, it is a first step towards adding intention recognition ability to the iCub robot.

Theoretical framework
In this section we present the theoretical framework that we use to tackle the problem of intent recognition: we describe the ProMPs and how they can be used to predict trajectories from early observations.
In Section 3.2 we formulate the problem of learning a primitive for a simple case, where the robot learns the distribution from several demonstrated trajectories. In Section 3.3 we formulate and provide the solution to the problem of predicting the "future" trajectory from early observations (i.e., the initial data points). In Section 3.4 we discuss the problem of predicting the time modulation, i.e., predicting the global duration of the predicted trajectory. This problem is non-trivial, as by construction the demonstrated trajectories are "normalized" in duration when the ProMP is learned.⁷ In Section 3.5 we explain how to recognize, from the early observations, to which of many known skills (modeled by ProMPs) the current trajectory belongs. In all these sections we tried to present the theoretical aspects related to the use of ProMPs for the intention recognition application.
Practical examples of these theoretical problems are presented and explained later in Sections 5-7. Section 5 explains how to use our software, introduced in Section 4, for learning one ProMP for a simple set of 1-DOF trajectories. Section 6 presents an example with the simulated iCub in Gazebo, while Section 7 presents an example with the real iCub.

Notation
To facilitate understanding of the theoretical framework, we first introduce the notation we use in this section and throughout the remainder of the paper.

Trajectories:
• $X(t) = [x(t), y(t), z(t)]^\top \in \mathbb{R}^3$: the x/y/z-axis Cartesian coordinates of the robot's end-effector.
• $F(t) \in \mathbb{R}^6$: the contact wrench, i.e., the external forces and moments measured by the robot at the contact level (end-effector).
• $\xi(t) \in \mathbb{R}^D$: the generic vector containing the current value or state of the trajectories at time t. It can be mono-dimensional (e.g., $\xi(t) = [z(t)]$) or multi-dimensional (e.g., $\xi(t) = [X(t), F(t)]^\top$).
• $\Xi = \Xi_{[1:t_f]} = [\xi(1), \ldots, \xi(t_f)]^\top \in \mathbb{R}^{D \cdot t_f}$: an entire trajectory, consisting of $t_f$ samples or data points.
• $\Xi_i$: the i-th demonstrated trajectory of a task, consisting of $t_{f_i}$ samples or data points.

Table 1: Open-source software libraries implementing Movement Primitives and their application to different known robots. (Recoverable entry: icubLearningTrajectories, ProMP, [74], Matlab and C++, iCub.)

Movement Primitives:
• $k \in [1 : K]$: the k-th ProMP, among a set of K ProMPs that represent different tasks/actions.
• $n_k$: number of recorded trajectories for each ProMP.
• $\xi(t) = \Phi_t \omega + \epsilon_\xi$ is the model of the trajectory, with:
 - $\epsilon_\xi \sim \mathcal{N}(0, \beta)$: the trajectory noise.
 - $\Phi_t \in \mathbb{R}^{D \times D \cdot M}$: radial basis functions (RBFs) used to model trajectories. It is a block diagonal matrix. Each RBF is a normalized Gaussian, $\psi_i(t) = \exp\left(-\frac{(t - c_i)^2}{2h}\right) / \sum_{m=1}^{M} \exp\left(-\frac{(t - c_m)^2}{2h}\right)$. The numerator comes from a Gaussian $\frac{1}{\sqrt{2 \pi h}} \exp\left(-\frac{(t - c_i)^2}{2h}\right)$, where $c_i$ and $h$ are respectively the center and variance of the i-th Gaussian. In our RBF formulation, we normalize all the Gaussians.
 - $\omega \in \mathbb{R}^{D \cdot M}$: time-independent parameter vector weighting the RBFs, i.e., the parameters to be learned.

Time modulation:
• $\bar{s}$: number of samples used as reference to rescale all the trajectories to the same duration.
• $\Phi_{\alpha_i t} \in \mathbb{R}^{D \times D \cdot M}$: the RBFs rescaled to match the $\Xi_i$ trajectory duration.
• $\alpha_i = \bar{s} / t_{f_i}$: temporal modulation parameter of the i-th trajectory.
• $\omega_\alpha$: the parameter vector weighting the RBFs of the $\Psi$ matrix, which models the mapping between the trajectory variation $\delta_{n_o}$ and $\alpha$.

Inference:
• $\Xi^o = [\xi^o(1), \ldots, \xi^o(n_o)]^\top$: early-trajectory observations, composed of $n_o$ data points.
• $\Sigma_\xi^o$: noise of the initiated trajectory observation.
• $\hat{\alpha}$: estimated time modulation parameter of a trajectory to infer.
• $\hat{t}_f = \bar{s} / \hat{\alpha}$: estimated duration of a trajectory to infer.
• $\Xi^*$: ground truth of the trajectory for the robot to infer.
• $\hat{\Xi}$: the estimated trajectory.
• $\hat{k}$: index of the recognized ProMP from the set of K known (previously learned) ProMPs.

Learning a Probabilistic Movement Primitive (ProMP) from demonstrations
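Our toolbox to learn, replay and infer the continuation of trajectories is written in Matlab and available at: https://github.com/inria-larsen/icubLearningTrajectories/tree/master/MatlabProgram

Let us assume the robot has recorded a set of $n_1$ trajectories $\{\Xi_1, \ldots, \Xi_{n_1}\}$, where the i-th trajectory is $\Xi_i = \{\xi_i(1), \ldots, \xi_i(t_{f_i})\}$. In this simple 1-DOF case, a trajectory is modeled by:

$$\xi(t) = \Phi_t \omega + \epsilon_\xi$$

where $\omega \in \mathbb{R}^M$ is the time-independent parameter vector weighting the RBFs, $\epsilon_\xi \sim \mathcal{N}(0, \beta)$ is the trajectory noise, and $\Phi_t$ is a vector of M radial basis functions evaluated at time t:

$$\Phi_t = [\psi_1(t), \psi_2(t), \ldots, \psi_M(t)]$$

Note that the $\psi$ functions are spread across time.
For each trajectory $\Xi_i$, we compute the parameter vector $\omega_i$ such that $\xi_i(t) = \Phi_t \omega_i + \epsilon_\xi$. This vector is computed to minimize the error between the observed trajectory $\xi_i(t)$ and its model $\Phi_t \omega_i + \epsilon_\xi$. This is done using the Least Mean Square algorithm, i.e.:

$$\omega_i = (\Phi_t^\top \Phi_t)^{-1} \Phi_t^\top \xi_i(t) \qquad (3)$$

To avoid the common issue of the matrix $\Phi_t^\top \Phi_t$ in Equation 3 not being invertible, we add a diagonal term and perform Ridge Regression:

$$\omega_i = (\Phi_t^\top \Phi_t + \lambda)^{-1} \Phi_t^\top \xi_i(t)$$

where $\lambda = 10^{-11} \cdot \mathbb{1}_{D \cdot M \times D \cdot M}$ is a parameter that can be tuned by looking at the smallest singular value of the matrix $\Phi_t^\top \Phi_t$.
Thus, we obtain a set of these parameters, $\{\omega_1, \ldots, \omega_n\}$, upon which a distribution is computed.
Since we assume Normal distributions, we have:

$$p(\omega) \sim \mathcal{N}(\mu_\omega, \Sigma_\omega)$$

with

$$\mu_\omega = \frac{1}{n} \sum_{i=1}^{n} \omega_i \quad \text{and} \quad \Sigma_\omega = \frac{1}{n-1} \sum_{i=1}^{n} (\omega_i - \mu_\omega)(\omega_i - \mu_\omega)^\top$$

The ProMP captures the distribution over the observed trajectories. To represent this movement primitive, we usually use the movement that passes through the mean of the distribution (see Figure 4 for a learned ProMP).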
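The learning step described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the Matlab toolbox, and it assumes 1-DOF demonstrations that already share the same number of samples. The function names (`rbf_features`, `fit_weights`, `learn_promp`), the number of basis functions and the width `h` are illustrative choices, not values or names taken from the toolbox.

```python
import math

def rbf_features(t, n_samples, n_basis, h):
    """Normalized Gaussian RBFs evaluated at sample index t (0-based)."""
    centers = [i * (n_samples - 1) / (n_basis - 1) for i in range(n_basis)]
    raw = [math.exp(-((t - c) ** 2) / (2.0 * h)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_weights(traj, n_basis, h, lam=1e-11):
    """Ridge regression over the whole trajectory: (Phi^T Phi + lam I) w = Phi^T traj."""
    phi = [rbf_features(t, len(traj), n_basis, h) for t in range(len(traj))]
    gram = [[sum(p[i] * p[j] for p in phi) + (lam if i == j else 0.0)
             for j in range(n_basis)] for i in range(n_basis)]
    rhs = [sum(p[i] * y for p, y in zip(phi, traj)) for i in range(n_basis)]
    return solve(gram, rhs)

def learn_promp(trajs, n_basis=5, h=200.0):
    """Mean and sample covariance of the per-demonstration weight vectors."""
    ws = [fit_weights(tr, n_basis, h) for tr in trajs]
    n = len(ws)
    mu = [sum(w[i] for w in ws) / n for i in range(n_basis)]
    cov = [[sum((w[i] - mu[i]) * (w[j] - mu[j]) for w in ws) / (n - 1)
            for j in range(n_basis)] for i in range(n_basis)]
    return mu, cov

# Usage: three synthetic demonstrations of a half-sine with different amplitudes.
demos = [[a * math.sin(math.pi * t / 99) for t in range(100)] for a in (0.9, 1.0, 1.1)]
mu_w, sigma_w = learn_promp(demos)
print("mean weights:", mu_w)
```

The mean trajectory is then obtained by multiplying the RBF features at each sample by `mu_w`.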

Predicting the future movement from initial observations
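Once the ProMP $p(\omega) \sim \mathcal{N}(\mu_\omega, \Sigma_\omega)$ of a certain task has been learned, we can use it to predict the evolution of an initiated movement. An underlying hypothesis is that the observed movement follows this learned distribution.
Suppose that the robot measures the first $n_o$ observations of the trajectory to predict, which we denote $\Xi^o = [\xi^o(1), \ldots, \xi^o(n_o)]^\top$. The goal is to predict the evolution of the trajectory after these observations, i.e., to find the estimate $\hat{\Xi}$ of the whole trajectory. To do this prediction, we start from the learned prior distribution $p(\omega)$, and we find the $\hat{\omega}$ parameter within this distribution that generates $\hat{\Xi}$. To find this $\hat{\omega}$ parameter, we update the learned distribution to the posterior $p(\hat{\omega}) \sim \mathcal{N}(\hat{\mu}_\omega, \hat{\Sigma}_\omega)$ using the formulae:

$$\hat{\mu}_\omega = \mu_\omega + K (\Xi^o - \Phi_{[1:n_o]} \mu_\omega), \qquad \hat{\Sigma}_\omega = \Sigma_\omega - K \Phi_{[1:n_o]} \Sigma_\omega \qquad (8)$$

where K is a gain computed by:

$$K = \Sigma_\omega \Phi_{[1:n_o]}^\top \left(\Sigma_\xi^o + \Phi_{[1:n_o]} \Sigma_\omega \Phi_{[1:n_o]}^\top\right)^{-1}$$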
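As an illustration of this update, the sketch below conditions a 1-DOF weight distribution on early observations. It is a plain-Python sketch, not the toolbox code, and it makes one simplifying assumption: the observation noise is independent across samples, with the same variance `noise_var` for each. Under that assumption, processing the observations one at a time gives the same posterior as the batch update with the gain K, and avoids an explicit matrix inversion. All names are illustrative.

```python
def condition_on_observation(mu, cov, phi, y, noise_var):
    """One Gaussian conditioning step for a scalar observation y = phi . w + noise."""
    m = len(mu)
    # Sigma * phi^T (cov is symmetric, so this is also the transpose of phi * Sigma)
    cov_phi = [sum(cov[i][j] * phi[j] for j in range(m)) for i in range(m)]
    # Innovation variance: noise + phi * Sigma * phi^T
    s = noise_var + sum(phi[i] * cov_phi[i] for i in range(m))
    gain = [c / s for c in cov_phi]
    residual = y - sum(phi[i] * mu[i] for i in range(m))
    new_mu = [mu[i] + gain[i] * residual for i in range(m)]
    new_cov = [[cov[i][j] - gain[i] * cov_phi[j] for j in range(m)] for i in range(m)]
    return new_mu, new_cov

def condition_on_early_observations(mu, cov, phis, ys, noise_var):
    """Apply the conditioning step to each of the n_o early observations in turn."""
    for phi, y in zip(phis, ys):
        mu, cov = condition_on_observation(mu, cov, phi, y, noise_var)
    return mu, cov

# Usage: a two-weight prior; the single observation only involves the first weight.
prior_mu = [0.0, 0.0]
prior_cov = [[1.0, 0.0], [0.0, 1.0]]
post_mu, post_cov = condition_on_early_observations(
    prior_mu, prior_cov, phis=[[1.0, 0.0]], ys=[1.0], noise_var=1e-6)
print(post_mu, post_cov)
```

A smaller `noise_var` pulls the posterior mean more strongly towards the observations, which matches the role of the expected noise in the prediction.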

Predicting the trajectory time modulation
In the previous section, we presented the general formulation of ProMPs, which makes the implicit assumption that all the observed trajectories have the same duration and thus the same sampling. That is why the duration of the trajectories generated by the RBFs is fixed and equal to $\bar{s}$. Of course, this is valid only for synthetic data and not for real data. (Footnote 9: Actually, we call here duration what is in fact the total number of samples for the trajectory.)
To be able to address real experimental conditions, we now consider the variation of the duration of the demonstrated trajectories. To this end, we introduce a time modulation parameter $\alpha$ that maps the actual trajectory duration $t_f$ to $\bar{s}$: $\alpha = \bar{s} / t_f$. The normalized duration $\bar{s}$ can be chosen arbitrarily; for example, it can be set to the average duration of the trajectories, e.g., $\bar{s} = \frac{1}{n} \sum_{i=1}^{n} t_{f_i}$. Notably, in the literature $\alpha$ is sometimes called phase [20,50]. The effect of $\alpha$ is to change the phase of the RBFs, which are scaled in time.
The time modulation of the i-th trajectory $\Xi_i$ is computed by $\alpha_i = \bar{s} / t_{f_i}$. Thus, we have $\alpha_i t_{f_i} = \bar{s}$, and the improved ProMP model is:

$$\xi_t = \Phi_{\alpha t} \omega + \epsilon_t$$

where $\Phi_{\alpha t}$ is the RBFs matrix evaluated at time $\alpha t$. All the M Gaussian functions of the RBFs are spread over the same number of samples $\bar{s}$. Thus, we have:

$$\Phi_{\alpha t} = [\psi_1(\alpha t), \psi_2(\alpha t), \ldots, \psi_M(\alpha t)]$$

During the learning step, we record a set of $\alpha$ parameters: $S_\alpha = \{\alpha_1, \ldots, \alpha_n\}$. Then, using this set, we can replay the learned ProMP with different speeds. By default (i.e., when $\alpha = 1$), the replayed trajectory has $\bar{s}$ samples.

Figure 2: This plot shows the predicted trajectory given early observations (data points, in black), compared to the ground truth (e.g., the trajectory that the human intends to execute with the robot). We show the prior distribution (in light blue) and the posterior distribution (in red), which is computed by conditioning the distribution to match the observations. Here, the posterior simply uses the average $\alpha$ computed over the $\alpha_1, \ldots, \alpha_K$ of the K demonstrations. Without predicting the time modulation from the observations and using the average $\alpha$, the predicted trajectory has a duration that is visibly different from the ground truth.
During the inference, the time modulation α of the partially observed trajectory is not known.
Unless fixed a priori, the robot must estimate it. This estimation is critical to ensure a good recognition, as shown in Figure 2: the inferred trajectory (represented by the mean of the posterior distribution in red) does not have the same duration as the "real" intended trajectory (which is the ground truth). This difference is due to the estimation error of the time modulation parameter.
By default, the estimate $\hat{\alpha}$ is computed as the mean of all the $\alpha_i$ observed during learning:

$$\hat{\alpha} = \frac{1}{n} \sum_{i=1}^{n} \alpha_i \qquad (11)$$

However, using the mean value for the time modulation is an appropriate choice only when the primitive represents goal-directed motions that are very regular, or for which we can reasonably assume that differences in the duration can be neglected (which is not the general case). In many applications this estimation may be too rough.
Thus, we have to find a way to estimate the duration of the observed trajectory, which corresponds to accurately estimating the time modulation parameter $\hat{\alpha}$. To estimate $\hat{\alpha}$, we implemented four different methods. The first is the mean of all the $\alpha_i$, as in Equation 11. The second is the maximum likelihood, with

$$\hat{\alpha} = \arg\max_{\alpha \in S_\alpha} \; p(\Xi^o \mid \mu_\omega, \Sigma_\omega, \alpha) \qquad (12)$$

The third is the minimum distance criterion, where we seek the best $\hat{\alpha}$ that minimizes the difference between the observed trajectory $\Xi^o_t$ and the predicted trajectory for the first $n_o$ data points:

$$\hat{\alpha} = \arg\min_{\alpha \in S_\alpha} \sum_{t=1}^{n_o} \left| \Xi^o_t - \Phi_{\alpha t} \mu_\omega \right| \qquad (13)$$

The fourth method is based on a model: we assume that there is a correlation between $\alpha$ and the variation of the trajectory $\delta_{n_o}$ from the beginning until the time $n_o$. This "variation" $\delta_{n_o}$ can be computed as the variation of the position, e.g., $\delta_{n_o} = X(n_o) - X(1)$, or the variation in the entire trajectory, $\delta_{n_o} = \Xi(n_o) - \Xi(1)$, or any other measure of progress, if this hypothesis is appropriate for the type of task trajectories of the application. Indeed, $\alpha$ can also be linked to the movement speed, which can be roughly approximated by $\dot{X} = \delta X / t_f$ ($\dot{\Xi} = \delta \Xi / t_f$). We model the mapping between $\delta_{n_o}$ and $\alpha$ by:

$$\alpha = \Psi(\delta_{n_o})^\top \omega_\alpha + \epsilon_\alpha \qquad (14)$$

where $\Psi$ are RBFs, and $\epsilon_\alpha$ is a zero-mean Gaussian noise. During learning, we compute the $\omega_\alpha$ parameter, using the same method as in Equation 3. During the inference, we compute $\hat{\alpha} = \Psi(\delta_{n_o})^\top \omega_\alpha$.
A comparison of the four methods for estimating $\alpha$ on a test study with iCub in simulation is presented in Section 6.6.
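To make the first and third estimation methods concrete, here is a small plain-Python sketch for a 1-DOF trajectory. It is not the toolbox implementation. It assumes the mean of the ProMP is available as a list of reference samples (`mean_ref`, one value per reference sample), and it evaluates the mean at the modulated time by linear interpolation instead of re-evaluating the RBFs; sample indices are 0-based. All names are illustrative.

```python
def time_modulation(n_samples_ref, n_samples_traj):
    """alpha = s_bar / t_f for one demonstrated trajectory."""
    return n_samples_ref / n_samples_traj

def mean_alpha(alphas):
    """First method: average of the alphas recorded during learning."""
    return sum(alphas) / len(alphas)

def sample_mean(mean_ref, phase):
    """Linearly interpolate the reference mean trajectory at a fractional phase."""
    phase = min(max(phase, 0.0), len(mean_ref) - 1.0)
    lo = int(phase)
    hi = min(lo + 1, len(mean_ref) - 1)
    frac = phase - lo
    return (1.0 - frac) * mean_ref[lo] + frac * mean_ref[hi]

def min_distance_alpha(observations, candidate_alphas, mean_ref):
    """Third method: candidate alpha whose time-scaled mean best fits the early observations."""
    def cost(alpha):
        return sum(abs(obs - sample_mean(mean_ref, alpha * t))
                   for t, obs in enumerate(observations))
    return min(candidate_alphas, key=cost)

# Usage: the reference mean is a ramp over 100 samples; the observed movement is twice as slow.
mean_ref = [i / 99 for i in range(100)]
early_obs = [0.5 * t / 99 for t in range(60)]
print(mean_alpha([0.5, 1.0, 1.5]), min_distance_alpha(early_obs, [0.5, 1.0, 2.0], mean_ref))
```

The candidate set plays the role of the alphas recorded during learning; a finer grid of candidates could be used at the cost of more evaluations.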
There exist other methods in the literature for computing $\alpha$. For example, [49] propose a method that models local variability in the speed of execution. The authors of [24] improve Dynamic Time Warping by imposing a smooth function on the time alignment mapping using local optimization. These methods will be implemented in future work.

Recognizing one among many movement primitives
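Robots should not learn only one skill, but many: different skills for different tasks. In our framework, tasks are represented by movement primitives, specifically ProMPs. So it is important for the robot to be able to learn K different ProMPs and then to recognize, from the early observations of a trajectory, which of the K ProMPs the observations belong to.
During the learning step of a movement primitive $k \in [1 : K]$, the robot observes $n_k$ demonstrated trajectories. For each ProMP, it learns the distribution over the parameter vector, $p(\omega_k) \sim \mathcal{N}(\mu_{\omega_k}, \Sigma_{\omega_k})$, as described in Section 3.2, and it records the time modulation parameters of the demonstrations, $S_{\alpha_k}$. After having learned these K ProMPs, the robot can use this information to autonomously execute a task trajectory. Since we are targeting collaborative movements, performed together with a partner at least at the beginning, we want the robot to be able to recognize from the first observations of a collaborative trajectory which task the partner is currently doing and what the intention of the partner is. Finally, we want the robot to be able to complete the task on its own, once it has recognized the task and predicted the future trajectory.
Let $\Xi^o$ be the early observations of an initiated trajectory. From these partial observations, the robot can recognize the "correct" (i.e., most likely) ProMP $\hat{k} \in [1 : K]$. First, for each ProMP $k \in [1 : K]$, it computes the most likely phase (time modulation factor) $\hat{\alpha}_k$ (as explained in Section 3.4), to obtain the set of ProMPs with the most likely duration:

$$S_{[\mu_{\omega_k}, \hat{\alpha}_k]} = \{(\mu_{\omega_1}, \hat{\alpha}_1), \ldots, (\mu_{\omega_K}, \hat{\alpha}_K)\}$$

Then we compute the most likely ProMP $\hat{k}$ in $S_{[\mu_{\omega_k}, \hat{\alpha}_k]}$ according to some criterion. One possible way is to minimize the distance between the early observations and the mean of the ProMP for the first portion of the trajectory:

$$\hat{k} = \arg\min_{k \in [1:K]} \frac{1}{n_o} \sum_{t=1}^{n_o} \left| \Xi^o_t - \Phi_{\hat{\alpha}_k t} \mu_{\omega_k} \right|$$

Once the $\hat{k}$-th ProMP has been identified as the most likely, we update its posterior distribution to take into account the initial portion of the observed trajectory, using Equation 8:

$$\hat{\mu}_{\omega_{\hat{k}}} = \mu_{\omega_{\hat{k}}} + K (\Xi^o - \Phi_{\hat{\alpha}_{\hat{k}}[1:n_o]} \mu_{\omega_{\hat{k}}}), \qquad \hat{\Sigma}_{\omega_{\hat{k}}} = \Sigma_{\omega_{\hat{k}}} - K \Phi_{\hat{\alpha}_{\hat{k}}[1:n_o]} \Sigma_{\omega_{\hat{k}}}$$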
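The minimum-distance recognition step can be sketched as follows, again in plain Python and for a 1-DOF case. This is an illustrative sketch under assumptions: each learned ProMP is summarized by its mean trajectory at the reference timing and by an already estimated time modulation, the mean is evaluated at the modulated time by linear interpolation, and indices are 0-based. The names are not those of the toolbox.

```python
def sample_mean(mean_ref, phase):
    """Linearly interpolate the reference mean trajectory at a fractional phase."""
    phase = min(max(phase, 0.0), len(mean_ref) - 1.0)
    lo = int(phase)
    hi = min(lo + 1, len(mean_ref) - 1)
    frac = phase - lo
    return (1.0 - frac) * mean_ref[lo] + frac * mean_ref[hi]

def distance_to_promp(observations, mean_ref, alpha):
    """Average absolute distance between early observations and the time-scaled mean."""
    total = sum(abs(obs - sample_mean(mean_ref, alpha * t))
                for t, obs in enumerate(observations))
    return total / len(observations)

def recognize_promp(observations, promps):
    """Index of the ProMP whose mean is closest to the observations.

    promps is a list of (mean_ref, alpha_hat) pairs, one per learned primitive.
    """
    costs = [distance_to_promp(observations, mean_ref, alpha)
             for mean_ref, alpha in promps]
    return min(range(len(promps)), key=costs.__getitem__)

# Usage: two primitives (a rising and a falling ramp); the user starts a slow rising movement.
rising = [i / 99 for i in range(100)]
falling = [1.0 - v for v in rising]
early_obs = [0.5 * t / 99 for t in range(40)]
print(recognize_promp(early_obs, [(falling, 0.5), (rising, 0.5)]))
```

Once the index is known, the posterior of the selected primitive is the one to update with the early observations.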

Software overview
In this section, we introduce our open-source software with an overview of its architecture. This software is composed of two main modules, represented in Figure 3.
While the robot is learning the Probabilistic Movement Primitives (ProMPs) associated to the different tasks, the robot is controlled by its user. The user's guidance can be either manual for the real iCub, or through a haptic device for the simulated robot.
A Matlab module allows replaying movement primitives or finishing a movement that has been initiated by its user. By using this module, the robot can learn distributions over trajectories, replay movement primitives (using the mean of the distribution), recognize the ProMP that best matches an initiated trajectory, and infer the continuation of that trajectory.
A C++ module forwards to the robot the control that comes either from the user or from the Matlab module. Then, the robot is able to finish a movement initiated by its user (directly or through a haptic device) in an autonomous way, as shown in Figure 1.
We present the C++ module in Section 6.2 and the theoretical explanation of the Matlab module algorithms in Section 3. A guide to run the Matlab module is first presented in Section 5 for a simple example, and then in Section 6 for our application, where a simulated robot learns from several measured quantities of the movements. Finally, we present results on the real iCub application in Section 7.
Our software is released under the GPL licence and is publicly available at: https://github.com/inria-larsen/icubLearningTrajectories. Tutorial, readme and videos can be found in that repository. First, the readme file describes how to launch simple demonstrations of the software. Videos present these demonstrations to simplify the understanding. In the next sections, we detail the operation of the demo program for a first case of 1-DOF primitive, followed by the presentation of the specific applications on the iCub (first simulated and then real).

Software example: learning a 1-DOF primitive
In this section, we present the use of the software to learn ProMPs in a simple case of 1-DOF primitive. This example only uses the MatlabProgram folder, composed of:
• A sub-folder called "Data", where there are trajectory sets used to learn movement primitives. These trajectories are stored in text files with the following information:
 - input parameters with time-step: # timeStep # input 1 # input 2 [...]
 - recordTrajectories.cpp program recording: see Section 6.3 for more information.
• A sub-folder called "used_functions". It contains all the functions used to retrieve trajectories, compute ProMPs, infer trajectories, and plot results. Normally, using this toolbox does not require understanding these functions. The first lines of each function explain how it works and specify its input and output parameters.
• Matlab scripts called "demo_*.m". They are simple examples of how to use this toolbox.
The script demo_plot1DOF.m can be used to compute a ProMP and to continue an initiated movement. The ProMP is computed from a dataset stored in a ".mat" file, called traj1_1DOF.mat.
In this script, variables are first defined to make the script specific to the current dataset:

    %%%%%%%%%%%%%%% VARIABLES, please look at the README
    % Can be either ".mat" or ".txt". In the current demo, you can also write ...
    % DataPath = Data/traj1 if you want to use the text files of this dataset.

    %%%%%%%%%%%%%% END VARIABLE CHOICE
The variables include: • DataPath is the path to the recorded data. If the data are stored in text files, this variable contains the folder name where text files are stored. These text files are called "recordX.txt", with X ∈ [0 : n − 1] if there are n trajectories. One folder is used to learn one ProMP. If the data are already loaded from a ".mat" file, write the whole path with the extension. The data in ".mat" matches with the output of the Matlab function loadTrajectory.
• nbInput= D is the dimension of the input vector ξ t .
• expNoise = Σ o ξ is the expected noise of the initiated trajectory. The smaller this variable is, the stronger the modification of the ProMP distribution will be, given new observations.
We will now explain the script in more detail. To recover data recorded in a ".txt" file, we call the function:

Once the ProMP is learned, the robot can reproduce the movement primitive using the mean of the distribution. Moreover, it can now recognize a movement that has been initiated in this distribution, and predict how to finish it. To do so, given the early $n_o$ observations of a movement, the robot updates the prior distribution to match the early observed data points: through conditioning, it finds the posterior distribution, which can be used by the robot to execute the movement on its own.
The first step in predicting the evolution of the trajectory is to infer the duration of this trajectory, which is encoded by the time modulation parameter $\hat{\alpha}$. The computation of this inference, which was detailed in Section 3.4, can be done by using the function:

It can be interesting to plot the quality of the predicted trajectories as a function of the number of observations, as done in Figure 6.

Figure 6: The prediction of the future trajectory given early observations, exploiting the information of the learned ProMP (Figure 4). The plots show the predicted trajectories after 10%, 30%, 50% and 80% of observed data points. Legend: prior ProMP, ground truth.

Note that when we have observed a larger portion of the trajectory, the prediction of the remaining portion is more accurate.
To measure the quality of the prediction, we can use:
• The likelihood of having the $\Xi^*$ trajectory given the updated distribution $p(\hat{\omega})$.
• The distance between the Ξ * trajectory and theΞ inferred trajectory.
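As a rough illustration of the distance-based measure, the following plain-Python sketch compares a ground-truth trajectory with a predicted one in the 1-DOF case. It is only one possible choice, stated as an assumption: the distance is averaged over the samples both trajectories have in common, and the mismatch in duration (number of samples) is reported separately. The function names are illustrative and do not come from the toolbox.

```python
def trajectory_distance(ground_truth, predicted):
    """Mean absolute difference over the samples common to both trajectories."""
    n = min(len(ground_truth), len(predicted))
    return sum(abs(g - p) for g, p in zip(ground_truth, predicted)) / n

def duration_error(ground_truth, predicted):
    """Difference in number of samples between ground truth and prediction."""
    return abs(len(ground_truth) - len(predicted))

# Usage: the prediction is slightly off in value and shorter than the ground truth.
truth = [0.0, 1.0, 2.0, 3.0, 4.0]
guess = [0.0, 1.0, 4.0]
print(trajectory_distance(truth, guess), duration_error(truth, guess))
```

Reporting the duration error separately makes it easier to see whether a poor prediction comes from the estimate of the time modulation or from the shape of the trajectory.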
However, depending on the type of recognition typeReco used to estimate the time modulation parameter $\alpha$ from the early observations, a mismatch between the predicted trajectory and the real one can remain visible even when many observations are used. This is due to the error in the estimate of this time modulation parameter. In Section 3.4, we presented the different methods used to predict the trajectory duration. These methods select the most likely $\hat{\alpha}$ according to different criteria: distance; maximum likelihood; model of the $\alpha$ variable; and average of the $\alpha$ observed during learning. On this simple test, where the variation in duration is small as shown in Table 2, the best result is achieved by the average of the time modulation parameters of the trajectories used during the learning step. In more complicated cases, when the time modulation varies, the other methods will be preferable, as seen in Section 3.5.

Application on the simulated iCub: learning three primitives
In this application, the robot learns multiple ProMPs and is able to predict the future trajectory of a movement initiated by the user, assuming the movement belongs to one of the learned primitives.
Based on this prediction, it can also complete the movement once it has recognized the appropriate ProMP.
We simplify the three actions/tasks by reaching three different targets, represented by three colored balls in the reachable workspace of the iCub. The example is performed with the simulated iCub in Gazebo. Figure 8 shows the three targets, placed at different heights in front of the robot.
In Section 6.1 we formulate the intention recognition problem for the iCub: the problem is to learn the ProMP from trajectories consisting of Cartesian positions in 3D and the 6D wrench information measured by the robot during the movement. In Section 6.2 we describe the simulated setup of iCub in Gazebo, then in Section 6.3 we explain how trajectories are recorded, including force information, when we use the simulated robot.

Predicting intended trajectories by using ProMPs
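The model is based on Section 3, but here we want to learn more information during movements.
We record this information in a multivariate parameter vector:

$$\xi_t = \begin{bmatrix} X_t \\ F_t \end{bmatrix} \in \mathbb{R}^D$$

where $X_t \in \mathbb{R}^3$ is the Cartesian position of the robot's end effector and $F_t \in \mathbb{R}^6$ the external forces and moments. In particular, $F_t$ contains the user's contact forces and moments. Let us call D the dimension of this parameter vector (here, D = 9). The vector could also include the 6D/7D Cartesian position and orientation of the hand, to make the robot change also the orientation of the hand during the task. The corresponding ProMP model is:

$$\xi_t = \Phi_{\alpha t} \omega + \epsilon_t$$

where $\omega \in \mathbb{R}^{D \cdot M}$ is the time-independent parameter vector, $\epsilon_t = [\epsilon_{X_t}, \epsilon_{F_t}]^\top \in \mathbb{R}^D$ is the zero-mean Gaussian i.i.d. observation noise, and $\Phi_{\alpha t} \in \mathbb{R}^{D \times D \cdot M}$ is a matrix of Radial Basis Functions (RBFs) evaluated at time $\alpha t$.
Since we are in the multidimensional case, this block diagonal matrix $\Phi_{\alpha t}$ is defined as:

$$\Phi_{\alpha t} = \mathrm{blockdiag}(\phi_{\alpha t}, \ldots, \phi_{\alpha t}), \qquad \phi_{\alpha t} = [\psi_1(\alpha t), \ldots, \psi_M(\alpha t)]$$

It is a block diagonal matrix of D RBF blocks, where each block represents one dimension of the $\xi_t$ vector and is composed of M Gaussians, spread over the same number of samples $\bar{s}$.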
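The structure of this block diagonal matrix can be illustrated with a short plain-Python sketch. It assumes that the same row of M basis values is reused for every dimension and that the weight vector stacks the M weights of each dimension one after the other; both the function names and this ordering are illustrative assumptions, not taken from the toolbox.

```python
def block_diagonal_basis(phi_row, n_dims):
    """Build the D x (D*M) block diagonal matrix from one row of M basis values."""
    m = len(phi_row)
    return [[phi_row[j - d * m] if d * m <= j < (d + 1) * m else 0.0
             for j in range(n_dims * m)]
            for d in range(n_dims)]

def predict_state(phi_row, omega, n_dims):
    """Evaluate the D-dimensional state as (block diagonal basis) times omega."""
    big_phi = block_diagonal_basis(phi_row, n_dims)
    return [sum(a * b for a, b in zip(row, omega)) for row in big_phi]

# Usage: M = 2 basis values and D = 2 dimensions (e.g., one position and one force channel).
phi_row = [0.25, 0.75]
omega = [1.0, 1.0, 2.0, 2.0]  # weights of dimension 1, then weights of dimension 2
print(block_diagonal_basis(phi_row, 2))
print(predict_state(phi_row, omega, 2))
```

Each row of the matrix is non-zero only on the weights of its own dimension, so the dimensions share the basis functions but are coupled only through the covariance of the weights.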
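During the learning of the k-th primitive, for each demonstrated trajectory $\Xi_{i[1:t_{f_i}]}$, the robot computes the optimal parameter vector $\omega_{k_i}$ that best approximates the trajectory.
We saw in Section 3.2 how these computations are done. In our software, we use a single matrix computation instead of the $t_{f_i}$ iterative ones done for each observation t (as in Equation 3). Thus, we have:

$$\omega_{k_i} = \left(\Phi_{\alpha[1:t_{f_i}]}^\top \Phi_{\alpha[1:t_{f_i}]} + \lambda\right)^{-1} \Phi_{\alpha[1:t_{f_i}]}^\top \Xi_{i[1:t_{f_i}]}$$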

Prediction of the trajectory evolution from initial observations
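After having learned the three ProMPs, the robot is able to finish an initiated movement on its own. In Sections 3.3, 3.4 and 3.5 we explained how to compute the future intended trajectory given the early observations.
In this example, we add specificities about the parameters to learn. In particular, we use the total observation $\Xi^o$ to update the ProMP, as seen in Section 3.3. This computation is based on Equation 16, but here again, we use matrix computation:

$$K = \Sigma_{\omega_{\hat{k}}} \Phi_{\hat{\alpha}[1:n_o]}^\top \left(\Sigma_\xi^o + \Phi_{\hat{\alpha}[1:n_o]} \Sigma_{\omega_{\hat{k}}} \Phi_{\hat{\alpha}[1:n_o]}^\top\right)^{-1}$$

$$\hat{\mu}_{\omega_{\hat{k}}} = \mu_{\omega_{\hat{k}}} + K (\Xi^o - \Phi_{\hat{\alpha}[1:n_o]} \mu_{\omega_{\hat{k}}}), \qquad \hat{\Sigma}_{\omega_{\hat{k}}} = \Sigma_{\omega_{\hat{k}}} - K \Phi_{\hat{\alpha}[1:n_o]} \Sigma_{\omega_{\hat{k}}}$$

From this posterior distribution, we retrieve the inferred trajectory $\hat{\Xi} = \{\hat{\xi}_1, \ldots, \hat{\xi}_{\hat{t}_f}\}$, with:

$$\hat{\xi}_t = \begin{bmatrix} \hat{X}_t \\ \hat{F}_t \end{bmatrix} = \Phi_{\hat{\alpha} t} \hat{\mu}_{\omega_{\hat{k}}}$$

Note that the inferred wrenches $\hat{F}_t$, here, correspond to the simulated wrenches in Gazebo. In this example there is little use for them in simulation; the interest of also predicting wrenches will be clearer in Section 7, with the example on the real robot.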

Setup for simulated iCub
For this application, we created a prototype in Gazebo, where the robot must reach three different targets with the help of a human. To interact physically with the robot simulated in Gazebo, we used the Geomagic Touch, a haptic device.
The setup consists of:
• the iCub simulation in Gazebo, complete with the dynamic information provided by wholeBodyDynamicsTree (https://github.com/robotology/codyco-modules/tree/master/src/modules/wholeBodyDynamicsTree) and the Cartesian information provided by iKinCartesianController;
• the Geomagic Touch, installed following the instructions in https://github.com/inria-larsen/icub-manual/wiki/Installation-with-the-Geomagic-Touch, which not only cover the installation of the SDK and the drivers of the Geomagic but also point to how to create the yarp drivers for the Geomagic;
• a C++ module (https://github.com/inria-larsen/icubLearningTrajectories/tree/master/CppProgram) that connects the output command from the Geomagic to the iCub in Gazebo, and optionally records the trajectories to a file. A tutorial is included in this software.
The interconnection among the different modules is represented in Figure 3, where the Matlab module is not used. The tip of the Geomagic is virtually attached to the end-effector of the robot: when the operator moves the Geomagic, the position of the Geomagic tip $x_{geo}$ is scaled (1:1 by default) in the iCub workspace as $x_{icub\_hand}$, and the Cartesian controller is used to move the iCub hand around a "home" position, or default starting position, $x_0$:

$$x_{icub\_hand} = \mathrm{hapticDriverMapping}(x_0 + x_{geo})$$

where hapticDriverMapping is the transformation applied by the haptic device driver, which essentially maps the axes from the Geomagic reference frame to the iCub reference frame. By default, no force feedback is sent back to the operator in this application, as we want to emulate the zero-torque control mode of the real iCub, where the robot is ideally transparent and does not oppose any resistance to the human guidance. A default orientation of the hand ("katana" orientation) is set.

Data acquisition
The dark button of the Geomagic is used to start and stop the recording of the trajectories. The operator must click and hold the button during the whole movement and release the button at the end. The trajectory is saved in a file called recordX.txt for the X-th trajectory. The structure of this file is:

In our project on GitHub, we provide the acquired dataset with the trajectories for the interested reader who wishes to test the code with these trajectories. Two datasets are available at https://github.com/inria-larsen/icubLearningTrajectories/tree/master/MatlabProgram/Data/: the first dataset, called "heights", is composed of three goal-directed reaching tasks, where the targets vary in height; the second dataset, called "FLT", is composed of trajectories recorded on the real robot, whose arm moves forward, to the left and to the top.
A Matlab script that learns ProMPs with such kinds of datasets is available in the toolbox, called demo_plotProMPs.m. It contains all the following steps.
To load the first "heights" dataset with the three trajectories, write:

Figure 8 shows the three sets of demonstrated trajectories. In the used dataset called "heights", we have recorded 40 trajectories per movement primitive.

Learning the ProMPs
We first need to learn the ProMPs associated to the three observed movements. First, we partition the collected dataset into a training set and a test set for the inference. One random trajectory is used for the inference:

    [train{i},test{i}] = partitionTrajectory(t{i},1,percentData,s_bar);

The second input parameter specifies that we select only one trajectory, randomly selected, to test the ProMP. Now, we compute the three ProMPs with:

We set the following parameters:
• s_bar = 100: reference number of samples, which we denote in this paper as $\bar{s}$.
• expNoise = 0.00001: the expected data noise. We assume this noise to be very low, since this is a simulation.
• percentData = 40: this variable specifies the percentage of the trajectory that the robot observes before inferring the end.
These parameters can be changed at the beginning of the Matlab script.
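The partitioning step can be pictured with the following plain-Python sketch. It is a hypothetical re-implementation of the idea, not the toolbox function partitionTrajectory: it assumes that each trajectory is a list of samples, that the test trajectories are drawn at random, and that the "observed" part is simply the first percentData percent of the samples of each test trajectory. The function name, its arguments and the `seed` parameter are illustrative.

```python
import random

def partition_trajectories(trajectories, n_test, percent_observed, seed=None):
    """Split demonstrations into training and test sets, and extract early observations.

    Returns (train, test, observed), where observed[i] is the first
    percent_observed percent of the samples of test[i].
    """
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(trajectories)), n_test))
    train = [tr for i, tr in enumerate(trajectories) if i not in test_idx]
    test = [tr for i, tr in enumerate(trajectories) if i in test_idx]
    observed = [tr[:max(1, round(len(tr) * percent_observed / 100.0))] for tr in test]
    return train, test, observed

# Usage: 40 demonstrations of 50 samples each, one held out, 40% of it observed.
demos = [[float(i)] * 50 for i in range(40)]
train, test, observed = partition_trajectories(demos, n_test=1, percent_observed=40, seed=0)
print(len(train), len(test), len(observed[0]))
```

Keeping the test trajectory out of the training set is what allows it to be used as ground truth when evaluating the prediction.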

Predicting the time modulation
In Section 3.4 we presented four main methods for estimating the time modulation parameter, discussing why this is crucial for a better estimation of the trajectory. Here, we compare the methods on the three-goal experiment. We recorded 40 trajectories for each movement primitive, for a total of 120 trajectories. After having computed the corresponding ProMPs, we tested the inference by providing early observations of a trajectory that the robot must finish. For that purpose, the robot recognizes the correct ProMP among the three previously learned (see Section 3.5) and then estimates the time modulation parameter $\hat{\alpha}$. Figure 10 represents the average error of $\hat{\alpha}$ during inference for 10 trials according to the number of observations (from 30% to 90% of observed data) and according to the method used. These methods are the ones presented before: mean (Equation 11), maximum likelihood (Equation 12), minimum distance (Equation 13) and model (Equation 14). Each time, the tested trajectory is chosen randomly from the data set of observed trajectories (of course, the test trajectory does not belong to the training set, so it was not used in the learning step). The method that takes the average of the $\alpha$ observed during learning is used as the baseline (in black). We can see that the other methods are more accurate. The maximum likelihood is increasingly more accurate, as expected. The fourth method (model), which models $\alpha$ according to the global variation of the trajectory's positions during the early observations, performs best when the portion of observed trajectory is small (e.g., 30%-50%). Since our interest is to predict the future trajectory as early as possible, we adopted the model method for our experiments.

Application on the real iCub
In this section we present and discuss two experiments with the real robot iCub.
In the first, we take inspiration from the experiment of the previous Section 6, where the "tasks" are exemplified by simple point-to-point trajectories demonstrated by a human tutor. In this experiment we explore how to use wrench information and use known demonstrations as ground truth, to evaluate the quality of our prediction.
In the second experiment, we set up a more realistic collaborative scenario, inspired by collaborative object sorting. In such applications, the robot is used to lift an object (heavy, or dangerous, or that the human cannot manipulate, as for some chemicals or food), the human inspects the object and then decides if it is accepted or rejected. Depending on this decision, the object goes on a tray or bin in front of the robot, or on a bin located at the robot's side. The object must be dropped in a different way in the two cases. Realizing this application with iCub is not easy, as iCub cannot lift heavy objects and has a limited workspace. Therefore, we simplify the experiment with small objects and two bins. The human simply starts the robot's movement with physical guidance, and then the robot finishes the movement on its own. In this experiment the predicted trajectories are validated on-the-fly by the human operator.
In a more complex collaborative scenario, tasks could be elementary tasks such as pointing, grasping, reaching, manipulating tools (the type of task here is not important, as long as it can be represented by a trajectory).

Three simple actions with wrench information
Task trajectories, in this example, have both position and wrench information. In general, it is a good idea to represent collaborative motion primitives in terms of both position and wrenches, as this representation enables using them in the context of physical interaction. Contrarily to the simulated experiment, here the inferred wrenchesF t correspond to the wrenches the robot should perceive if the partner was manually guiding the robot to perform the entire movement: indeed, these wrenches are computed from the demonstrations used to learn the primitive. The predicted wrenches can be used in different ways, depending on the application. For example, if the partner breaks the contact with the robot, the perceived wrenches will be different. If the robot is not equipped with tactile or contact sensors, this information can be used by the robot to "perceive" the contact breaking and interpret it, for example, as the sign that the human wants the robot to continue the task on its own. Another use for the demonstrated wrenches is for detecting abnormal forces while the robot is moving: this use can have different applications, from adapting the motion to new environment to automatically detecting new demonstrations. Here, they are simply used to detect when the partner breaks the contact with the robot, and the latter must continue the movement on its own.
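One simple way to picture the contact-break detection is to compare, sample by sample, the measured wrench with the wrench predicted by the primitive, and to declare the contact broken when the two differ by more than a threshold for several consecutive samples. The plain-Python sketch below follows this idea. It is only an assumption about how such a check could be written: the threshold, the number of consecutive samples and the function names are illustrative and are not taken from our software.

```python
import math

def wrench_error(measured, predicted):
    """Euclidean norm of the difference between two 6D wrenches."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)))

def detect_contact_break(measured_wrenches, predicted_wrenches, threshold, min_consecutive=5):
    """Index of the first sample of a sustained mismatch, or None if there is none."""
    run = 0
    for t, (meas, pred) in enumerate(zip(measured_wrenches, predicted_wrenches)):
        if wrench_error(meas, pred) > threshold:
            run += 1
            if run >= min_consecutive:
                return t - min_consecutive + 1
        else:
            run = 0
    return None

# Usage: the predicted wrench assumes a guiding force; the measured one drops to zero at sample 10.
predicted = [[2.0, 0.0, 0.0, 0.0, 0.0, 0.0] for _ in range(20)]
measured = [[2.0, 0.0, 0.0, 0.0, 0.0, 0.0] for _ in range(10)] + [[0.0] * 6 for _ in range(10)]
print(detect_contact_break(measured, predicted, threshold=1.0, min_consecutive=3))
```

Requiring several consecutive samples above the threshold is a basic guard against isolated sensor noise triggering the autonomous continuation too early.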
In the following, we present how to realize the experiment for predicting the user intention with the real iCub, using our software. The robot must learn three task trajectories represented in Figure 11. In red, the first trajectory goes from an initial position in front of the robot to its left (task A). In green, the second trajectory goes from the same initial position to the top (task C). In blue, the last trajectory goes from the top position to the position on the left (task B).
To provide the demonstrations for the tasks, the human tutor used three visual targets shown on the iCub GUI, a basic module of the iCub code that provides a real-time synthetic and augmented view of the robot status, with arrows for the external forces and colored objects for the targets.
One difficulty for novice users of iCub is to be able to drive the robot's arm making it perform desired complex 3D trajectories [76], but after some practice in moving the robot's arm the operator recorded all the demonstrations. We want to highlight that having variations in the starting or ending points of the trajectories is not at all a problem, since the ProMPs are able to deal with this variability.
We will see that by using the ProMPs method and by learning the end-effector Cartesian position, the robot will be able to learn distributions over trajectories, recognize when a movement belongs to one of these distributions and infer the end of the movement. These three objects are saved in 'Data/realIcub.mat'. A Matlab script called demo plotProMPsIcub.m recovers these data, using the function load('Data/realIcub.mat'). This script follows the same organization as the ones we previously explained in Sections 5 and 6. By launching this script, the recovered data are plotted first.
Then, the ProMPs are computed and plotted, as presented in Figure 12. In this figure, the distributions are visibly overlaid: • during the whole trajectories duration for the wrench information; • during the 40% first samples of the trajectories for the Cartesian position information.
After this learning step, the user chooses which ProMP to test. Using a variable that represents the percentage of observed data to be used for the inference, the script computes the number of early observations n_o 16 that will be measured by the robot. Using this number, the robot models the time modulation parameter α 17 of each ProMP, as explained in Section 3.4. Using this model, the time modulation of the test trajectory is estimated and the corresponding ProMP is identified.
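The two steps just described (computing the number of early observations, then modeling the time modulation from them) can be illustrated with a minimal sketch. It is not the code of our Matlab scripts: the function names are hypothetical, the trajectories are synthetic one-dimensional ramps, the time modulation is assumed to be α = (reference number of samples) / (trajectory length), and a plain least-squares line stands in for the model of Section 3.4.

```python
def n_early_observations(total_samples, percentage):
    """Number of early samples n_o for a given percentage of observed data."""
    return max(1, round(total_samples * percentage / 100.0))


def early_variation(trajectory, n_o):
    """Displacement between the first sample and the n_o-th sample (1-D)."""
    return trajectory[n_o - 1] - trajectory[0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ramp(t_f):
    """Synthetic demonstration: a straight move from 0 to 1 in t_f samples."""
    return [i / (t_f - 1) for i in range(t_f)]


S_REF = 100  # assumed reference number of samples
training = [ramp(t_f) for t_f in (80, 100, 120, 150)]
test_trajectory = ramp(110)

# n_o depends on the test trajectory, so the model is fitted afterwards.
n_o = n_early_observations(len(test_trajectory), 30)
deltas = [early_variation(tr, n_o) for tr in training]
alphas = [S_REF / len(tr) for tr in training]
slope, intercept = fit_line(deltas, alphas)

alpha_hat = slope * early_variation(test_trajectory, n_o) + intercept
alpha_true = S_REF / len(test_trajectory)
print(n_o, round(alpha_hat, 3), round(alpha_true, 3))
```

On this toy data a faster movement covers more distance in the first n_o samples, which is why the early variation carries information about the duration. Real demonstrations are noisier and the relation need not be linear.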
Then, the inference of the trajectory's target is performed. Figure 13 represents the inference of the three tested trajectories when wrench information is not used by the robot to infer the trajectory. To produce this figure, which compares the predicted trajectory with the ground truth, we applied our algorithm offline. Indeed, at time t it is not possible to have the ground truth of the trajectory intended by the human from t + 1 to t_f: even if we told the human in advance which goal to reach for, the trajectory to reach that goal could vary. So, for the purpose of these figures and comparisons with the ground truth, we show here the offline evaluation: we select one demonstrated task trajectory from the test set (not the training set used to learn the ProMP) as ground truth, and imagine that this is the intended trajectory.

16 n_o is not the same for each test trajectory, because it depends on the total duration of the trajectory to be inferred.
17 Since the model uses the n_o parameter, its computation cannot be performed before this step.

Figure 13: The prediction of the future trajectory from the learned ProMPs computed from the position information for the 3-targets dataset on the real iCub (Figure 12) after 40% of observations.
In Figure 13, the ground truth is shown in black, whereas the portion of this trajectory that is fed to the inference, and that corresponds to the "early observations", is represented with bigger black circles. We can see that the inference of the Cartesian position is correct, although the estimated duration is off by about 1 second for the last trial. Also, the wrench inference is not accurate. There are two possible explanations: either the robot infers the trajectory using only position information, without wrench information, or the variation of the wrenches is not correlated with the variation of the position. To improve this result, we can make the inference using wrench in addition to Cartesian position information, as shown in Figure 14. We can see in this figure that the estimation of the trajectory's duration is accurate. The disadvantage is that the inference of the Cartesian position is less accurate, because the computation of the posterior distribution makes a trade-off between fitting the early observations of the Cartesian position and those of the wrench. Moreover, to allow a correct inference using wrench information, the noise expectation must be increased to account for the forces 18.

To confirm these results, we analyzed the trajectory inference and α estimation considering different percentages of each trajectory as observed data (30 to 90%). For each percentage, we performed 20 tests, with and without force information.
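The role of the noise expectation in the trade-off discussed above can be seen on a two-variable Gaussian, where one variable stands for an early observation and the other for a later point of the trajectory. This is a minimal sketch with made-up numbers and a hypothetical function name, not the multivariate ProMP update of our software: it only shows that a larger expected observation noise makes the posterior follow the observation less closely and leaves more uncertainty on the prediction.

```python
def condition_on_early(mu_e, mu_l, var_e, var_l, cov_el, y_e, obs_noise):
    """Condition a 2-D Gaussian over (early, late) on a noisy early observation.

    Returns the posterior mean of the early variable, and the posterior mean
    and variance of the late (predicted) variable.
    """
    innovation = y_e - mu_e
    denom = var_e + obs_noise          # variance of the noisy observation
    gain_e = var_e / denom             # how much the early mean follows y_e
    gain_l = cov_el / denom            # how much the late mean is shifted
    post_mu_e = mu_e + gain_e * innovation
    post_mu_l = mu_l + gain_l * innovation
    post_var_l = var_l - gain_l * cov_el
    return post_mu_e, post_mu_l, post_var_l


# Illustrative prior: early and late points are positively correlated.
prior = dict(mu_e=0.0, mu_l=1.0, var_e=1.0, var_l=1.0, cov_el=0.8)

low_noise = condition_on_early(y_e=1.0, obs_noise=0.01, **prior)
high_noise = condition_on_early(y_e=1.0, obs_noise=1.0, **prior)

print("low noise :", low_noise)
print("high noise:", high_noise)
```

With the small noise value the predicted late point moves strongly towards what the observation suggests; with the large one it stays closer to the prior mean.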
In Figure 15, each box-plot represents the errors for 20 tests. On the top, the error criterion is the average distance between the inferred trajectory and the real one. We can see that the inference of the Cartesian end-effector trajectory is more accurate without wrench information. On the bottom, the error criterion is the distance between the estimated α and the real one. We can see that the estimation of α is more accurate when wrench information is used. Thus, these two graphs confirm what we assumed from Figures 13 and 14.
Median, mean and variance of the prediction errors, computed with the normalized root-mean-square error (NRMSE), are reported in Table 3. The prediction error for the time modulation is a scalar: |α_prediction − α_real|. The prediction error for the trajectory is the NRMSE between the inferred trajectory and the real one. In future upgrades of this application, we will probably use the wrench information only to estimate the time modulation parameter α, so as to combine the benefits of inference with and without wrench information: the best inference of the intended trajectory and the best estimation of the time modulation parameter. Table 3 also reports the average time for computing the prediction of both the time modulation and the posterior distribution. The computations were performed in Matlab, on a laptop, using a single core (no parallelization). While the computation time for the case "without wrenches" is acceptable for real-time applications, using the wrench information delays the prediction and is a limitation for real-time applications if fast decisions have to be taken by the robot. In future work, we will reduce the computation time by implementing the prediction in an iterative way.
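As an illustration of the two error criteria, the following sketch computes an NRMSE between a predicted and a reference trajectory, and the absolute error on α. The function names are hypothetical and the numbers are invented. The normalization by the range of the reference trajectory is an assumption made for this example; other conventions (e.g., normalizing by the mean) exist.

```python
import math


def nrmse(predicted, reference):
    """Root-mean-square error normalized by the range of the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("trajectories must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference trajectory has zero range")
    return math.sqrt(mse) / span


def alpha_error(alpha_prediction, alpha_real):
    """Scalar error on the time modulation parameter."""
    return abs(alpha_prediction - alpha_real)


reference = [0.0, 1.0, 2.0, 3.0, 4.0]   # toy ground-truth coordinate
predicted = [0.0, 1.0, 2.0, 3.0, 5.0]   # toy prediction, wrong at the end

trajectory_error = nrmse(predicted, reference)
time_error = alpha_error(1.1, 0.9)
print(trajectory_error, time_error)
```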

Collaborative object sorting
We realized another experiment with iCub, where the robot has to sort some objects into different bins (see Figure 16). We have two main primitives: one for a bin located on the left of the robot, and one for the bin at the front. The object is dropped at a different height for each bin, with a different gesture and a different orientation of the hand. For this reason, the ProMP model consists of the Cartesian position of the hand $X_t = [x_t, y_t, z_t] \in \mathbb{R}^3$ and its orientation $A_t \in \mathbb{R}^4$, expressed as a quaternion: $\Xi_t = [X_t, A_t]^\top \in \mathbb{R}^7$. As in the previous experiment, we first teach the robot the primitives by kinesthetic teaching, with a dozen demonstrations. Then we start the robot movement: the human operator physically grabs the robot's arm and starts the movement towards one of the bins. The robot's skin is used twice: first, to detect the contact when the human grabs the arm, which marks the beginning of the observations; second, to detect when the human breaks the contact with the arm, which marks the end of the observations. Using the first portion of the observed movement, the robot recognizes the current task that is being executed, predicts the future movement that is intended by the human and then executes it on its own. In the video (see link in Section 8) we artificially introduced a pause to let the operator "validate" the predicted trajectory, using visual feedback on the iCubGui. Figure 17 shows one of the predictions made by the robot after the human releases the arm. Of course, in this case we do not have a "ground truth" for the predicted trajectory, only a validation of the predicted trajectory by the operator.

18 In future versions, we will include the possibility to have different noise models for the observations, e.g. we will have $\Sigma^o_\Xi = \begin{bmatrix} \Sigma_X & 0 \\ 0 & \Sigma_F \end{bmatrix}$. We will therefore set a bigger covariance for the wrench information than for the position information.

Table 3: Mean and stdev of the NRMSE of the prediction errors plotted in Figure 15, and average time for computing both predictions (time modulation and trajectory via update of the posterior distribution). The computations were performed in Matlab, on a single core (no parallelization). [Table body lost in extraction; surviving header: with wrenches, % of observed data (n. of samples): 30 (180), 50 (300), 70 (419).]
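The use of the skin as a start/stop trigger for the observations can be sketched as follows. This is a simplified illustration with a hypothetical function name: it assumes that the skin readings have already been reduced to one boolean contact flag per sample, which is not how the raw skin data are delivered on the robot.

```python
def observation_window(contact_flags):
    """Locate the first contact interval in a sequence of boolean flags.

    Returns None if no contact occurs, (start, None) if the contact has not
    been released yet, and (start, end) otherwise, with `end` exclusive.
    """
    start = None
    for index, touching in enumerate(contact_flags):
        if touching and start is None:
            start = index            # the human grabs the arm
        elif not touching and start is not None:
            return start, index      # the human releases the arm
    if start is not None:
        return start, None
    return None


# Toy recording: one hand coordinate per sample, with the matching skin flags.
positions = [0.00, 0.00, 0.02, 0.05, 0.09, 0.09, 0.09]
contact = [False, False, True, True, True, False, False]

window = observation_window(contact)
if window is not None and window[1] is not None:
    early_observations = positions[window[0]:window[1]]
else:
    early_observations = []
print(window, early_observations)
```

The samples collected inside the window are the early observations passed to the recognition and prediction steps; the robot takes over once the window is closed.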

Videos
We recorded several videos that complement the tutorials. The videos are presented in the GitHub repository of our software: https://github.com/inria-larsen/icubLearningTrajectories/tree/master/Videos.

DISCUSSION
While we believe that our proposed method is principled and has several advantages for predicting intention in human-robot interaction, numerous improvements are possible. Some of them will be the object of our future work.

Improving the estimation of the time modulation - Our experiments showed that estimating the time modulation parameter α, which determines the duration of the trajectory, greatly improves the prediction of the trajectory in terms of difference with the human intended trajectory (i.e., our ground truth). We proposed four simple methods in Section 3.4, and in the iCub experiment we showed that the method that maps the time modulation to the variation of the trajectory in the first n_o observations provides a good estimate of the time modulation α for our specific application. However, it is an ad hoc model that cannot be generalized to all possible cases.
Overall, the estimation of the time modulation (or phase) can be improved. For example, [24] used Dynamic Time Warping, while [49] proposed to improve the estimation by having local estimations of the speed in the execution of the trajectory, to comply with cases where the velocity of the task trajectory may not be constant throughout the task execution. In the future, we plan to explore more solutions and integrate them into our software.

Improving prediction - Another point that needs further investigation is how to improve the prediction of the trajectories by exploiting different information. In our experiment with iCub, we improved the estimation of the time modulation using position and wrench information; however, we observed that the noisy wrench information does not help in improving the prediction of the position trajectory. One improvement is certainly to exploit more information from the demonstrated trajectories, such as estimating the different noise of every trajectory component and exploiting this information to improve the prediction. Another possible improvement would consist in using contextual information about the task trajectories.
Finally, it would be interesting to try to identify automatically the characteristics, such as velocity profiles or accelerations, that are known to play a key role in attributing intentions to human movements. For example, in goal-directed tasks such as reaching, the arm velocity profile and the hand configuration are cues that help us detect intentions. Extracting these cues automatically, leveraging the estimation of the time modulation, would probably improve the prediction of the future trajectory. This is a research topic on its own, outside the scope of this paper, with strong links to human motor control.

Continuous prediction - In Section 3.5 we described how to compute the prediction of the future trajectory after recognizing the current task. However, we did not explore what happens if the task recognition is wrong: this may happen if there are two or more tasks with a similar trajectory at the beginning (e.g., moving the object from the same initial point towards one of four possible targets), or simply because there were not enough observed points. So what happens if our task recognition is wrong? How can the robot re-decide on a previously identified task? And how should the robot decide whether its current prediction is finally correct (in statistical terms)? While implementing a continuous recognition and prediction is easy with our framework (one simply has to do the estimation at each time step), providing a generic answer to these questions may not be straightforward. Re-deciding about the current task also implies changing the prediction of the future trajectory. If the decision does not come with a confidence level greater than a desired value, then the robot could face a stall: if asked to continue the movement but unsure about the future trajectory, should it continue or stop? The choice may be application-dependent. We will address these issues and the continuous prediction in future works.
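To make the idea of re-deciding at each time step more concrete, here is a minimal sketch of a recursive task posterior with a confidence threshold. It is only an illustration under strong assumptions, not a feature of our software: each task is summarized by a one-dimensional mean trajectory, the observation likelihood is an isotropic Gaussian with a fixed standard deviation (a stand-in for the ProMP likelihood), and all names and numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_task_posterior(prior, likelihoods):
    """One Bayes update of the probability of each task."""
    unnormalized = {task: prior[task] * likelihoods[task] for task in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every task")
    return {task: value / total for task, value in unnormalized.items()}


def recognize_continuously(observations, task_means, std, threshold):
    """Re-estimate the task at every step; decide only above the threshold."""
    posterior = {task: 1.0 / len(task_means) for task in task_means}
    decisions = []
    for step, y in enumerate(observations):
        likelihoods = {task: gaussian_pdf(y, mean[step], std)
                       for task, mean in task_means.items()}
        posterior = update_task_posterior(posterior, likelihoods)
        best = max(posterior, key=posterior.get)
        decisions.append(best if posterior[best] >= threshold else None)
    return decisions, posterior


# Two toy tasks that share the same beginning and then diverge.
task_means = {
    "left": [0.0, 0.0, 0.1, 0.3, 0.6],
    "front": [0.0, 0.0, -0.1, -0.3, -0.6],
}
observations = [0.0, 0.01, 0.08, 0.28, 0.61]

decisions, posterior = recognize_continuously(
    observations, task_means, std=0.1, threshold=0.95)
print(decisions)
```

While the two tasks are indistinguishable the sketch returns no decision, which corresponds to the stall discussed above; whether the robot should then stop or keep moving remains an application-dependent choice.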
Improving computational time - Finally, we plan to improve the computational time for the inference and the portability of our software by porting the entire framework to C++.
Learning tasks with objects - In many collaborative scenarios, such as object carrying and cooperative assembly, the physical interaction between the human and the robot is mediated by objects. In these cases, if specific manipulations must be done on the objects, our method still applies, but not to the robot alone: it must be adapted to the new "augmented system" consisting of robot and object. Typically, we could imagine a trajectory for some frame, variable or point of interest of the object, and learn the corresponding task. Since ProMPs support multiplication and sequencing of primitives, we could exploit the properties of the ProMPs to learn the joint distribution of the robot task trajectories and the object task trajectories.

CONCLUSION
In this paper we propose a method for predicting the intention of a user physically interacting with the iCub in a collaborative task. We formalize the intention prediction as predicting the target and "future" intended trajectory from early observations of the task trajectory, modeled by Probabilistic Movement Primitives (ProMPs). We use ProMPs because they capture the variability of the task, in the form of a distribution of trajectories coming from several demonstrations of the task. From the information provided by the ProMP, we are able to compute the future trajectory by conditioning the ProMP to match the early observed data points. Additional features of our method are the estimation of the duration of the intended movement, the recognition of the current task among the many known in advance, and multimodal prediction.

The robot first learns a set of motion primitives corresponding to different tasks, from several demonstrations provided by a user. The resulting ProMPs are the prior information that is later used to make inferences about human intention. When the human starts a new collaborative task, the robot uses the early observations to infer which task the human is executing, and predicts the trajectory that the human intends to execute. When the human releases the robot, the predicted trajectory is used by the robot to continue executing the task on its own.
In Section 9 we discussed some current issues and challenges for improving the proposed method and making it applicable to a wider repertoire of collaborative human-robot scenarios. In our future work, our priority will be to reduce the time for computing the inference, and to find a principled way to do continuous estimation, by letting the robot re-decide continuously about the current task and future trajectory.

Appendices
A Detail of the inference formula
In this appendix, we explain how to obtain the inference formulae used in our software. First, let us recall the marginal and conditional Gaussian laws 19. Given a marginal Gaussian distribution for x and a conditional Gaussian distribution for y given x in the form

$$p(x) = \mathcal{N}(x \mid \mu, \Delta^{-1}), \qquad p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1}),$$

the marginal distribution of y and the conditional distribution of x given y are given by

$$p(y) = \mathcal{N}(y \mid A\mu + b,\; L^{-1} + A \Delta^{-1} A^\top),$$

$$p(x \mid y) = \mathcal{N}\big(x \mid \Sigma \{ A^\top L (y - b) + \Delta \mu \},\; \Sigma\big), \quad \text{where } \Sigma = (\Delta + A^\top L A)^{-1}.$$

From the set of observed movements, we computed the marginal Gaussian distribution of the parameter ω. From the model $\Xi_t = \Phi_{[1:t_f]}\,\omega + \epsilon_\Xi$, we have the conditional Gaussian distribution for Ξ given ω. Then, using Equation 19, we obtain the prior distribution of the ProMP.
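The derivation of the prior outlined above can be written out explicitly. The following is a sketch under stated assumptions: the symbols $\mu_\omega$ and $\Sigma_\omega$ (mean and covariance of the parameter ω estimated from the demonstrations) and $\Sigma_\Xi$ (covariance of the zero-mean noise $\epsilon_\Xi$) are the notation chosen here, Ξ denotes the whole trajectory, and the marginal law for linear-Gaussian models is taken in the form $p(y) = \mathcal{N}(y \mid A\mu + b, L^{-1} + A\Delta^{-1}A^\top)$, with the identification $x = \omega$, $y = \Xi$, $A = \Phi_{[1:t_f]}$, $b = 0$, $\mu = \mu_\omega$, $\Delta^{-1} = \Sigma_\omega$ and $L^{-1} = \Sigma_\Xi$.

```latex
\begin{align}
p(\omega) &= \mathcal{N}\!\left(\omega \mid \mu_\omega,\; \Sigma_\omega\right), \\
p(\Xi \mid \omega) &= \mathcal{N}\!\left(\Xi \mid \Phi_{[1:t_f]}\,\omega,\; \Sigma_\Xi\right), \\
p(\Xi) &= \mathcal{N}\!\left(\Xi \mid \Phi_{[1:t_f]}\,\mu_\omega,\;
          \Sigma_\Xi + \Phi_{[1:t_f]}\,\Sigma_\omega\,\Phi_{[1:t_f]}^{\top}\right).
\end{align}
```

The last line is the usual form of a ProMP prior: the mean trajectory is the basis matrix applied to the mean weights, and the covariance adds the observation noise to the weight uncertainty propagated through the basis functions.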
Figure 16: The second experiment with the robot: iCub must sort the objects into two bins, guided by the human. If the object is good, the robot has to put the object in the "front bin"; if the object is not good, the robot has to put the object in the "left bin". The gestures to put the objects into the two bins are different. To simplify, the drop locations for the two bins are represented by the targets F and L. After inspecting the object, the human drives the robot towards the front or the left.

Figure 17: Prediction of the future trajectory in the object sorting experiment (Figure 16). The black circles represent the observations acquired while the human is physically moving the iCub's arm. When the human breaks the contact and releases the arm, the robot predicts the future trajectory and continues the movement. The prior of the recognized ProMP is blue, the posterior ProMP used for prediction is red, and the prior ProMP of the other task (i.e., the one that is recognized as not the one currently being executed) is green. Top, action towards the front bin (F): the human moves the arm towards the front bin. After a few observations (∼ 0.5 s) the robot recognizes that the movement corresponds to the "F" action. The prior of the F action is blue, the posterior/prediction is red, and the L action is green. Bottom, L: the human moves the arm towards the left bin. After a few observations (∼ 0.25 s) the robot has recognized the L action. The prior of the L action is blue, the posterior is red, and the F action (not recognized) is green.

Let