Abstract
Real-time estimation of human action progress is critical for seamless human-robot collaboration yet remains underexplored. With this paper we propose the first real-time application of Open-end Soft-DTW (OS-DTWEU) and introduce OS-DTWWP, a novel DTW variant that integrates a Windowed-Pearson distance to effectively capture local correlations. This method is embedded in our Proactive Assistance through action-Completion Estimation (PACE) framework, which leverages reinforcement learning to synchronize robotic assistance with human actions by estimating action completion percentages. Experiments on a chair assembly task demonstrate OS-DTWWP’s superiority in capturing local motion patterns and OS-DTWEU’s efficacy in tasks presenting consistent absolute positions. Moreover we validate the PACE framework through user studies involving 12 participants, showing significant improvements in interaction fluency, reduced waiting times, and positive user feedback compared to traditional methods.
1 Introduction
In dynamic Human-Robot Collaboration (HRC), the ability to perceive and predict human actions in real time is foundational to achieving seamless coordination. Whether ensuring safety in shared workspaces, minimizing idle times in assembly tasks, or adapting to operator preferences, robots must continuously monitor human progress to act as responsive partners rather than rigid tools. Existing approaches often rely on predefined task sequences or assume idealized human behavior, limiting their applicability in real-world scenarios where operators exhibit variability in motion speed, style, and decision-making. Without robust progress estimation, robots risk desynchronization—delaying assistance, causing interruptions, or even compromising safety.
This paper addresses a core challenge in HRC: real-time estimation of human action progress and prediction of action completion time; enabling robots to synchronize their motions with human workflows at the level of individual actions. While existing research often focuses on high-level task planning or post-hoc activity recognition, the ability to track the progression in real time of atomic human actions (e.g., picking up a screwdriver, inserting a component), which is critical for coordination, remains underexplored. Consider collaborative assembly: if a robot misjudges the completion of a human operator’s action, such as tightening a screw, it may prematurely retrieve the next part (disrupting focus) or delay assistance (introducing idle time). These errors, though seemingly minor, compound across workflows, eroding efficiency and trust.
1.1 Contributions
To address the previous challenges, our work advances the state of the art across three interrelated dimensions. First, we introduce novel online Dynamic Time Warping (DTW) variants for real-time human progress estimation. Second, we demonstrate that these techniques enable the precise prediction of remaining action durations. Finally, we integrate these methods into a collaborative assembly framework designed to minimize idle times and ensure seamless synchronization between robot and human operator.
Our methodological contributions include:
We introduce OS-DTWWP, a novel open-ended DTW approach for real-time human progress estimation that incorporates a Windowed-Pearson (WP) distance. We formalize the WP distance as a shape descriptor within the shapeDTW framework (Zhao and Itti, 2018), analyze its computational complexity, and present an optimized implementation for real-time operation.
We propose two distinct methods for predicting action completion times, along with a hybrid approach that synergistically combines their strengths while mitigating individual limitations.
We present the Proactive Assistance through action-Completion Estimation (PACE) framework—a Reinforcement Learning-based system that leverages continuous human progress monitoring to synchronize proactive robot assistance with human operators, explicitly reducing waiting times through predictive scheduling.
We validate our approach through real-world experiments involving a chair assembly task with human participants, by tracking their hand motions. Yielding the following experimental contributions:
We provide empirical evidence that classical Open-end DTW is inadequate for handling human motion variability, whereas our Open-end Soft-DTW implementation—which we denote as OS-DTWEU given its reliance on the Euclidean distance—demonstrates robust performance. To our knowledge, this represents the first real-time application of Open-end Soft-DTW.
We quantitatively show that OS-DTWWP overcomes the failure cases observed with OS-DTWEU while maintaining relatively strong performance across diverse motion patterns. Our analysis further indicates that similar limitations are present in (offline) Soft-DTW, which can be effectively mitigated by incorporating the Windowed-Pearson distance.
We demonstrate the effectiveness of our completion-time estimation methods, which outperform previous approaches based on Open-end DTW.
We validate the PACE framework through real-user experiments, highlighting the efficacy of OS-DTWWP in improving collaborative efficiency as evidenced by both quantitative metrics and subjective evaluations.
This work builds on a prior conference publication (De Lazzari et al., 2025). This work significantly extends our prior publication through key methodological and experimental enhancements. Methodologically, we formally establish the Windowed-Pearson (WP) distance as a shape descriptor within the shapeDTW framework, bridging theoretical foundations with practical applications. We further analyze OS-DTWWP’s computational complexity and present its optimized implementation for real-time deployment, critical considerations omitted previously. The temporal forecasting methodology, encompassing nominal, linear, and hybrid approaches, is entirely novel, as our prior work focused solely on progress estimation without duration prediction. Additionally, we detail the initialization procedure for the simulated environment used to train the PACE policy. Experimentally, we present new analyses comparing online DTW variants to expose their limitations, along with comprehensive evaluations of completion time estimation methods. We also include PACE training results and simulate additional methods using newly collected collaborative assembly demonstrations. Beyond these extensions, this work provides in-depth technical discussions, including refined literature comparisons, theoretical justifications for design choices, expanded failure case analyses, and detailed evaluations of time estimation effectiveness across diverse experimental conditions.
1.2 Related works
1.2.1 HRC frameworks
Human-robot collaboration (HRC) demands systems capable of dynamically adapting to human actions while maintaining safety and efficiency. Early approaches focused on optimizing task sequencing (Chen et al., 2013; Rahman et al., 2015) by pre-assigning roles to humans or robots, resulting in rigid workflows. While effective in controlled settings, such methods struggle to accommodate real-world variability in human motion and decision-making. Subsequent work adopted leader-follower paradigms, where robots reactively adjust actions based on predefined human workflows (Cheng et al., 2020; Cheng et al., 2021; Ramachandruni et al., 2023; Giacomuzzo et al., 2024). However, empirical studies reveal that human operators prefer retaining task control while also expecting robots to anticipate their needs proactively (Lasota and Shah, 2015). This necessitates real-time monitoring of human actions to enable predictive assistance—a capability absent in existing task-allocation frameworks. Critically, none of these methods actively monitor human actions during execution, limiting their ability to recognize and synchronize with ongoing activities.
1.2.2 Real-time human progress estimation
Beyond HRC, a rich body of research has investigated human motion analysis, particularly focusing on action recognition (Mao et al., 2023; Yan et al., 2018; Ray et al., 2025) and motion prediction (Mao et al., 2020; Dang et al., 2021; Chen et al., 2023). These methods rely on estimated human joint positions, obtained from vision or inertial sensors, to classify actions or predict motion trajectories. While highly effective on activity recognition benchmarks and gesture-level prediction tasks, the majority of these approaches are designed for offline analysis rather than real-time deployment. A few exceptions demonstrate online operation (Chi et al., 2025; An et al., 2023), but even these focus primarily on recognizing discrete actions in streaming settings. Consequently, they do not provide continuous estimates of human task progression during execution, which is essential for action completion estimation in collaborative scenarios.
Real-time human progress monitoring remains an underexplored topic. To our knowledge, only two approaches that estimate the human progress at the action level have been proposed: Maderna et al. (2019), Maderna et al. (2020) employ Open-end Dynamic Time Warping (OE-DTW) (Sakoe, 1979) to estimate human progression, while Cheng and Tomizuka (2021) propose a Sigma log-normal model for predicting action completion times. The latter reports superior performance over DTW in their evaluations, however, our analysis reveals that OE-DTW, while providing temporal flexibility through nonlinear alignment, suffers from oversensitivity to trajectory shape variations common in real-world human motions. A follow-up work by the same authors (Leu et al., 2022) applies the same method for task planning in a collaborative assembly application.
1.2.3 Dynamic time warping
Dynamic Time Warping (DTW) (Sakoe and Chiba, 1978) is a well-known method for computing similarity between temporally misaligned sequences. Open-end DTW (Sakoe, 1979) relaxes endpoint constraints, enabling partial matches for causal systems. While Maderna et al. (2019) applied OE-DTW for real-time progress estimation, our experiments show its unsuitability to the variability of human actions with extensive real-user studies. We address this limitation through an open-ended variant of Soft-DTW (Cuturi and Blondel, 2017), which replaces DTW’s hard min operator with a differentiable softmin to mitigate local minima. Though open-ended Soft-DTW has shown promise for offline skeleton-based recognition (Manousaki and Argyros, 2023), its application in real-time human progress monitoring remains unexplored.
A deeper limitation persists: standard DTW variants use Euclidean distance, which prioritizes absolute spatial alignment over shape similarity. This proves problematic when human motions preserve geometric structure but vary in speed or amplitude.
Recent works focus on on task-adaptive time warping (Trigeorgis et al., 2016; Matsuo et al., 2023), particularly for aligning machine learning datasets. Trigeorgis et al. (2016) learns complex non-linear representations of multiple time-series based on canonical correlation analysis, while Matsuo et al. (2023) learns a distance metric by training an attention model. However, these methods require large training datasets and full knowledge of the signals, making them incompatible with open-ended scenarios where future data are unknown.
Correlation Optimized Warping (COW) (Nielsen et al., 1998) offers an alternative by maximizing Pearson correlation between signal segments, to match similar segments in fields like chromatography, proteomics, and seismology. However, COW’s rigid windowing sacrifices DTW’s temporal elasticity (Tomasi et al., 2004). Recently, seismological research (Wang et al., 2023) has combined DTW with a windowed correlation-based distance, implementing an offline, one-dimensional approach using a weighted biased cross-correlation distance with windows centered on each sample. While this marks an initial attempt to integrate correlation analysis with DTW, their method fundamentally differs from our requirements for real-time human monitoring.
Our method addresses these gaps by adapting Soft-DTW for real-time open-ended alignment (OS-DTWEU), overcoming the practical limitations of OE-DTW. Additionally, we introduce the Windowed-Pearson (WP) distance, which computes local Pearson correlations within sliding windows along the trajectory. Unlike COW’s segment-wise approach, WP integrates shape similarity into the DTW framework, enabling both local and global optimal alignment. This combination (OS-DTWWP) ensures invariance to absolute position shifts and effectively captures local patterns while preserving temporal flexibility–an essential feature for HRC applications, where humans may perform similar motions with varying locations, speeds, and intensities.
1.3 Paper outline
The remainder of the paper is organized as follows. In Section 2, we describe our proposed methods. We begin by introducing preliminaries on existing Dynamic Time Warping algorithms in Section 2.1, with a focus on the Open-end Soft-DTW algorithm. Next, in Section 2.2, we present the OS-DTWWP algorithm for real-time phase estimation, including its implementation and parameter tuning. In Section 2.3, we propose and analyze three distinct methods for action completion time prediction using online DTW. Following this, in Section 2.4, we describe the PACE framework, formulating the problem as a Partially Observable Markov Decision Process and detailing the derivation of a simulated environment for policy training. In Section 3, we outline the experimental procedure. In Section 4, we present the results on phase estimation, action completion time estimation, and collaborative assembly. In Section 5 we discuss our findings and suggest directions for future work. Finally, Supplementary Material provides further details on the Dynamic Time Warping algorithms employed in the experiments.
2 Methods
2.1 Preliminaries
2.1.1 Notation
For a vector, sequence, or signal , we denote by the element at index (with indices starting from 0). For a matrix , refers to the element in row and column with 0-based indexing. Given a vector , the subvector from index to (inclusive) is denoted by . Submatrices are defined analogously.
2.1.2 Dynamic time warping
Dynamic Time Warping (DTW) (Sakoe and Chiba, 1978) is an algorithm designed to compute the optimal alignment between two time-dependent sequences that may vary in speed, enabling a flexible, nonlinear temporal mapping. Typically, DTW enforces that the warping path starts at the first index and ends at the last index of both sequences. Moreover, each index in one sequence is matched with one or more indices in the other, subject to continuity and monotonicity constraints that preserve the original temporal order. The algorithm outputs both the temporal alignment (commonly referred to as the warping path), and the alignment cost, also known as the DTW cost.
DTW is inherently an asymmetric algorithm, designating one sequence as the reference, , and the other as the query or test sequence, . In this paper, we use the terms sequences, signals, and trajectories interchangeably. Additionally, the DTW algorithms we treat in this paper are generalized to handle multidimensional signals and designed to output the phase of the query trajectory with respect to the reference trajectory. The phase of a signal, is defined as the normalized progress along a reference trajectory. Each point in the query sequence is matched with a point in the reference. The phase of at i is then defined as:Moreover, while classical DTW uses a pointwise Euclidean (or squared Euclidean) distance, the reported algorithms generalize to an arbitrary distance function .
Dynamic Time Warping involves three key steps:
Distance matrix computation: A matrix is computed to have the distances between all pairs of points of the reference and query sequences.
Forward recursion: Dynamic programming is used to compute the minimum cumulative cost to reach each point in the matrix through a path. This is obtained by computing a matrix of the cumulative cost .
Backward recursion: The optimal warping path is traced from the last point to the start.
The detailed algorithm is reported in the Supplementary Material (see Supplementary Algorithm 1), while a visual representation of the DTW alignment is shown in Figure 1.
FIGURE 1

Illustrative example of an alignment between two similar signals obtained with Dynamic Time Warping.
2.1.3 Open-end DTW
Classical DTW requires the reference and entire query sequence to compute the optimal alignment. This requirement makes DTW unsuitable for real-time applications when sequences are streamed, as it would need access to future data points to compute the alignment.
Open-end DTW, first employed by Sakoe (1979), is a variant that relaxes the constraint that the last point of the two sequences should match, and computes the alignment which best matches all of the query with a first section of the reference. To do so, after the computation of the cumulative cost matrix , the index of the reference sequence point, , matching with the last point of the query, is chosen as the one with the minimal cost, namely,
Classical DTW, which we will refer to as offline DTW to distinguish it from Open-end DTW, matches the entire sequences, enabling the direct derivation of the phase from the full alignment, with the ability to reference both past and future data from the sequences. In contrast, Open-end DTW can operate on a partially available query trajectory. Although a recursive process could be employed to estimate the alignment of the query trajectory to the truncated reference, this paper focuses on real-time phase estimation, which makes the recursive step unnecessary. Specifically, the phase at time step is estimated causally using only past and current query data, without requiring future information. The phase estimate at step is given by:
2.1.4 Open-end soft-DTW
DTW is effective at aligning signals that vary in speed; however, by penalizing both minor and major time shifts equally, it can sometimes produce unrealistic warping paths. This drawback is even more pronounced in Open-end DTW, where no constraint exists to ensure that the final samples of the two signals match.
For offline DTW, this problem has been addressed using path constraints, such as the Sakoe-Chiba Band (Sakoe and Chiba, 1978) and the Itakura Parallelogram (Itakura, 1975), or weighting schemes like Weighted DTW (Jeong et al., 2011), which penalize warping paths deviating from the diagonal. However, these methods are not directly applicable in an online scenario, as the diagonal is unknown.
Soft-DTW (Cuturi and Blondel, 2017) replaces the minimum operation in the forward recursion of the DTW algorithm with a soft-minimum, making the DTW loss differentiable. Specifically, the operator in the forward recursion of DTW, see Supplementary Algorithm 1 Step 5, is replaced by:to ensure numerical stability when , the soft-minimum is calculated using the log-sum-exp trick:where .
Originally designed for time series averaging and clustering, Soft-DTW introduces a smoothing factor that helps mitigate local minima. In particular, the soft-minimum weighs all possible paths, ensuring that slight distortions do not dominate the final alignment. With , the formulation reduces to the standard minimum operation, whereas results in a cumulative cost equal to the sum of all costs. This formulation allows Soft-DTW to handle temporal variability more effectively. In fact, Janati et al. (2020) show that the Soft-DTW loss is not invariant to time shifts and grows quadratically with respect to the time shift, making it suitable for open-end signal matching.
For completeness, we report a version of the Open-end Soft-DTW algorithm for causal phase estimation in Algorithm 1.
Algorithm 1
-
Inputs:
-
- Query signal
-
- Reference signal
-
- Distance
-
- Smoothing parameter
-
Output:
-
- Estimated phase of w.r.t.
-
1: Initialize , where ⊳ Distance matrix computation
-
2: Initialize , with , for , ⊳ Forward recursion and for
-
3: for to do
-
4: for to do
-
5:
-
6: end for
-
7:
-
8:
-
9: end for
2.2 Online DTW for real-time phase estimation
2.2.1 Windowed-Pearson distance as a DTW metric
While the Euclidean distance remains the default metric to measure sample-wise similarity in Dynamic Time Warping, this metric assumes consistent absolute scaling between signals. Though Opend-end Soft-DTW is effective for signals with consistent absolute magnitudes and baseline positions, its reliance on the Euclidean distance makes it sensitive to vertical offsets and amplitude variations, often producing suboptimal warping paths for signals that share geometric structure but differ in execution scale.
Recent approaches like shapeDTW (Zhao and Itti, 2018) address this limitation by converting raw signals into shape descriptors (e.g., piecewise aggregate approximations or discrete wavelet coefficients) prior to alignment.
Inspired by correlation analysis, our approach adapts this shape-sensitive philosophy by introducing a windowed Pearson distance that locally normalizes amplitude differences during alignment. This creates an online-capable method that directly compares trajectory shapes through local correlation analysis, while maintaining DTW’s temporal elasticity. The combination of windowed normalization with open-end alignment proves particularly effective for human-robot collaboration scenarios, where human motion patterns exhibit consistent geometric features but significant trial-to-trial variability in speed and scale.
We formally define the Windowed-Pearson (WP) distance between two signal samples as:to calculate the distance when or , we pad the signals with their initial values.
Note that in the one-dimensional case, this distance reduces to the Pearson distance between two segments and . Thus it can be rewritten as swhere denotes the Pearson correlation coefficient between two segments.
By design, the WP distance is invariant to vertical shifts and can effectively capture local correlations. A small window measures similarity between samples based on fine-grained local patterns, while a larger window captures broader, more extended patterns.
Furthermore, we demonstrate that for one-dimensional signals, this distance is equivalent to employing z-normalization as the mapping function to calculate the shape descriptor on a window , as defined by Zhao and Itti (2018).
Consider two windowed segments and of length . Applying z-normalization, we obtain:where , , and , , denote the means and standard deviations of segments and , respectively. For z-normalized signals, it holds that:andtherefore the squared Euclidean distance between the two normalized segments becomes:where the final equality follows from substituting the two previous equalities.
Thus, for z-normalized segments, minimizing the squared Euclidean distance is equivalent to minimizing the Pearson distance (up to the multiplicative constant ). This makes the WP distance defined in Equation 1 a suitable mapping function for shapeDTW.
2.2.2 Parameter tuning
OS-DTWEU and OS-DTWWP require the tuning of one and two parameters, respectively. Specifically, the smoothing factor and, for OS-DTWWP, also the window size .
Assuming access to a training dataset, these parameters can be optimized to minimize the average mean squared error (MSE) between the estimated phase and a ground truth phase. An effective choice for the ground truth is a linear phase evolution, defined as:
Alternatively, the ground truth phase can be computed using offline DTW methods such as Soft-DTW.
In scenarios where the ultimate goal is to minimize a cost function that depends on the phase, the cost function itself can serve as the optimization objective.
While various optimization methods are applicable, we select Bayesian Optimization (Snoek et al., 2012) as our preferred method, as it requires a small number of evaluations of the cost function.
2.2.3 Real-time implementation
Open-end Soft-DTW, as described in Algorithm 1, is not directly suitable for real-time applications. To address this limitation, modifications are necessary to handle streaming input signals efficiently and to store information in a manner that ensures constant computational complexity at each step, thereby enabling bounded-time computation. Such an adapted algorithm is presented in Algorithm 2.
When a new sample arrives, the algorithm first computes , the -th row of the distance matrix (as defined in Algorithm 1), which represents the distances between the new sample and all reference samples. Subsequently, is computed, corresponding to the -th row of the accumulated cost matrix (as defined in Algorithm 1) to update the warping costs.
Algorithm 2
-
Inputs:
-
- Streaming signal sampled at each time step from the query trajectory
-
- Reference trajectory
-
- Distance
-
- Smoothing parameter
-
Output:
-
- Continuous output at each time step
-
1: Initialize
-
2: Initialize , with , for
-
3: Initialize , with
-
4: while there is a new sample do
-
5: for to do ⊳ Distance computation
-
6:
-
7: end for
-
8: for to do ⊳ One-step forward recursion
-
9:
-
10: end for
-
11: Set
-
12: Output
-
13: end while
This real-time version only requires storing the last computed rows of matrices and , resulting in a constant space and time complexity. This represents a significant improvement over the complexity of the “offline” version described in Algorithm 1. The overall complexity depends also on the computational cost of the distance function , which is for the Euclidean distance (where denotes the number of dimensions of the signals) and for the WP distance (where represents the window size). Consequently, the per-step time complexity is for OS-DTWEU and for OS-DTWWP.
Although each step has constant computational complexity, an optimized implementation is crucial not only for real-time applications but also for offline scenarios where many long sequences need be processed. For brevity, we describe in detail the optimized implementation of Algorithm 1 and then briefly explain how it can be adapted to the real-time version (Algorithm 2).
First, we note that computing the distance matrix requires evaluations of . These operations are mutually independent and thus inherently parallelizable. However, this parallelism does not extend to the subsequent recursion steps, where code efficiency becomes critical. In particular, pre-compilation of this stage can yield significant computational benefits. Given the simplicity of the recursive operations, such optimization is sufficient to ensure real-time feasibility.
When is the Euclidean distance, each evaluation has complexity, where denotes the number of dimensions. The entire distance matrix can be computed efficiently through vectorized operations across the and dimensions. For the WP distance, each evaluation has complexity (where represents the window size), however, standard scientific computing libraries like numpy and scipy lack built-in support for vectorized computation of this metric.
Calculating the entire matrix requires computing the correlation between all possible pairs of windows. To achieve this, we construct matrices and for each dimension , containing all possible windows of the respective signals. Specifically:and similarly for .
Next, we compute the column-wise covariance using an efficient method (e.g. numpy.cov). The resulting matrix satisfies:
The top-left submatrix (of size ) represents the covariance between columns of . The bottom-right submatrix (of size ) represents the covariance between columns of . The top-right and bottom-left submatrices represent the covariance between columns of and columns of .
The matrix can then be computed efficiently by performing standard operations on these submatrices.
For the online version reported in Algorithm 2, can be computed offline, and for each new sample we compute the column-wise covariance between and the vector .
Moreover, to ensure numerical stability we add a small value to the variances at the denominator of the WP distance in Equation 1.
2.3 Action completion time prediction with online DTW
2.3.1 Nominal and linear estimation methods
We present two approaches for predicting the completion time of a human action based on the real-time phase estimate provided by OS-DTW. This phase estimate offers valuable insight into the current progress of an action, allowing us to forecast its eventual completion.
The first approach assumes that the user will maintain their current pace, using the observed execution speed to estimate the remaining time. The second approach relies on a nominal execution pace derived from historical demonstrations. Both methods, calibrated with prior data, can enable systems to estimate the future duration of a human action—a capability critical for applications such as assistive robotics and collaborative tasks that require timely intervention.
We assume access to a reference trajectory and a set of training trajectories, each of which is -dimensional and sampled at a constant time step . For convenience, we define as the function returning the completion time of a trajectory , and as the function returning the estimated phase for the same trajectory at time .
The nominal duration is defined as the average duration of all training trajectories and the reference trajectory:
Thus, the best a-priori estimate for the completion time, based solely on prior information and the current time, is:
We define the nominal estimation method asand the linear estimation method aswhere is the current time and is the estimated phase.
The nominal method assumes execution progresses at the nominal speed, with the remaining time estimated as . The linear method assumes a constant execution speed, scaling the current time inversely with the estimated completion percentage . The linear method is analogous to that employed by Maderna et al. (2019).
2.3.2 Hybrid estimation method
The linear estimation method is highly sensitive to phase miscalculations and can be inaccurate when the available trajectory percentage is insufficient to reliably estimate future execution speed. Conversely, the nominal estimation method does not leverage past execution speed as an informative metric for predicting the completion time. To address these limitations, we derive an optimal switching rule based on the estimated phase and elapsed time to transition from the nominal to the linear estimation method.
We begin by calculating the optimal switching time. The mean absolute estimation error (MAE) at time for the nominal estimation method is defined asand for the linear estimation method as
To determine the optimal switching time, we calculate the cumulative nominal costs up to each time and the cumulative linear costs from each time up to the maximum final time :
The total cost for switching at time is given by:
Thus, the optimal switching time is:
A similar procedure can be applied to estimate the optimal switching phase. We define the MAE for a phase using the nominal and linear estimation methods respectively as:
The cumulative costs are then:
The total cost for switching at phase is given by:
The optimal switching phase is:
2.4 Online DTW for proactive assistance in human-robot collaboration
In this subsection, we propose the application of OS-DTWWP in a human-robot collaboration setting. We introduce the Proactive Assistance through action-Completion Estimation (PACE) framework, which leverages the estimated phase of human actions to synchronize the robot’s behavior with the human’s workflow in a collaborative assembly task. The goal of PACE is to minimize idle times for both the human and the robot, ensuring efficient and seamless collaboration. A depiction of the PACE training scheme is provided in Figure 2.
FIGURE 2

PACE training scheme. Demonstrations of the collaborative assembly task are first recorded, including human hand trajectories and action durations. These data are used to set up a simulation (left), which models the task as a POMDP. A policy (right) is then trained with proximal policy optimization, using as inputs: (i) the human action completion percentage (estimated by OS-DTWWP), (ii) the elapsed time of the current human action, (iii) the current human action, and (iv) the last robot action.
PACE models the system as a Partially Observable Markov Decision Process (POMDP) (Kaelbling et al., 1998), and utilizes data collected from human demonstrations and OS-DTWWP to create a simulated environment and train a policy via Reinforcement Learning (RL). This approach enables direct training with the estimated phase, eliminating the need to explicitly compute action completion times as an intermediate step.
2.4.1 Collaborative problem formulation
We consider a scenario where a human and a robotic manipulator concurrently perform separate tasks. The robot executes a sequence of robot-task actions , which are repeated indefinitely. Simultaneously, the human undertakes a sequence of human actions, denoted as . The human requires the robot’s assistance to complete a subset of these actions, referred to as joint actions and represented by , where .
To formalize the relationship between joint and human actions, we define the operator to map the index of a joint action to the corresponding human action index, such that . For simplicity, we assume the last human action is a joint action (i.e., ), and no two consecutive human actions are joint actions (i.e., if , then ).
To assist the human, the robot must first complete its current robot-task action before pausing its ongoing task. Once paused, the robot performs a preparatory action (e.g., repositioning or collecting a tool) to prepare for the joint action. After completing the joint action, the robot executes a homing action before either resuming its task or preparing for the next joint action. The sets of preparatory and homing actions are denoted as and , respectively. A depiction of the task as a hierarchical state machine is provided in Figure 3.
FIGURE 3

Hierarchical state machine depicting the collaborative task from the robot perspective. The robot transitions from ROBOT-TASK to ASSISTING in between states if . Once an (assist) action is completed, the robot goes back to its task. The state machine follows the conventions as in Lee and Seshia (2017). Each transition is labeled with guard/effect. The guard determines whether the transition may be taken on a reaction. The effect specifies what outputs are produced on each reaction. H denotes a history transition. The red dot a preemptive transition.
Additionally, we assume access to a set of human demonstrations for each non-joint human action . These trajectories, denoted as , consist of the Cartesian positions of the human hand along the -, -, and -axes.
The objective is to minimize the total idle times for both the robot and the human . This is achieved by optimizing the cost function:where is a weighting coefficient that balances their relative importance.
2.4.2 POMDP formulation
The collaboration problem is modeled as a finite-horizon episodic POMDP. In this framework, the robot acts as an agent that makes binary decisions between robot-task actions–whether to assist the human or continue its task–while the human is treated as part of the environment. Formally, the POMDP is defined by the tuple
, where:
• is the state space;
• is the binary set of policy actions;
• is the transition probability;
• is the reward function;
• is the observation space;
• is the observation function.
Note that policy actions should not be confused with the task actions defined in the previous section.
Each element of the state space
is defined as
, where:
is the last robot-task action;
is the current human action;
represents the joint action that human and robot should perform next;
is the elapsed time since the start of ;
is the observed human hand trajectory during ;
are the idle times observed during the last transition for the robot and human, respectively.
The transition function is the probability of the state evolving from to . The state variables evolve as follows:with remaining state variables updated based on observed interactions. In the next section we describe a model to simulate the evolutions of these quantities.
The reward directly minimizes the cost defined in Equation 2:
The observation function is defined as , where represents the estimated completion percentage of , computed from using OS-DTWWP. For each human-only action , we select the first trajectory in the training set as the reference for OS-DTWWP. Thus, the observation consists of the last robot-task action, current human action, elapsed time, and phase estimate. The definition of the observation space follows accordingly.
2.4.3 Simulated environment
Training an RL policy directly on physical hardware is impractical due to time constraints and the constant requirement of human involvement in the task. To address this, we develop a simulated environment that models the collaborative task defined in Section 2.4.1, leveraging human demonstrations to approximate real-world dynamics. This environment enables efficient training of online and on-policy RL algorithms while preserving the POMDP structure formalized in Section 2.4.2.
To model the collaborative task, we assume the duration of each action follows a Gaussian distribution, and estimate them from demonstration data. Specifically, , where corresponds to human, robot-task, preparatory, and homing actions, respectively.
At the beginning of each episode, we sample from these distributions the durations human actions , preparatory actions , and homing actions . Then, one trajectory is sampled from the set of demonstrations for each non-joint action .
Moreover, to avoid overfitting on the training data, we linearly rescale the time axis of each trajectory to align with each sampled duration . As a result, each new trajectory represents either a compressed or stretched version of an actual demonstration. We found this augmentation essential for ensuring robustness and improving the policy’s generalization capabilities.
By employing these quantities, in addition to the state evolution dynamics described in Section 2.4.2, we model the transitions of the POMDP from to as: is the duration of the robot-task .
is the duration of the transition: is a function that, given the current human action and its elapsed time , returns the ongoing human action after a time , namely,
Moreover we assume that the human always starts from the first action, while the robot is already in operation. As a result, the initial state is non-deterministic. Specifically, we assume the first human action to start when the robot is performing an action .
We derive the initial robot action by sampling from a generalized Bernoulli distribution . Namely, each action has a probability:
Then, initialize the state as:where denotes the uniform distribution.
2.4.4 RL algorithm: proximal policy optimization
To solve the POMDP described in
Section 2.4.2within the simulated environment of
Section 2.4.3, we select the Proximal Policy Optimization (PPO) (
Schulman et al., 2017) algorithm. Our choice is motivated by four key factors:
Hybrid State Space Handling: PPO natively supports state spaces combining both discrete and continuous observations , eliminating the need for algorithm modifications.
Stability: PPO employs a clipped surrogate objective that constrains policy updates, which prevents large variations of the policy and ensures learning stability despite noisy phase estimates and incomplete state observations. This is particularly important in our framework, as human behavior is highly stochastic and unpredictable.
Variance Handling: On-policy advantage estimation accounts for actual interaction variance, critical given the stochasticity of human action durations.
Efficiency: PPO achieves superior sample efficiency which is critical for human-robot collaboration where data collection is costly.
While alternative algorithms could be considered, PPO avoids several pitfalls that make them less suitable for our application. For example, Deep Q-Networks (DQN) (Huang, 2020) are limited to discrete action spaces and would require explicit discretization of continuous inputs, while also struggling with partial observability. Trust Region Policy Optimization (TRPO) (Schulman et al., 2015) shares many of PPO’s theoretical guarantees, yet, it incurs significantly higher computational overhead without providing empirical gains for binary action spaces. In the case of Soft Actor-Critic (SAC) (Masadeh et al., 2019), its design for continuous control introduces unnecessary complexity when applied to discrete action spaces. Overall, PPO emerges as the most suited choice for the PACE framework, striking an optimal balance between simplicity, sample efficiency, and performance.
3 Experiments
We evaluated the proposed methods in a real-world scenario through a pilot study involving the assembly of an IKEA chair (see Fig. Figure 4). We collected human hand trajectories from users performing a collaborative assembly task with a robot manipulator. This task also serves to test the PACE framework, both demonstrating a practical application of OS-DTWWP and showcasing the efficacy of PACE in enhancing human-robot synergy.
FIGURE 4

Experimental setup for the wooden chair assembly process. On the left, the robotic workcell andworking areas, including tables for specific parts of the process (sorting, warehouse, assembly). On the right, the disassembled components of the IKEA Ivar chair, including: the left and right sides of the chair,and the rails connecting these sides.
3.1 Experimental setup
The experimental setup consists of the robotic workcell shown in Figure 4, including a Franka Emika Panda manipulator and three main working areas: a sorting table for the robot task, a warehouse table where the components to be assembled are stored, and an assembly table where the collaborative assembly process takes place. The chair is the IKEA Ivar wooden chair1 reported in Figure 4, where the dowel pins have been substituted with neodymium magnets to simplify the rails insertion step. The robot is programmed using the ROS2 and MoveIt ]3 frameworks. An RGB-D camera monitors the area around each table, utilizing April-Tag (Malyuta et al., 2019) markers to locate the chair components. Participants were equipped with the Xsens MVN Awinda motion capture system (Schepers et al., 2018), which recorded the position of their right hand at a sampling rate of 10 Hz. Alternative tracking gloves such as Rokoko4, HaptX5, could be employed.
3.2 Task description
A typical chair assembly process involves several steps, including positioning the chair sides, placing the rails, aligning the screws, and tightening the components together.
In our setup, the human operator performs the majority of the assembly but requires the robot’s assistance at specific stages, such as transporting large components or handing over tools. Meanwhile, the robot concurrently carries out an independent task involving a series of cube sorting operations.
As shown in
Figure 4, the chair assembly process begins with the right side of the chair already positioned on the
assembly table, and the remaining chair components on the
warehouse table. According to the formulation described in 2.4.1, the process is outlined as follows:
1. The human connects the 4 rails to the right side of the chair. This is denoted as rail placing action, which corresponds to .
2. The robot and human collaboratively transport the left side from the warehouse area and place it on top of the rails .
3. The human adjusts the chair side and places 3 screws on top. This is the screw placing action and corresponds to .
4. The robot hands an Allen key to the human .
5. The human uses the key to tighten two of the screws. This is the screwing action, corresponding to .
6. The robot hands over a second Allen key to allow the human to tighten the remaining screw .
A depiction of the collaborative assembly process considered in the pilot study is reported in Figure 5, highlighting the main steps described above.
FIGURE 5

Main steps in the collaborative assembly process. (A)Rail placing. (B) Collaborative transport of the left side of the chair. (C)Screw placing. (D) Handover of the first Allen key. (E)Screwing. (F) Handover of the second Allen key to tighten the last screw. Figure contains images of the author(s) only.
3.3 Data collection
To gather the training trajectories, we employed a system where users explicitly requested assistance from the robot via a button press. We collected data from 5 subjects, with each subject performing the experiment 4 times. Additionally, one of the subjects provided an extra demonstration to generate the references for the DTW algorithms. Thus, we employed a total of 21 trajectories per action to tune or train the various methods.
For reference, the average duration of each human action was approximately 22 s for rail placing, 18 s for screw placing, and 40 s for screwing. The robot preparatory actions for the following joint actions took on average 11 s, 8 s, and 9 s, respectively. Each robot cube sorting move, represented by the action , had a duration of approximately 8 s.
3.4 PACE training
We implemented the POMDP described in 2.4.2 as a custom Gymnasium environment (Towers et al., 2024) and used the Stable-Baselines3 library (Raffin et al., 2021) for training the PACE policy. Out of the 4 demonstrations per subject, 3 were used for training and 1 for validation. Moreover, based on Decker et al. (2017); Brosque and Fischer (2022), in our experiments we assume that the cost of employing a robot is approximately one-third of the cost of human labor, thus setting the parameter of the reward function equal to 3.
3.5 Test experiments design
The test experiments involved 12 volunteers (5 women and 7 men) aged 24 to 28, two of whom also participated as training subjects.
In addition to the PACE framework, which monitors human task progression, participants tested two alternative systems. The first is a baseline system (explicit query) where the human operator explicitly requests robot assistance via a button after completing each action. The second is a variant of the PACE framework (PACE w/o phase), which operates without actively monitoring users or using phase information.
Participants received instructions on the assembly task and the robot’s action capabilities. Each participant tested all three systems (explicit query, PACE w/o phase, and PACE) in a randomized order, completing two trials for each system. No prior information was provided to the users regarding the differences between the two proactive policies. After each set of trials, participants filled out two surveys: the NASA-TLX (Hart and Staveland, 1988), and a custom 5-point Likert scale questionnaire (see Figure 12).
From the experiments conducted with the PACE framework, we collected a total of 24 trajectories per action from different subjects, which were used as the test dataset to validate our methods.
4 Results
As outlined in the previous section, all the results presented hereafter are derived from a dataset consisting of 24 trajectories per action. The training dataset includes 20 trajectories per action, along with one reference trajectory. The reference trajectories are reported in Figure 6.
FIGURE 6

Reference trajectories for the rail placing, screw placing, and screwing tasks. The four peaks in the rail placing trajectory correspond to the placement of each rail. Similarly, the screw placing trajectory exhibits three peaks, each representing the placement of a screw, but it also includes an initial transient where the human adjusts the position of the chair side. The screwing trajectory corresponds to the tightening of two different screws with an Allen key, where each small oscillation matches one turn of the key.
4.1 Phase estimation
As discussed in Section 2.2.2, defining a ground truth for the phase estimate is not straightforward. The alignment produced by Dynamic Time Warping (DTW) depends on the chosen distance metric and constraints, and different metrics or constraints can yield varying alignments. Since there is no universal rule for selecting the “best” metric or constraints, this introduces subjectivity and makes it challenging to establish an objective ground truth. While human annotations could be used to define alignments based on interpretation, they are inherently subjective, inconsistent, and thus unreliable as a ground truth.
For these reasons, in the following results, we employ Soft-DTW—as in Cuturi and Blondel (2017)—to obtain the ground truth phases where applicable6. All trajectories were manually inspected to determine the optimal parameter for Soft-DTW. However, in some cases, no clear optimum exists, and multiple plausible ground truths may emerge. An example of this is illustrated in Figure 7. Additionally, we observed that Soft-DTW struggled to align many trajectories in the screwing task effectively, as illustrated in Figure 8. As discussed in Section 2.2.1, the Euclidean distance bases its alignment on the absolute value of the signals, making it less effective at capturing local patterns—a task in which the Windowed-Pearson (WP) distance excels. Therefore, for the screwing experiments, we adopted a modified version of Soft-DTW that utilizes the WP distance, referred to as Soft-DTWWP. To clarify the distinction, we refer to the “standard” Soft-DTW as Soft-DTWEU, explicitly indicating its reliance on the Euclidean distance.
FIGURE 7

The left plot shows the x-dimensions of the original reference and test trajectories. The middle plot displays the aligned test trajectory using Soft-DTWEU for two different parameters , and the corresponding phases are shown in the right plot, where the gray line represents the reference phase evolution. By examining the original signals, we observe that the test trajectory is shorter than the reference trajectory, indicating that the user performed the action faster. This is reflected in the phase plot, where the phases of the test trajectory exhibit a steeper slope compared to the reference phase evolution. These plots refer to a rail-placing experiment, where each spike ideally represents the placement of one rail by the user. However, the test trajectory presents five spikes due to a blunder: one of the rails fell and needed to be repositioned. Soft-DTWEU aligns the first two spikes together for , while it aligns the last two spikes for . Nevertheless, there is no clear optimal choice between the two alignments.
FIGURE 8

These plots are similar to those shown in Figure 7 but correspond to a screwing experiment. Here, we highlight the differences between Soft-DTWEU and Soft-DTWWP. The middle plot has been truncated for better visualization. Each oscillation in the signal ideally represents one turn of the screw. By looking at the middle plot, it is evident that the method relying on the Euclidean distance fails to align the signals effectively.
Figure 9 shows the average Mean Square Errors (MSE) of the estimated phase for Open-end DTW and Open-end Soft-DTW, using either the Euclidean distance or the Windowed-Pearson distance. All parameters (, ) where tuned separately for each method and task to minimize the average MSE across the entire trajectories. The bar plot shows the errors divided into quartiles: the average error is calculated for each quarter of the trajectories (0%–25%, 25%–50%, 50%–75%, and 75%–100%). The baseline error is the error obtained by considering a nominal phase evolution, i.e.:where is the nominal duration as in Section 2.3.1.
FIGURE 9

Average mean squared error (MSE) of the estimated phase with respect to the ground truth computed with Soft-DTW6. The bar plot shows the results across the three tasks (rail placing, screw placing, and screwing) for OE-DTWEU, OE-DTWWP, OS-DTWEU, and OS-DTWWP. Results are reported over the 1st, 2nd, 3rd, and 4th quarters of the trajectories. Values of the bars exceeding the vertical limit are indicated on top.
We observe that the baseline error increases with each quartile, as expected. This trend occurs because, similar to the reference phases depicted in Figures 7, 8, the estimated phases progressively deviate from the true phase over time. This deviation is a natural consequence of the linear approximation used to model phase evolution, where errors accumulate as the trajectory progresses. The longer the trajectory, the larger the error tend to become. While the same reasoning applies to DTW-based methods, the availability of more data enables the algorithms to refine the alignments dynamically. This results in non-monotonic error trends, as improved alignment with additional data can mitigate error growth. Especially in rail placing and screw placing, OE-DTWWP and OS-DTWEU demonstrate their ability to better align trajectories as more data becomes available, effectively reducing errors even as sequences lengthen.
From these results we notice that classical Open-end DTW (OE-DTWEU) performs poorly in all tasks. The same applies to its version incorporating the WP distance (OE-DTWWP) with the exception of the screwing task, where OE-DTWWP performs even better than OS-DTWEU, namely the soft DTW employing the Euclidean distance.
OS-DTWEU achieves the best performance in the two placing tasks, yet it performs poorly in screwing. The Open-End Soft-DTW with the WP distance (OS-DTWWP) excels in screwing, where the other methods struggle, while maintaining good performances in the other two.
In rail placing and screw placing, the Euclidean distance proves effective because absolute rail and screw positions remain consistent across trials. In the screw placing task, the initial transient phase–where users adjust the chair side (as described in Section 3.2 and illustrated in Figure 6)–introduces unstructured local hand motions. These variations are sometimes misinterpreted by the WP distance due to its reliance on local window correlations. In contrast, the Euclidean distance succeeds by focusing on global hand-position consistency, which remains relatively stable during adjustments. Nevertheless, in the screwing task, Euclidean metrics struggle with divergent absolute screwdriver positions, while the WP distance thrives by matching local rotational patterns, such as the repetitive turns of the Allen key.
These findings highlight a critical insight: while the use of the soft minimum mitigates temporal misalignment, the choice of distance metric determines whether global positional trends or local shape similarities drive the alignment. This decision must align with the dominant features of the target task, emphasizing the importance of selecting the method that better aligns with the specific characteristics of the application.
4.2 Action completion time estimation
As discussed in Section 2.3.2, we expected the linear method to improve over time and, on average, surpass the nominal method after a certain elapsed time or time percentage, once sufficient data were available to estimate the phase and the user’s execution speed accurately. For these reasons, and to enhance the robustness of the hybrid method against potential distribution shifts, we employed both switching criteria jointly. Specifically, in the hybrid method, the linear method replaces the nominal method only when both conditions are met: and .
A significant advantage of switching to the linear method over the nominal one was observed in the training data only for OS-DTWWP in the rail placing and screwing tasks. The results relative to the rail placing training experiments are shown in Figure 10.
FIGURE 10

Average Absolute Errors for Rail Placing. The left plot depicts the average error over time, while the right plot shows it with respect to the percentage of the total duration of each trajectory. Curves have been smoothed for clarity.
All results on the test datasets are summarized and reported in Table 1.
TABLE 1
| Phase method | Estimation method | Rail placing | Screw placing | Screwing | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q1 | Q2 | Q3 | Q4 | Q1 | Q2 | Q3 | Q4 | Q1 | Q2 | Q3 | Q4 | ||
| OE-DTWEU | Nominal | 4.14 | 3.54 | 4.59 | 6.14 | 4.71 | 4.10 | 3.43 | 3.91 | 7.78 | 15.44 | 22.48 | 24.87 |
| Linear | 38.25 | 27.01 | 29.87 | 16.05 | 64.11 | 9.55 | 7.53 | 7.75 | 201.7 | 544.5 | 419.2 | 322.2 | |
| OE-DTWWP | Nominal | 4.81 | 5.25 | 6.68 | 6.92 | 5.34 | 6.05 | 5.55 | 4.68 | 6.71 | 5.33 | 4.72 | 3.69 |
| Linear | 120.9 | 11.79 | 14.88 | 10.16 | 113.6 | 21.83 | 11.56 | 7.20 | 24.28 | 5.56 | 5.31 | 4.29 | |
| OS-DTWEU | Nominal | 2.79 | 1.95 | 1.73 | 1.37 | 3.55 | 2.92 | 1.62 | 1.05 | 7.03 | 6.49 | 7.18 | 5.65 |
| Linear | 7.86 | 5.25 | 3.18 | 1.46 | 7.91 | 3.65 | 2.08 | 0.99 | 18.44 | 10.01 | 9.93 | 6.96 | |
| Hybrid | 2.79 | 1.95 | 1.73 | 1.37 | 3.55 | 2.92 | 1.62 | 1.05 | 7.03 | 6.49 | 7.18 | 5.67 | |
| OS-DTWWP | Nominal | 3.62 | 2.58 | 2.42 | 1.79 | 4.88 | 4.72 | 3.12 | 1.95 | 6.39 | 5.30 | 4.24 | 2.95 |
| Linear | 5.62 | 3.48 | 2.87 | 1.62 | 7.87 | 5.14 | 2.53 | 1.63 | 13.18 | 4.04 | 4.07 | 3.33 | |
| Hybrid | 3.56 | 3.52 | 3.09 | 1.82 | 4.88 | 4.75 | 2.97 | 1.96 | 6.49 | 4.01 | 3.80 | 3.19 | |
| - | Time-Only | 4.02 | 4.02 | 4.02 | 3.98 | 4.82 | 4.82 | 4.82 | 4.46 | 6.97 | 6.97 | 6.97 | 5.86 |
Average Mean Absolute Error (seconds) per quartile (Q1: 0%–25%, Q2: 25%–50%, Q3: 50%–75%, Q4: 75%–100%) for each task.
Values exceeding those obtained with the Time-Only method are shown in red. The minimum values for each task and quartile are highlighted in bold.
As demonstrated in the phase results section, the “non-soft” DTW methods generally perform poorly, leading to larger estimation errors when using the linear time estimation method.
For the soft DTW methods, switching from the nominal to the linear method is beneficial in approximately half of the cases. The nominal and hybrid methods using OS-DTWEU were the best in the rail placing and screw placing tasks. In contrast, the hybrid method using OS-DTWWP was the overall best in the screwing task.
Overall, the hybrid method either outperformed or matched the performance of the other methods in each task. However, it did not achieve ideal results in the rail placing and screw placing tasks with OS-DTWWP, likely due to variations in the optimal switching time and phase between the training and test datasets.
4.3 Collaborative assembly results
The goal of this experiment is to evaluate whether introducing proactive robot behavior can reduce downtime during assembly and enhance the quality of the assembly experience from the user’s perspective. Additionally, we aim to demonstrate that monitoring human progress—specifically using OS-DTWWP—can further improve human-robot synergy, both in terms of reducing waiting times and enhancing user experience.
To this end, as explained in Section 3.5, we tested three different systems (explicit query, PACE w/o phase, and PACE) in real-user experiments. PACE employs OS-DTWWP to estimate the phase.
To assess subjective aspects, we employed both the NASA Task Load Index (Hart and Staveland, 1988), a widely used multidimensional assessment tool that rates perceived workload (see Figure 11), and a custom 5-point Likert scale questionnaire (see Figure 12) specifically tailored for the task at hand.
FIGURE 11

NASA-TLX (Hart and Staveland, 1988) findings for subjective measures on a 5-point scale ranging from Very Lowto Very High. The boxplot displays medians, interquartile ranges, and whiskers representing the data distribution. The questions are the following. Mental: How mentally demanding was the task?Physical: How physically demanding was the task?Temporal: How hurried or rushed was the pace of the task?Performance: How successful were you in accomplishing the task?Effort: How hard did you have to work to accomplish this task?Frustration: How insecure, discouraged, irritated, stressed, or annoyed were you?
FIGURE 12

Findings for subjective measures on a 5-point scale ranging from Strongly Disagreeto Strongly Agree. The boxplot displays medians, interquartile ranges, and whiskers representing the data distribution. The questions are the following. Rushed: I felt rushed by the robot’s actions. Delay: I felt that the robot took too long to assist me. Understanding: I felt the robot had a good understanding of the task. Fluency: The robot and I collaborated fluently. Satisfaction: I feel satisfied by the performance of the system.
For each experiment, we recorded the robot idle times and a video of the entire assembly. The video recordings were later inspected to annotate the precise end of each human action to calculate the relative waiting times. We report the quantitative results of the robot’s and users’ waiting times before each of three joint action in Table 2.
TABLE 2
| Real experimental results | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| System | Robot Waiting Time [s] | Human Waiting Time [s] | Cost | ||||||
| A1 | A2 | A3 | Total | A1 | A2 | A3 | Total | ||
| Explicit query | 0.0 | 0.0 | 0.0 | 0.0 | 14.00 | 11.28 | 12.15 | 37.43 | 112.3 |
| PACE w/o phase | 1.56 | 3.48 | 11.64 | 16.68 | 1.79 | 1.02 | 1.06 | 3.87 | 28.29 |
| PACE | 1.93 | 2.65 | 1.28 | 5.86 | 1.20 | 0.56 | 2.56 | 4.32 | 18.81 |
Collaborative assembly results of real experiments (12 subjects, 2 trials per method). Columns include human and robot waiting times for A1 (Rail Placing), A2 (Screw Placing), and A3 (Screwing) tasks, and overall cost. The value in bold indicates the lowest cost.
As expected, the system with the explicit query appears to be the most penalizing for the user’s waiting time. This is also reflected in the subjective user results, which generally indicated that the robot took too long to provide assistance. Excluding the baseline—which always shows null robot idle time by design—Table 2 demonstrates that the method employing the phase estimate reduces robot idle time nearly to one-third, without significantly impacting the user’s waiting time. Moreover, as shown in Figure 12, the method that monitors human progress outperforms the others in subjective measures as well. The users reported a higher level of fluency, understanding, and overall satisfaction, confirming that the method adapts to the pace of various participants. Additionally, Figure 11 shows that a proactive robot operating autonomously does not negatively impact mental strain or the overall Task Load Index. Notably, five out of twelve participants explicitly stated in the open comment section of the questionnaire that they preferred the system monitoring human task advancement, with many appreciating that it provided assistance only as they neared the end of their action.
Moreover, in Table 3, we report the results obtained with the training data on the POMDP defined in Section 2.4.2. The PACE w/EU system reported in the tables, refers to a variant of the PACE policy that employs OS-DTWEU to estimate the phase. The oracle system represents an ideal policy with posterior knowledge of the completion time of each action, serving as a theoretical lower bound.
TABLE 3
| Training trajectories | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| System | Robot Waiting Time [s] | Human Waiting Time [s] | Cost | ||||||
| A1 | A2 | A3 | Total | A1 | A2 | A3 | Total | ||
| Explicit query | 0.0 | 0.0 | 0.0 | 0.0 | 14.13 | 11.27 | 11.95 | 37.35 | 112.05 |
| PACE w/o phase | 4.10 | 5.99 | 12.49 | 22.57 | 0.58 | 0.07 | 1.11 | 1.76 | 27.87 |
| PACE w/EU | 4.20 | 5.42 | 13.89 | 23.51 | 0.43 | 0.17 | 1.08 | 1.69 | 28.59 |
| PACE | 1.90 | 6.10 | 5.45 | 13.45 | 0.92 | 0.10 | 0.81 | 1.83 | 18.94 |
| Oracle | 1.96 | 1.32 | 1.83 | 5.11 | 0.25 | 0.26 | 0.28 | 0.79 | 7.49 |
Collaborative assembly results obtained in simulation with the real training trajectories (5 subjects, 4 trajectories per subject and method). Results are averaged across 1000 different seeds. Columns include human and robot waiting times for A1 (Rail Placing), A2 (Screw Placing), and A3 (Screwing) tasks, and overall cost. The value in bold indicates the lowest cost achieved without hindsight information.
For a fairer comparison, we replayed the test trajectories using the same POMDP and report the results in Table 4. Quantitatively, we observe minimal benefits from phase estimation methods (including the oracle) in both rail-placing and screw-placing actions. We attribute this to the relatively short duration of these human actions compared to robot preparation and task execution times, as the combination of lengthy robot operations with brief human actions fundamentally limits the potential advantages of progress estimation in this context. While PACE outperforms all baselines, PACE w/EU fails to improve over PACE w/out phase. This aligns with OS-DTWEU’s poor performance during screwing actions.
TABLE 4
| Test trajectories replayed | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| System | Robot Waiting Time [s] | Human Waiting Time [s] | Cost | ||||||
| A1 | A2 | A3 | Total | A1 | A2 | A3 | Total | ||
| PACE w/o phase | 1.67 | 2.91 | 10.90 | 15.48 | 1.37 | 0.52 | 0.68 | 2.58 | 23.22 |
| PACE w/EU | 2.09 | 2.59 | 11.12 | 15.79 | 0.99 | 0.62 | 0.61 | 2.23 | 22.40 |
| PACE | 1.96 | 2.77 | 1.27 | 5.99 | 1.09 | 0.57 | 2.58 | 4.25 | 18.75 |
| Oracle | 2.19 | 1.16 | 1.56 | 4.91 | 0.25 | 0.54 | 0.28 | 1.07 | 8.12 |
Collaborative assembly results obtained in simulation by replaying the test trajectories. Results are averaged across 1000 different seeds. Columns include human and robot waiting times for A1 (Rail Placing), A2 (Screw Placing), and A3 (Screwing) tasks, and overall cost. The value in bold indicates the lowest cost achieved without hindsight information.
5 Discussion
Our experimental results demonstrate that real-time progress estimation, enabled by novel Open-end Soft-DTW variants, addresses critical gaps in human-robot collaboration (HRC) systems. While most existing methods reactively monitor human activity at the task level (Cheng et al., 2021; Ramachandruni et al., 2023), they lack mechanisms to accommodate the natural variability in human execution pace. In contrast, our approach monitors individual actions at the atomic level, enabling robots to dynamically synchronize with their human counterparts–a capability missing from previous frameworks.
Early attempts to estimate progress, such as Open-end DTW (Maderna et al., 2019), introduced temporal flexibility but proved oversensitive to real-world trajectory variations. Our work bridges this gap with Open-end Soft-DTW (OS-DTWEU), which mitigates local minima through a softmin operator, marking its first real-time application in HRC. This innovation allows robust handling of unpredictable human motions while maintaining computational feasibility.
The task-dependent performance of our methods highlights the need for context-aware estimation. For instance, OS-DTWWP excelled in screwing actions by capturing local rotational patterns, whereas OS-DTWEU proved superior in placement tasks, where positional consistency is paramount. These findings suggest that developing new distance metrics, combining existing ones, or selecting metrics based on training data or task semantics could further enhance adaptability.
The practical impact of these advancements is evident in the PACE framework, which reduced robot idle times by nearly two-thirds while maintaining low user waiting times. Unlike reactive systems (Giacomuzzo et al., 2024), PACE proactively synchronizes assistance by continuously monitoring progress, aligning with human preferences for anticipatory support (Lasota and Shah, 2015). Subjective evaluations further confirmed that participants perceived PACE-driven robots as more intuitive partners, with improved fluency and responsiveness–a critical factor in fostering trust.
Our hybrid completion-time prediction method further validates the synergy between prior knowledge and real-time estimation. Its consistent performance across tasks demonstrates the practicality of combining historical data with online alignment, even in scenarios with inter-user variability. These results not only highlight the strengths of our approach but also underscore the broader potential of real-time progress estimation in transforming HRC systems.
Nonetheless, several limitations should be acknowledged. First, by relying on a single reference trajectory for OS-DTWWP, our method overlooks the fact that many actions admit multiple valid execution modes, which can differ substantially in movement patterns while still achieving the same goal. Moreover, our formulation exclusively leverages human motion, without incorporating additional state information–such as object pose or environmental cues–that could further improve action completeness estimation. Finally, the PACE framework assumes a fixed action sequence, whereas real-world collaborations often involve dynamic task orderings, execution errors, or mid-action changes in user strategies and preferences. Addressing these limitations will be essential for scaling our approach to more unstructured and unpredictable HRC settings.
5.1 Future works
Building on these findings, this work lays the foundation for several promising directions in human-robot interaction (HRI) and collaborative robotics. While validated in industrial assembly, our framework has the potential to generalize to diverse domains, such as home robotics, assistive care, and collaborative cooking. Additionally, we believe Dynamic Time Warping (DTW) can be extended to compute phase estimation accuracy and identify potential failures, paving the way for more robust and reliable systems.
A key area for future research is the integration of multi-modal sensing to enhance progress estimation. While our current framework relies on kinematic data, incorporating visual and contextual inputs–such as object affordances and environmental cues–could significantly improve flexibility and applicability. For instance, leveraging deep learning methods to process visual data could enable systems to handle unstructured tasks, such as improvised meal preparation or assistive caregiving, where task sequences and object interactions are highly variable. At the same time, because many tasks can be completed in multiple valid ways, future methods should be capable of modeling these multimodal distributions to anticipate diverse execution modes and adapt robot behavior accordingly.
Another critical direction is the development of frameworks that extend beyond rigid, predefined workflows. While PACE demonstrates robust performance in structured industrial tasks, real-world scenarios often involve unpredictable task sequences, errors, or mid-action changes in user preferences. Future systems must address these challenges by incorporating adaptable collaboration strategies, such as dynamic task re-planning and error recovery mechanisms. This would enable robots to seamlessly adjust to human actions, even in highly unstructured environments.
5.2 Conclusion
Beyond our technical contributions, this work highlights a broader imperative: real-time prediction of human action completion times remains vastly underexplored despite its critical role in seamless human-robot collaboration. While much of the existing research has overlooked action-level progress estimation, our results show that techniques like online DTW deliver significant improvements in synchronization and user satisfaction, as evidenced by the PACE framework. We aim to raise awareness within the HRC community about the urgent need for focused research in this area, and hope that this work inspires the development of new, effective learning methods. The efficiency gains, reduced idle times, and positive user feedback achieved with PACE illustrate the transformative potential of proactive, action-aware HRC systems—a paradigm shift essential for deploying robots in shared, real-world environments.
In conclusion, our results position real-time progress estimation as a cornerstone for collaborative robotics. By enabling robots to “keep pace” with humans at the level of individual actions, we unlock fluid, adaptive teamwork, where machines no longer wait for explicit cues but anticipate and align with human partners seamlessly. This shift from rigid synchronization to dynamic co-adaptation represents a critical leap toward truly intuitive human-robot collaboration.
Statements
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
Ethical approval was not required for the studies involving humans because this study was conducted in Padova, Italy, as a pilot study involving 12 adult participants. It consisted of a brief, non-medical interaction with a robot and the completion of anonymous questionnaires. No personal, sensitive, or identifiable data were collected. In accordance with Italian regulations and the General Data Protection Regulation (EU) 2016/679, ethical approval is not required for non-interventional, low-risk behavioral studies that do not involve risks to participants or the processing of personal data. The study was carried out in compliance with local legislation and institutional requirements. Participants were verbally informed about the purpose of the study, the voluntary nature of their participation, and the use of anonymized data for research purposes. Given the anonymous and non-intrusive nature of the study, written informed consent was not required under applicable regulations.
Author contributions
DD: Writing – original draft, Formal Analysis, Data curation, Validation, Methodology, Visualization, Investigation, Software, Writing – review and editing, Conceptualization. MT: Writing – review and editing, Data curation, Investigation, Software. GG: Writing – review and editing, Conceptualization. SJ: Conceptualization, Writing – review and editing. PF: Conceptualization, Writing – review and editing. RC: Supervision, Writing – review and editing, Resources, Conceptualization. SG: Writing – review and editing, Resources. DR: Conceptualization, Methodology, Supervision, Project administration, Writing – original draft, Writing – review and editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This research was partially supported by the European Union – NextGenerationEU (C96E22000110005). Open Access funding provided by Università degli Studi di Padova/University of Padua, Open Science Committee.
Conflict of interest
Authors SJ and DR were employed by Mitsubishi Electric Research Laboratories. Author DD’s research was also partially funded by Mitsubishi Electric Research Laboratories.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2025.1623884/full#supplementary-material
Footnotes
1.^ https://www.ikea.com/us/en/p/ivar-chair-pine-90263902/
4.^ https://www.rokoko.com/products/smartgloves
6.^Soft-DTWEU was used to obtain the ground truths for the rail placing and screw placing tasks. However, this method was not effective for aligning one of the screw placing trajectories, in which case we employed Soft-DTWWP. Soft-DTWWP was also used for the screwing task.
References
1
An J. Kang H. Han S. H. Yang M.-H. Kim S. J. (2023). “Miniroad: minimal rnn framework for online action detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10341–10350.
2
Brosque C. Fischer M. (2022). Safety, quality, schedule, and cost impacts of ten construction robots. Constr. Robot.6, 163–186. 10.1007/s41693-022-00072-5
3
Chen F. Sekiyama K. Cannella F. Fukuda T. (2013). Optimal subtask allocation for human and robot collaboration within hybrid assembly system. IEEE Trans. Automation Sci. Eng.11, 1065–1075. 10.1109/tase.2013.2274099
4
Chen L.-H. Zhang J. rong Li Y. Pang Y. Xia X. Liu T. (2023). “Humanmac: masked motion completion for human motion prediction,” in 2023 IEEE/CVF international conference on computer vision (ICCV), 9510–9521.
5
Cheng Y. Tomizuka M. (2021). Long-term trajectory prediction of the human hand and duration estimation of the human action. IEEE Robotics Automation Lett.7, 247–254. 10.1109/lra.2021.3124524
6
Cheng Y. Sun L. Liu C. Tomizuka M. (2020). Towards efficient human-robot collaboration with robust plan recognition and trajectory prediction. IEEE Robotics Automation Lett.5, 2602–2609. 10.1109/lra.2020.2972874
7
Cheng Y. Sun L. Tomizuka M. (2021). Human-aware robot task planning based on a hierarchical task model. IEEE Robotics Automation Lett.6, 1136–1143. 10.1109/lra.2021.3056370
8
Chi S. Chi H.-G. Huang Q. Ramani K. (2025). Infogcn++: learning representation by predicting the future for online skeleton-based action recognition. IEEE Trans. Pattern Analysis Mach. Intell.47, 514–528. 10.1109/TPAMI.2024.3466212
9
Cuturi M. Blondel M. (2017). “Soft-dtw: a differentiable loss function for time-series,” in International conference on machine learning (Sydney, Australia: PMLR), 894–903.
10
Dang L. Nie Y. Long C. Zhang Q. Li G. (2021). “Msr-gcn: multi-Scale residual graph convolution networks for human motion prediction,” in 2021 IEEE/CVF international conference on computer vision (ICCV), 11447–11456.
11
De Lazzari D. Terreran M. Giacomuzzo G. Jain S. Falco P. Carli R. et al (2025). “Pace: proactive assistance in human-robot collaboration through action-completion estimation,” in 2025 IEEE international conference on robotics and automation (ICRA), 6725–6731. 10.1109/ICRA55743.2025.11127399
12
Decker M. Fischer M. Ott I. (2017). Service robotics and human labor: a first technology assessment of substitution and cooperation. Robotics Aut. Syst.87, 348–354. 10.1016/j.robot.2016.09.017
13
Giacomuzzo G. Terreran M. Jain S. Romeres D. (2024). “Decaf: a discrete-event based collaborative human-robot framework for furniture assembly,” in 2024 IEEE/RSJ international conference on intelligent robots and systems (IROS), 7085–7091. 10.1109/IROS58592.2024.10802728
14
Hart S. G. Staveland L. E. (1988). Development of nasa-tlx (task load index): results of empirical and theoretical research. Hum. Ment. workload1, 139–183. 10.1016/S0166-4115(08)62386-9
15
Huang Y. (2020). “Deep q-networks,” in Deep reinforcement learning: fundamentals, research and applications, 135–160.
16
Itakura F. (1975). Minimum prediction residual principle applied to speech recognition. IEEE Trans. Acoust. Speech, Signal Process.23, 67–72. 10.1109/TASSP.1975.1162641
17
Janati H. Cuturi M. Gramfort A. (2020). “Spatio-temporal alignments: optimal transport through space and time,” in Proceedings of the twenty third international conference on artificial intelligence and statistics. Editors ChiappaS.CalandraR. (Proceedings of Machine Learning Research), 1695–1704.
18
Jeong Y.-S. Jeong M. K. Omitaomu O. A. (2011). Weighted dynamic time warping for time series classification. Pattern Recognit.44, 2231–2240. 10.1016/j.patcog.2010.09.022
19
Kaelbling L. P. Littman M. L. Cassandra A. R. (1998). Planning and acting in partially observable stochastic domains. Artif. Intell.101, 99–134. 10.1016/S0004-3702(98)00023-X
20
Lasota P. A. Shah J. A. (2015). Analyzing the effects of human-aware motion planning on close-proximity human–robot collaboration. Hum. factors57, 21–33. 10.1177/0018720814565188
21
Lee E. A. Seshia S. A. (2017). Introduction to embedded systems: a cyber-physical systems approach. MIT press.
22
Leu J. Cheng Y. Tomizuka M. Liu C. (2022). “Robust task planning for assembly lines with human-robot collaboration,” in Proceedings of the international symposium on flexible automation 2022 international symposium on flexible automation, 188–195. 10.11509/isfa.2022.188
23
Maderna R. Lanfredini P. Zanchettin A. M. Rocco P. (2019). “Real-time monitoring of human task advancement,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE), 433–440.
24
Maderna R. Ciliberto M. Maria Zanchettin A. Rocco P. (2020). “Robust real-time monitoring of human task advancement for collaborative robotics applications,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 11094–11100. 10.1109/IROS45743.2020.9341131
25
Malyuta D. Brommer C. Hentzen D. Stastny T. Siegwart R. Brockers R. (2019). Long-duration fully autonomous operation of rotorcraft unmanned aerial systems for remote-sensing data acquisition. J. Field Robotics37, 137–157. 10.1002/rob.21898
26
Manousaki V. Argyros A. (2023). “Partial alignment of time series for action and activity prediction,” in Computer vision, imaging and computer graphics theory and applications. Editors de SousaA. A.DebattistaK.PaljicA.ZiatM.HurterC.PurchaseH.et al (Cham: Springer Nature Switzerland), 89–107.
27
Mao W. Liu M. Salzmann M. (2020). “History repeats itself: human motion prediction via motion attention,” in Computer vision – ECCV 2020. Editors VedaldiA.BischofH.BroxT.FrahmJ.-M. (Cham: Springer International Publishing), 474–489.
28
Mao Y. Deng J. Zhou W. Fang Y. Ouyang W. Li H. (2023). “Masked motion predictors are strong 3d action representation learners,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 10181–10191.
29
Masadeh A. Wang Z. Kamal A. E. (2019). “Selector-actor-critic and tuner-actor-critic algorithms for reinforcement learning,” in 2019 11th International Conference on Wireless Communications and Signal Processing (WCSP) (IEEE), 1–6.
30
Matsuo S. Wu X. Atarsaikhan G. Kimura A. Kashino K. Iwana B. K. et al (2023). Deep attentive time warping. Pattern Recognit.136, 109201. 10.1016/j.patcog.2022.109201
31
Nielsen N.-P. V. Carstensen J. M. Smedsgaard J. (1998). Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping. J. Chromatogr. A805, 17–35. 10.1016/S0021-9673(98)00021-1
32
Raffin A. Hill A. Gleave A. Kanervisto A. Ernestus M. Dormann N. (2021). Stable-baselines3: reliable reinforcement learning implementations. J. Mach. Learn. Res.22, 1–8. 10.5555/3546258.3546526
33
Rahman S. M. Sadrfaridpour B. Wang Y. (2015). “Trust-based optimal subtask allocation and model predictive control for human-robot collaborative assembly in manufacturing,” in Dynamic Systems and Control Conference (American Society of Mechanical Engineers). 10.1115/dscc2015-9850
34
Ramachandruni K. Kent C. Chernova S. (2023). Uhtp: a user-aware hierarchical task planning framework for communication-free, mutually-adaptive human-robot collaboration. ACM Trans. Human-Robot Interact.13, 1–27. 10.1145/3623387
35
Ray A. Raj A. Kolekar M. H. (2025). “Autoregressive adaptive hypergraph transformer for skeleton-based activity recognition,” in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (IEEE), 9690–9699.
36
Sakoe H. (1979). Two-level dp-matching–a dynamic programming-based pattern matching algorithm for connected word recognition. IEEE Trans. Acoust. Speech, Signal Process.27, 588–595. 10.1109/TASSP.1979.1163310
37
Sakoe H. Chiba S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech, Signal Process.26, 43–49. 10.1109/TASSP.1978.1163055
38
Schepers M. Giuberti M. Bellusci G. (2018). Xsens mvn: consistent tracking of human motion using inertial sensing. Xsens Technol.1, 1–8. 10.13140/RG.2.2.22099.07205
39
Schulman J. Levine S. Abbeel P. Jordan M. Moritz P. (2015). “Trust region policy optimization,” in International Conference on Machine Learning. Editors BachF.BleiD. (Lille, France:PMLR), 1889–1897.
40
Schulman J. Wolski F. Dhariwal P. Radford A. Klimov O. (2017). Proximal policy optimization algorithms.
41
Snoek J. Larochelle H. Adams R. P. (2012). Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst.25. 10.5555/2999325.2999464
42
Tomasi G. Van Den Berg F. Andersson C. (2004). Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data. J. Chemom. A J. Chemom. Soc.18, 231–241. 10.1002/cem.859
43
Towers M. Terry J. K. Kwiatkowski A. Balis J. U. Cola G. Deleu T. et al (2024). Gymnasium (v1.0.0a2). 10.5281/zenodo.11232524
44
Trigeorgis G. Nicolaou M. A. Zafeiriou S. Schuller B. W. (2016). “Deep canonical time warping,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
45
Wang J. Dong L. Huang C. Wang Y. (2023). Crosscorrelation-based dynamic time warping and its application in wave equation reflection traveltime inversion. Geophysics88, R737–R749. 10.1190/geo2023-0089.1
46
Yan S. Xiong Y. Lin D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proc. AAAI Conf. Artif. Intell.32. 10.1609/aaai.v32i1.12328
47
Zhao J. Itti L. (2018). Shapedtw: shape dynamic time warping. Pattern Recognit.74, 171–184. 10.1016/j.patcog.2017.09.020
Summary
Keywords
open-end dynamic time warping, human action progress estimation, human action completion time prediction, human-robot interaction, collaborative assembly, real-time monitoring, reinforcement learning, sliding window cross-correlation
Citation
De Lazzari D, Terreran M, Giacomuzzo G, Jain S, Falco P, Carli R, Ghidoni S and Romeres D (2025) Real-time human progress estimation with online dynamic time warping for collaborative robotics. Front. Robot. AI 12:1623884. doi: 10.3389/frobt.2025.1623884
Received
06 May 2025
Revised
07 September 2025
Accepted
02 October 2025
Published
04 December 2025
Volume
12 - 2025
Edited by
Shaoming He, Beijing Institute of Technology, China
Reviewed by
Javier Felip Leon, Intel, United States
Mohammad Hassan Farhadi, University of Rhode Island, United States
Hamidreza Heidari, Malayer University, Iran
Updates
Copyright
© 2025 De Lazzari, Terreran, Giacomuzzo, Jain, Falco, Carli, Ghidoni and Romeres.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Diego Romeres, romeres@merl.com
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.