PARTI-A Haptic Virtual Reality Control Station for Model-Mediated Robotic Applications

In this paper, we introduce a tele-robotic station called “PARTI” that leverages state-of-the-art robotics and virtual reality technology to enable immersive haptic interaction with simulated and remote environments through robotic avatars. Our hardware-in-the-loop framework integrates accurate multibody system dynamics and frictional contacts with digital twins of our robots in a virtual environment with real-time computational capabilities. This model mediated hardware-in-the-loop approach to robotic control allows a teleoperator to use the PARTI system to teach, evaluate, and control various robotic applications. In the current contribution, we focus on the general system description, integrated simulation and control framework, and a series of experiments highlighting the advantages of our approach.


As in
, we also develop a simulation environment with different use cases such as training medical and technical personal. Therefore realistic simulated haptic force feedback is required that also needs to consider the hardware on the simulated remote side. We opt MuJoCo's (Todorov et al., 2012) convenient representation in generalized coordinates for kinematic chains, yielding haptic feedback in case of undesired (self-) collisions. A notable byproduct of the simulation of the systems is a digital twin, which can be used for system diagnostics. However the focus of our simulation lies on physical realism and real-time aspects for haptic force feedback. This system allows us to explore complex scenarios where various physical and virtual agents interact with each other in a simulated environment, not unlike a haptic meta-verse for robotic systems (see Figure 1). We refer to this approach as model-mediated robot control and present several experiments to highlight the potential of this paradigm. The main components of our system, which we explore further, are namely: • State-of-the-art digital twins based on a multibody dynamics physical engine with real-time capability; • Haptic real-time interface rendering constraint forces from the virtual physics engine in generalized coordinates; • Hardware-in-the-loop interface for model-mediated closedloop control.
Our model-mediated robot control paradigm can be linked to the model-mediated teleoperation (MMT) approach (Xu et al., 2016). To our knowledge this is the first work that considers a physics engine as model for MMT. Virtual reality (VR)-based teleoperation is widely used for MMT to provide a non-delayed force and visual feedback and thus guarantee system stability and transparency. Many studies on VR-based MMT have focused on estimating the environment geometry. Valenzuela-Urrutia et al. (2019) have built a virtual workspace that is created directly from point cloud data. Similarly in Ni et al. (2017), the remote real world is modeled in a local virtual environment with the help of RGB information and a 3D registration method. Both virtual environments, however, do not provide sufficient physical properties. The gap between the virtual model and the real world is still large. Recently, the digital twin methodology is applied to a robot-environment interaction for enhancing task performance with the help of Unity (Li et al., 2021). A 3-D generic graphic model as well as the contact dynamics of the real entities is modeled to mirror a real interactive scene. Physical properties of the real environment are also extracted, simultaneously. This work, however, does not consider real teleoperation scenarios with communication unreliability. System control schemes to deal with various tasks and communication delays are not studied. As a contrast, in this paper, we have proposed a digital twin system with a physical engine to describe the full dynamics of the real environment. This minimizes the gap between visual-haptic rendering in virtual environment and the real-world interaction. In addition, our work considers teleoperation over real communication networks, where time delay and packet loss are inevitable. To address these issue, stability-ensuring control schemes such as the Time Domain Passisivity Approach (TDPA, Ryu and Preusche (2007)) and MMT are investigated on our system.

Hardware
PARTI is designed to emulate and mirror the same physical properties of humanoid avatar GARMI, whilst at the same time enabling a pleasant human-robot and haptic interaction. To this aim, the first design choice concerns the hardware layout. The robot arms were mounted horizontally, rather than vertically-with care consideration with respect to the reachability space of the human operator. This mode of operation requires a modified firmware on each of the robots master controllers for proper gravity FIGURE 1 | System overview of our model-mediated approach to robot control. The simulation environment is populated with digital twins that may interact with each other and the environment. Hardware interfaces allow for transparent coupling of the digital twin to its real world counterpart to close the control loop in the physical world.
Frontiers in Virtual Reality | www.frontiersin.org July 2022 | Volume 3 | Article 925794 compensation. Also precise adjustment of the main frame is mandatory, such that gravity compensation works as intended. Therefore, PARTI's baseframe is supported by four heightadjustable wheels for leveling purposes. The seat as well as robotic arms are adjustable to fit the operators demands while mirroring the properties of the humanoid avatar GARMI. A second obvious design choice was to use the Panda robot from Franka Emika, a torqueactuated 7-DoF robotic system, for each arm. The robotic arm is embedded with torque sensors on every joint, which in turn properly emulated the GARMI robot and, most importantly, provided precise (cf. Kirschner et al. (2021)) wrench estimation on task-space on different levels of the robotic kinematics, i.e., in different parts of the serial chain. Such feature enabled real-time wrench rendering in our HIL framework at the custom end-effectors is shown in Figure 2. For haptic-and visual rendering, a performant computer (Intel ® Core ™ i7-9700F, 16 GiB RAM, NVIDIA GeForce RTX 2080 Ti) is connected to both master controllers via a switch generating torque signals for the robotic arms or reading telemetry data in a 1 kHz control loop. Since teleoperation requires depth perception, we use a 3D-PluraView monitor for stereoscopic rendering. The PARTI hardware framework, as described above, is shown in Figure 2 with its base frame and backwall consisting of a robust construction of itemprofiles enabling flexible prototyping.

Simulation and Control
Our framework integrates physically accurate digital twins (see Figure 3) directly into the real-time control loop of our hardware to provide a framework for model-mediated hardware-in-the-FIGURE 2 | Overview of the hardware components of the PARTI system.
FIGURE 3 | Visualization of our two primary digital twin models in MuJoCo. The model of the humanoid robot GARMI shown on the left includes fully actuated arms, head and a friction driven differential drive system with castor wheels and passive suspension. The model of PARTI to the right also features a fully actuated two-arm system. These models can run purely in a software environment or be coupled to the real hardware using the haptic or HIL interfaces.
Frontiers in Virtual Reality | www.frontiersin.org July 2022 | Volume 3 | Article 925794 loop control. The hardware-in-the-loop requirements of our framework-that integrates virtual worlds and a bimanualrobot digital-twin to our haptic framework and real-world applications-calls for most recent developments in robotics' hardware, as previously described, as well as for new control algorithms and state-of-the-art physics engines. Indeed, our proposed framework has only be made feasible due to separated developments from the past 5-10 years. 1 For the physics engine and core virtual environment, our system relies on MuJoCo 2 as it is one of the modicum engines that satisfies our requirements, that is, it is capable of running highly constrained and contact rich simulations in real-time at a control frequency of 1 kHz (cf. Todorov et al. (2012)). Compared to other popular robot simulators such as Gazebo, MuJoCo has the additional advantage of simulating in generalized coordinates (cf. Erez et al. (2015)). We use the latter to define an interface between the physical system and its digital twin for haptic interaction with the simulation environment. Given the Lagrangian describing the system dynamics of an n-degree-of-freedom digital twin as where M ∈ R n×n is the total inertia, c(q, _ q), g(q) ∈ R n the Coriolis and centrifugal and gravity forces, q, _ q ∈ R n position and velocity and τ dt,d , τ constr , τ bias ∈ R n the applied, constraint and bias forces all in generalized coordinates. We couple the digital twin's motion to the physical system using a PD control law in generalized coordinates where K, D ∈ R 7×7 are gain matrices and the subscripts phys and dt are used to refer to the physical system and digital twin respectively. Note that while we could set the position of the digital twin directly, we use a tracking controller instead. This way the dsigital twin is subject to the entire simulation pipeline, including computation of the constraint force τ constr and bias force τ bias c(q, _ q) + g(q). The constraint force computed by MuJoCo includes the forces produced by contacts and joint limits (cf. Todorov (2014)) and can be directly applied to the physical system as where λ ∈ R n×n is a gain matrix that usually comes down to a single scalar to adjust for the reality gap between the digital twin and the physical system. In addition, we usually apply a slight low-pass IIR filter to τ constr to make haptic interactions smoother for the user. We also implemented a hardware-in-the-loop (HIL) interface that allows us to run closed-loop controllers from within the simulator on the physical system. This is done by superseding the state of the digital twin with that of the physical system and applying the computed control signal on the physical system, thus closing the loop. Using the HIL interface, we can run experiments developed in the simulator on real hardware without any development overhead.

EXPERIMENTS
We performed a series of experiments to highlight the advantages provided by our model-mediated approach to robot control and the design of our control station. The first experiment uses the real-time interface to our digital twin to haptically render generalized constraint forces on the robot. Using this methodology, we can transfer physical behavior modelled in the simulation environment to reality with minimal effort. This is demonstrated by applying a safety margin to the collision meshes of the digital twin. The real robot consequently adheres to this updated model, which serves as a form of collision avoidance-and enables the design of geometric-based virtual walls for task-specific applications. In the second experiment, we evaluate the physical fidelity of our digital twin for the Franka Emika Robot System. We compare the tracking performance as well as the estimated contact wrenches of our model for the MuJoCo simulator to those of the real system as well as a model for the Gazebo simulation environment provided by Franka Emika. Finally, we explore the use of our system as a teleoperation station controlling a humanoid robot in a contact-rich scenario. We compare different approaches such as bilateral position-force computed teleoperation as well as model mediated control.

Collision Avoidance
We use the haptic interface introduced in Section 2.2 to connect the PARTI system to its digital twin. For this experiment, the collision model of the digital twin is extended with a safety margin in the form of two spherical meshes around the end effectors as seen in Figure 4. While our approach allows the use of arbitrary meshes 3 , we chose spherical shapes for salient visualization and affordable computation of distances. The haptic interface will then render the generalized constraint forces as torques on the robot arms with regard to this updated model, effectively preventing any collisions between the end effectors and the rest of the system. To demonstrate the emerging collision avoidance behavior, an operator attempts to bring the end effectors into collision with each other and the main body of the PARTI system, while we record the shortest distance between the surfaces of the involved collision meshes during the motion. The compiled results in Figure 5 confirm that our approach successfully prevented the intersection of collision geometries.

Digital Twin
The second experiment explores the physical fidelity of our digital twin. We compare our MuJoCo model of the Franka Emika Robot System as well as the Gazebo model, provided by Franka Emika, to 1 Notice the Panda system, as well as, any other collaborative robot with jointtorque sensors have only be made available around 2017, the study of digital twins as dynamic proxies in model-mediated control applications is a recent development (cf. Mitra and Niemeyer (2007)), and MuJoCo was firstly made available in Todorov et al. (2012). 2 www.mujoco.org 3 MuJoCo uses the Minkowski Portal Refinement algorithm to detect collisions and as such requires convex meshes. However, non-convex geometries may be modelled by decomposing them into a union of convex geometries (Todorov et al., 2012) Frontiers in Virtual Reality | www.frontiersin.org July 2022 | Volume 3 | Article 925794 the physical robot, which serves as a ground truth. For this purpose, we prepared a trajectory, which we taught kinesthetically using the PARTI system. The trajectory includes fast free motion as well as contacts with a solid plane below the base, which is perpendicular to the robot's first axis. We execute this trajectory using a task space PD control law with additional nullspace control running at a frequency of 1 kHz, which reads where τ d ∈ R 7 is the control torque applied to the robot, J(q) ∈ R 6×7 , x d ∈ R 6 and _ x ∈ R 6 are the end effector Jacobian, pose, and twist respectively, q ∈ R 7 are the joint positions, and I ∈ R 7×7 is the identity matrix. The trajectory is given to the controller as a list of Cartesian end effector poses x d ∈ R 6 . The gain matrices K p , K d ∈ R 6×6 and K null ∈ R 7×7 as well as the desired nullspace configuration q null,d , which is equal to the initial joint positions 4 , are identical across the experiments. We present the resulting end effector positions and external force estimates in Figure 6 and a more detailed view of the model errors compared to the real hardware in Figure 7. The FIGURE 4 | Visualization of the digital twin of the PARTI system in MuJoCo (left) and the coupled real hardware (right) during the first experiment, where the operator actively attempts to bring the system into collision. The additional collision meshes around the end effectors to facilitate the collision avoidance behavior are seen in red.
FIGURE 5 | Shortest distances between the collision surfaces of the PARTI system's left and right end effectors as well as the main body during the first experiment. The constraint forces from the simulator applied to the physical system prevent the collision meshes from intersecting, i.e. the distances remain greater or equal to 0. Note that the sections highlighted in grey mark active attempts by the operator to bring the respective bodies into collision. 4 The initial joint positions are chosen such that they have the largest distance to the joint limits Frontiers in Virtual Reality | www.frontiersin.org July 2022 | Volume 3 | Article 925794 FIGURE 6 | The left column of graphs shows the resulting end effector positions as well as the tracked trajectory for the second experiment, while the right column displays the respective models' estimated external forces acting at the end effector during the same time period. All quantities are expressed relative to the robot's base frame. Periods of contact are highlighted by the grey vertical bars. Note that the robot is mounted 2 cm above the plane and contact is therefore established slightly below 0 on the z-axis while the reference trajectory reaches below the plane to facilitate solid contact wrenches given the controller's impedance.
FIGURE 7 | Linear and angular error norms for the pose (left) and external wrench estimate (right) as observed in the second experiment. The model error of the Gazebo implementation is compared to the error of our own digital twin based on MuJoCo, where telemetry from the real system serves as ground truth.
Frontiers in Virtual Reality | www.frontiersin.org July 2022 | Volume 3 | Article 925794 6 robot shows some characteristics that are difficult to model due to the proprietary nature of the control and signal processing system. Especially the estimates for external wrenches show an offset and sensitivity to configuration and regions of the workspace (cf. Petrea et al. (2021)).

Gazebo Model
Franka Emika provides a simulation environment, which is well embedded in the ROS framework, granting the user all parameters for implementing custom control laws. Also the telemetry data for evaluation can be accessed easily. Within the ROS framework, we implement a simple ROS node, which publishes each point of the recorded trajectory. We then record the telemetry data with another node. Surprisingly, the Gazebo simulation behaved not as expected. The results diverged, both from the estimates provided by the MuJoCo environment and from the real system. The estimated forces showed a greater magnitude during movement and static contacts. A closer look to the code base revealed that it utilizes a type of direct estimator with further moving average filtering for computing external torques where superscript (t) denotes the current timestep, and τ ext ∈ R 7 andτ ext ∈ R 7 the estimated torque and the filtered torque estimate, respectively. τ m ∈ R 7 represents the measured torque and g(q) ∈ R 7 denotes the gravitational induced generalized forces. The filter parameter γ should be chosen such that γ ∈ (0, 1]. In order to get more insight to the simulation, we inquired Franka Emika. The main reason for employing the estimation, stems from the underlying ODE engine, which enforces constraints for each joint and therefore does not know the external torques in joint space. The implemented observer resembling the direct observer (Haddadin et al., 2017) uses commanded torque signals generated by some control law, while neglecting inertia and coriolis/centrifugal terms. We assume that these approximations have an significant effect for the external torque estimates especially during free end effector motion. During the first 10 s of the traversed trajectory in Figure 6, the forces of the Gazebo environment are notably overestimated, most likely due to the uncompensated inertia terms.

MuJoCo Model
The kinematic and geometric properties of our digital twin for the MuJoCo simulation environment are based on information provided by the manufacturer, specifically the Denavit-Hartenberg parameters and the visual meshes for the robot links, respectively. For the dynamic model, i.e. the links' inertia tensors, mass and center of mass, and the friction in the joints, we use the parameters identified in Gaz et al. (2019). Mechatronic components like the motors, torque sensors, or the position encoders in the joints are modelled using MuJoCo's equivalent native components. Given these definitions, the simulator's computational engine provides us with the necessary dynamic model quantities for robot control, such as the Jacobian matrix for all the bodies, Coriolis and gravity forces, and inertia in generalized coordinates. While we can retrieve the generalized constraint forces from MuJoCo to numerical precision, we opted to implement a momentum observer for estimating the external torque at the joints τ ext to more closely model the behavior of the real robot (cf. Haddadin et al. (2017)). Given the definition of the robot's momentum p as the external torques can be estimated aŝ where τ m , c(q, _ q), g(q) ∈ R 7 are the applied motor force, centrifugal and Coriolis forces, and gravity force, respectively, as computed by MuJoCo in the robot's generalized coordinates, and K O diag{k O,i } > 0 is the observer's gain matrix. Additionally, we compute _ q numerically using the Euler method for the observer instead, as joint velocities are not directly measured by the robot. The estimated external wrench acting at the end effector is then simply computed as This estimate shares characteristics with the ground truth, such as a high frequency but low magntitude noise and sensitivity to robot movements, while the estimated wrench corresponds well with the robot's own estimates during contacts (cf. Figure 6).

Two-Arm Teleoperation
This experiment demonstrates our system's versatility as a teleoperation station by controlling the digital twin of the robot humanoid GARMI in a contact-rich cooperative twoarm manipulation scenario. The task consists of lifting a box out of a socket and putting it inside a shelve with only small clearance (cf. Figure 8). During this task, the box is only held in place by the resulting friction due to the pressure exerted by the arms. The operator controls both arms independently using the two robot arms of the PARTI system and receives haptic force feedback for both. The virtual scene is graphically rendered in stereo from the perspective of the humanoid's head camera and displayed stereoscopically on the PARTI display to provide the operator with depth perception. For this experiment, we coupled the controlled velocity of the GARMI mobile base to the arms' displacements along the humanoid's sagittal axis 5 . This way, the operator can focus on the manipulation task without the additional mental load of having to explicitly control the robot's mobile drive system. We evaluated different control schemes to perform this task. First, we directly control the humanoid, i.e. we use the haptic interface introduced in Section 2.2 to connect the physical PARTI system to the digital twin of GARMI. This is possible as the systems were designed to have identical kinematic and dynamic configurations. We refer to this approach as model-mediated teleoperation (MMT), as the physical system interacts solely with a local representation of the remote environment and as such is not subject to communication delays (cf. Xu et al. (2016)). As an alternative approach, we implemented a more traditional bilateral position-force computed (PFC) architecture. Within this approach, the twists of the leader system's end effector are applied to the follower system, while estimated wrenches at the respective follower's end effectors are reflected as haptic force feedback. We complement this architecture with passivity controllers informed by energy observers to compensate for communication delays as introduced in Ryu and Preusche (2007). Our implementation of the latter method-commonly referred to as time domain passivity approach (TDPA)-uses a six degree of freedom two channel architecture for each arm. In total, our experiment consists of three teleoperator configurations, namely the MMT variant as well as the PFC architecture with and without TDPA and a round-trip delay of around 35 ms. For analysis, we compute the net external force of GARMI's end effectors and the relative transform between the end effectors over time as depicted in Figure 9.

DISCUSSION
The results of our experiments show the benefits of a modelmediated approach to robot control in various applications, including scenarios with a human operator in the loop. The first experiment demonstrated how a simple change in the model can result in complex behavior, which can be transferred to the real hardware using the haptic interface. While the emerging collision avoidance behavior is useful in itself, it is worth noting that this approach allows us to focus on the physical modeling of a desired behavior without having to concern ourselves with the implementation of any controllers generating said behavior. We also showed that given identical dynamic parameters the behavior of our digital twin of the Franka Emika Robot System is closer to the real hardware than the provided Gazebo model, reinforcing our choice for MuJoCo as a simulator for model-mediated control. Finally, we demonstrated the advantage of using a model-mediated approach in teleoperation. Analysis shows that activity of the passivity controllers in TDPA results in small oscillations while the box is grasped. This can be explained by the fact that the pressure exerted by the arms on each other acts as an active disturbance, which in turn leads to dissipative action from the passivity controllers. While the arms remain stable, this small oscillation is enough to release the grasp of the box, leaving the operator unable to complete the task. In comparison, the grasp of the PFC architecture without active TDPA is more stable. However, the teleoperator quickly becomes unstable due to external disturbances such as contacts with the shelve. The MMT approach proved to be most successful for this task. Not only was the operator able to complete the task, but the system remained stable even during the challenging action of sliding the box into the shelve, which involves multiple contacts similar to the classical peg-in-hole problem of robotics.
FIGURE 8 | Visualized sequence of the digital twin of GARMI (bottom row) and the coupled real PARTI system (top row) during the third experiment. The operator is tasked with lifting the box from its socket, which has a clearance of only 2 mm and putting it into the shelve below with a clearance of 2 cm.
Frontiers in Virtual Reality | www.frontiersin.org July 2022 | Volume 3 | Article 925794 8 We introduced the PARTI system, a multimodal teleoperation station, intended for teleoperation tasks under a hardware-in-theloop framework for improved performance even in the presence of uncertainties and time delay. For the physics engine deployed in the HIL framework, experiments in virtual envrionment reveal that MuJoCo is advantageous for simulating kinematic chains during physical interaction as it enforces constraints in generalized coordinates. This in turn can be directly leveraged for self collision avoidance by designing collision meshes in simulation that enforce a certain safety margin. Furthermore, in a bimanual teleoperated manipulation task, we see that TDPA led to unstable system behaviour, emphasizing the model mediated teleoperation approach for future research. Since MuJoCo offers physical realism, we see our simulation environment as an ideal testbench for future research, including model-mediated teleoperation, system identification, and model-based/predictive control. The simulation can also serve as digital twin, providing an operator with additional information on the teleoperated system status in a humanunderstandable (visual) representation. For increasing the autonomy of the humanoid robot GARMI, the simulation might be potentially useful for kinesthetic teaching purposes. We also plan to incorporate VR in the near future, which will provide additional interaction options during teleoperation in tradeoff for situational awareness in the operators environment. We interpret a VR set as a minor restriction, since the operator is losing environmental awareness when wearing the head mounted display.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
JE: Implemented the HIL and haptic interfaces, digital twins in MuJoCo, two-arm teleoperation and wrote the corresponding sections. GR: Implemented and ran the experiments for Gazebo. FIGURE 9 | The graphs in the left column show the net force of the estimated external wrenches acting on GARMI's end effectors during the teleoperatoin task of the third experiment. The column on the right displays graphs of the displacement between the end effectors. Note that the force and displacement induced by the grasp are primarily expressed around the y-axis of this experiment. The MMT method allowed for the successful completion of the task, while the oscillations and instability of the PFC architecuteres with and without TDPA respectively resulted in the grasp being released before the task could be completed.
Frontiers in Virtual Reality | www.frontiersin.org July 2022 | Volume 3 | Article 925794 He wrote Section 5, Section 3.2.1, and Section 2.1. LF, AN, and UW: Supervised the study, provided feedback and wrote/edited the manuscript. SH: Contributed the idea and concept of PARTI and collected the funding to perform the study.