RealTHASC — a cyber-physical XR testbed for AI-supported real-time human autonomous systems collaborations

Today's research on human-robot teaming requires the ability to test artificial intelligence (AI) algorithms for perception and decision-making in complex real-world environments. Field experiments, also referred to as experiments "in the wild," do not provide the level of detailed ground truth necessary for thorough performance comparisons and validation. Experiments on pre-recorded real-world data sets are also significantly limited in their usefulness because they do not allow researchers to test the effectiveness of active robot perception and control or decision strategies in the loop. Additionally, research on large human-robot teams requires tests and experiments that are too costly even for industry and may result in considerable time losses when experiments go awry. The novel Real-Time Human Autonomous Systems Collaborations (RealTHASC) facility presented in this paper addresses these limitations by interfacing physical and virtual worlds in an extended reality testbed.


Introduction
Emerging techniques in machine learning (ML), AI, and computer vision are able to equip the next generation of autonomous robots with unprecedented sensing, cognitive, and decision-making skills (Bassyouni and Elhajj, 2021). These "smart" robots will inevitably interact with humans as part of collaborative robot teams in unstructured and dynamic environments, thus requiring fundamental cross-cutting research not only in engineering and computer science but also in the humanities (Fong et al., 2003; Pendleton et al., 2017; Liu et al., 2021). Real-world tests with human subjects in the loop are necessary in almost every sector of robotics, including defense, agriculture, healthcare, and emergency-response systems, to name a few. Humans collaborating with AI software and/or autonomous robot teammates as part of human-autonomy teams (HATs) will soon be required to solve complex and safety-critical tasks (Oh et al., 2017) such as target localization and mapping (Krajník et al., 2014), fire fighting (Naghsh et al., 2008), search and rescue (Spurný et al., 2019), security and surveillance (Hu and Lanzon, 2018), and cooperative transportation (Chen et al., 2015). To date, controlled simulation and laboratory experiments have been primarily used by researchers in both industry and academia in order to develop and test new theories and algorithms. As a result, when introduced into real-world applications with humans in the loop, these methods all too often fail because of unanticipated difficulties and environments. Deploying new decision and control algorithms in the field is not only costly but also poses safety hazards when humans are present and, at best, fails to provide researchers with the knowledge, control, and post-processing capabilities required during the development phase. Recent advancements in data-driven approaches for robotics applications also require large amounts of training data that may not always be available from real-world sensors (Choi et al., 2021).
In response to the aforementioned difficulties, various computer simulation platforms have emerged in the field of robotics and HAT collaborations (Puig et al., 2018; Deitke et al., 2020; Shen et al., 2021). These platforms have played a crucial role in the iterative development of autonomous robots by providing a safe and controlled environment for testing, training, and refining AI algorithms and behaviors. Simulation environments have also allowed researchers and developers to explore a wide range of scenarios, manipulate variables, and assess the performance of robot control and decision algorithms in a cost-effective and time-efficient manner. By testing physical and cognitive functions such as sensing, perception, and control, simulation systems have enabled the evaluation and comparison of different robot design choices and interaction strategies. Virtual replicas of real-world robots, also known as digital twins, have in fact become increasingly popular in both industry and academia for research on robot control and operation (Garg et al., 2021). Traditional robot simulators such as Webots (Michel, 2004) and Gazebo (Koenig and Howard, 2004) rely on the physics-based Open Dynamics Engine (ODE) to replicate robot motion and on a 3D rendering engine to construct the robot environment (Erez et al., 2015). Although this architecture has proven extremely useful to date, there remains a significant simulation-to-reality gap for testing more complex systems of human-robot teams (Škulj et al., 2021). In order to aid in the development of robust autonomy algorithms, simulation environments must address major challenges, including transferring knowledge from the virtual world to the real world; developing stochastic and realistic scenes; and developing fully interactive, multi-AI-agent, scalable 3D environments (Reiners et al., 2021). Off-the-shelf game development software, such as Unity and Unreal Engine™ 4 (UE™) (Epic Games, 2019), has been gaining increasing attention because of its photorealistic rendering capabilities. New simulation frameworks such as AirSim (Shah et al., 2018), UnrealCV (Qiu and Yuille, 2016), and CARLA (Dosovitskiy et al., 2017) have also demonstrated success in training and verifying computer vision and perception-based algorithms. Additionally, leveraging virtual reality (VR) technologies alongside such platforms has enabled studies on collaboration between humans and robots mimicked by digital twins (Mizuchi and Inamura, 2017; Inamura and Mizuchi, 2021; Murnane et al., 2021).
In particular, FlightGoggles, a Unity-based photorealistic sensor simulator for perception-driven robots, has begun to bridge the simulation-to-reality gap (Guerra et al., 2019). This previous work developed a novel hardware-in-the-loop approach by placing a real vehicle and human in a motion capture facility and augmenting them with an obstacle-populated virtual environment. This framework successfully created a photorealistic virtual world in which both robot and human perception occurred virtually. Additionally, FlightGoggles successfully implemented multiple simulated robots to demonstrate their computational capabilities within a novel system architecture. As autonomous robots and intelligent machines are becoming an integral part of collaborative human teams, a significant body of research has focused on identifying effective means of human-robot communication, task allocation, and multi-robot coordination (Gemerek et al., 2019; Ognibene et al., 2022). While FlightGoggles has proven the capability to incorporate robots and humans into a desired computing architecture, it does not yet support complex perception and control human-robot interactions in mixed real/virtual worlds.
The RealTHASC facility presented in this paper provides a new extended reality (XR) testbed within which researchers may investigate complex real-time human-robot and multi-robot interactions based on tailored combinations of exteroceptive and proprioceptive sensors installed on virtual and real-world agents. Its main contribution is a novel framework for integrating VR technology, embedded systems design, and computer graphics tools to enable safe, real-time, and photorealistic multi-agent interaction for collaborative decision-making in HATs. Novel software and hardware integration tools are required to support real-time perception feedback from virtual avatars to real agents, with information-driven planning and control in the loop. The ability to support perception-based control, real-time communication between real and virtual agents, and HAT interactions and collaboration in challenging test environments is also demonstrated. The formulation and nomenclature necessary for describing the real and virtual workspaces, along with their respective agents, are introduced in Section 2. The architecture of the testbed, including the simulation environment, human interface, communication between the agents, and robot perception and planning, is described in Section 3. To demonstrate the versatility of RealTHASC, three applications encompassing human-robot and multi-robot interaction are presented and analyzed in Section 4. Finally, possible future research directions are discussed in Section 5.

Mathematical preliminaries and notation
The RealTHASC facility is a novel HAT testbed comprising collaborative agents that operate and interact in and across physical and simulated worlds. Let the experimental region located inside the physical laboratory, defined as the real workspace, be denoted by W ⊂ R^3, and let the simulated environment created in Unreal Engine™, defined as the virtual workspace, be denoted by U ⊂ R^3.
Based on the desired type of HAT-environment interactions, four types of agents can be introduced into RealTHASC: real agents, virtual agents, avatars, and real agents with XR. Real human/robot agents operate and sense solely in W, whereas virtual human/robot agents operate and sense solely in U. Avatars sense in U and transmit their sensor data back to their corresponding real agent with XR, based on the avatar's position and orientation in U. Importantly, real agents with XR are kinematically coupled with their avatars. Hence, not only do their sensors share the same field-of-view (FOV) as those of their avatars, but they can also interact with elements and agents in the virtual workspace through the avatar's behavior and react to the perception feedback, while their resulting full-body state is relayed to their avatar in real time. Examples of RealTHASC agents operating in mixed environments comprised of real and virtual workspaces are shown in Figure 1.
Let R and H denote the index sets of robots and humans operating in W, where each robot and human is associated with an index i ∈ R and j ∈ H, respectively. Also, let P and Q denote the index sets of robots and humans operating in U, where each robot and human is associated with an index i ∈ P and j ∈ Q, respectively.
Real agents with XR are denoted by the same indices in R and H as their avatars in P and Q, respectively. The homogeneous rigid-body transformation that maps every point of W into U is denoted by T; its inverse T^-1 is assumed to exist, and both T and T^-1 are known a priori from the RealTHASC setup. In this paper, unmanned ground vehicles (UGVs) are used as robot agents to showcase the capabilities of RealTHASC. The framework, however, can be easily extended to other agents, including autonomous underwater vehicles (AUVs) and autonomous aerial vehicles (AAVs). In the demonstrations presented in this paper, human and robot agents all move on the ground plane, which is coplanar with the XY planes of the inertial frames F_W and F_U embedded in W and U, respectively. The configuration of robot i ∈ R ∪ P is denoted by q_i = [x_i y_i θ_i]^T, and the pose of human j, comprised of his/her position and heading, is denoted by s_j = [x_j y_j θ_j]^T. The configuration of all agents operating in W is estimated using a motion capture system or VR tracking, whereas the configurations of agents in U are known without error from the UE™ simulation environment. In the demonstrations presented in this paper, every robot i ∈ R ∪ P obeys the unicycle motion model, where the control vector consists of the robot linear velocity v_i and angular velocity ω_i, i.e., u_i = [v_i ω_i]^T. All of the aforementioned notation is summarized in Figure 2.
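As an illustration of this notation, the following Python sketch applies a planar rigid-body transform T between F_W and F_U and propagates the unicycle model for one time step; the rotation angle, translation, and time step are illustrative placeholders, not the facility's actual calibration.

```python
import numpy as np

# Hypothetical rigid-body transform T mapping the real workspace W into the
# virtual workspace U. In RealTHASC, T is known a priori from the setup; the
# values below are placeholders for illustration only.
theta_T = np.deg2rad(90.0)          # assumed yaw offset between F_W and F_U
R_T = np.array([[np.cos(theta_T), -np.sin(theta_T), 0.0],
                [np.sin(theta_T),  np.cos(theta_T), 0.0],
                [0.0,              0.0,             1.0]])
t_T = np.array([5.0, -2.0, 0.0])    # assumed translation of F_W in F_U (m)

def real_to_virtual(p_W: np.ndarray) -> np.ndarray:
    """Map a point from W into U using the homogeneous transform T."""
    return R_T @ p_W + t_T

def virtual_to_real(p_U: np.ndarray) -> np.ndarray:
    """Apply the inverse map T^-1 from U back into W."""
    return R_T.T @ (p_U - t_T)

def unicycle_step(q: np.ndarray, u: np.ndarray, dt: float) -> np.ndarray:
    """One Euler step of the unicycle model, q = [x, y, theta], u = [v, omega]."""
    x, y, theta = q
    v, omega = u
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + omega * dt])

# Example: propagate a robot in W and mirror the corresponding avatar position in U.
q_i = unicycle_step(np.array([0.0, 0.0, 0.0]), u=np.array([0.2, 0.1]), dt=0.05)
avatar_position = real_to_virtual(np.array([q_i[0], q_i[1], 0.0]))
```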

System architecture
Within RealTHASC, physical robots and real humans interact and communicate with simulated agents as well as avatars in photorealistic virtual environments (Section 3.1). A key capability of the facility is that both simulated agents and avatars sense the virtual environments by means of UE™ synthetic sensors, as shown in Figure 3, while possibly operating in different real workspaces. Integrative cyber-physical interfaces (Sections 3.2-3.4) are leveraged to achieve this capability, thus enabling collaborative decision-making across real and synthetic worlds. This novel integration allows virtual environments to act as a common medium for safe yet realistic real-time inter-agent and agent-environment interactions. These interactions can then be modeled, analyzed, and leveraged for various applications without the restrictions normally associated with testing robots in the lab or in the wild, including safety and reproducibility concerns. RealTHASC uses the 3D graphics development software UE™ to create virtual environments. This provides the flexibility to test algorithms and collect data in a wide variety of environments, such as subways, cities, offices, and oceans, under varying lighting and weather conditions, along with a diverse set of user-defined static and dynamic obstacles. The RealTHASC framework supports multiple autonomous and user-controlled agents, enabling complex online multi-agent control and coordination experiments. Example demonstrations of mixed policies implemented for the planning and control of both virtual and real robots are described in Section 3.5.
Figure 4 shows the RealTHASC framework used to achieve real-time collaboration between real robots and humans via their avatars in a virtual environment developed in UE™. The blue arrows indicate how the kinematics of the real-world agents are communicated to their respective avatars, while the orange arrows show how the perception feedback of these avatars is communicated to their real-world counterparts. Human operators in W control their human avatars in U using real-time VR body tracking enabled by a VR headset and handheld controllers. The robots operating in W are equipped with reflective markers detected by the motion capture system installed in this workspace, which streams the real robot states to their respective robot avatars. Virtual sensors, defined using sensor application programming interfaces (APIs) as discussed in Section 3.3, are used to simulate the robot FOV, while the humans observe the workspace through the VR headset. Since the human and robot FOVs are defined inside the virtual environment, they can be constantly monitored in real time to facilitate shared perception among agents. This enables direct sharing of visual cues observed by any agent with any other agent, human or robot, in contrast to existing facilities which only consider sharing cues inside the robot FOV (Nourbakhsh et al., 2005). Additionally, humans in the RealTHASC facility can provide audio or visual commands to real and virtual robots in support of research and testing on human-robot interactions, as explained in Section 4.1. A summary of all hardware and software components used to create the RealTHASC facility is shown in Table 1.

RealTHASC UE™ simulation environment
In the RealTHASC facility, the UE™ simulation environment acts as the interface between the real and virtual agents. The UE™ software is chosen because it is currently considered the most visually realistic tool for bridging the simulation-to-reality gap in perception-related tasks. The framework shown in Figure 1 leverages UE™ to support real-time rendering and manipulation of multiple programmable, photorealistic environments. The base environments used for the experiments presented in Section 4, namely, the industrial city (Figure 5A) and the undersea environment (Figure 5B), are obtained from the UE™ Marketplace. These base environments are modified to include virtual agents and digital avatars, to support their associated sensing modalities, and to stream data amongst various agents of the HATs (Section 3.4). The ability to programmatically manipulate environmental conditions such as fog, time of day, and luminosity, while difficult in real-world or laboratory physical experiments, is easily accomplished within RealTHASC. As a result, real humans and robots can interact with and be tested under a broad range of environmental conditions known to influence visual perception and active control tasks.

FIGURE 4
The RealTHASC framework enables research on human-autonomy interactions and collaborations by importing both human and robot avatars, with sensor suites, into complex and photorealistic simulation environments with perfect ground truth.
Digital avatars are created to resemble their real-world counterparts aesthetically, as required by the chosen experimental test or scenario, and also functionally, by establishing a kinematic and sensing coupling with their real-world counterparts, as explained in Section 3.2 and Section 3.3. C++ programmable actors such as virtual robots, pedestrians, and mobile vehicles can be controlled offline using predefined trajectories or equipped with simulated dynamics, perception, and control algorithms running online to test autonomy and collaboration algorithms with hardware or software in the loop, as explained in Section 3.3 and Section 3.5, respectively. Two examples of such programmed actors with a feedback controller in the loop, an AAV and an AUV, are shown in Figure 5. The AAV and AUV actors sense their environment by means of an onboard RGBD camera and a side-scan sonar, respectively.

RealTHASC human interface
RealTHASC allows human operators to perceive and interact with synthetic environments and the autonomous agents therein by synchronizing body movements with their avatars in real time. For instance, humans may need to react to robot motions and behaviors while also providing commands to their robot teammates in the HAT by means of semantics or hand gestures. Additionally, real robots and humans are able to interact with virtual humans and other human avatars in the UE™ world, allowing one to test collaborative tasks performed by larger HATs that may include real humans and robots at different geographic locations. Human avatars are simulated using the Meta Quest 2 hardware with a SteamVR backend. Using the VR headset, a real human (with XR) is able to view the rendered frames from the simulation environment and listen to audio, as sensed by the human avatar. This integration provides the user with an immersive, interactive, first-person experience of the simulation environment. The headset and accompanying handheld controllers have trackers that communicate their positions over Wi-Fi to the system running SteamVR, which is connected with UE™.
The DeepMotion SDK (DeepMotion, 2023) is used to transform these data into joint motions of a predefined skeleton of a human avatar, using three-point VR body tracking. With this approach, human operators inside the RealTHASC facility are able to control their virtual avatars, without attaching any extra markers to the body, with a mean latency of 10 ms. This facilitates seamless real-time kinematic coupling and interaction. The appearance of human avatars becomes crucial when testing algorithms trained on real-world data sets obtained from application-driven environments (e.g., offices and industrial workshops) or when using UE™-based worlds and simulations to produce synthetic datasets. In order to address the problem of simulation-to-reality and reality-to-simulation transfer, a user interface is built using UE™ Blueprints to easily modify and select between avatars of interest depending on the chosen domain or HAT application.

RealTHASC robot sensing
One of the main goals of RealTHASC is to enable research on active sensor systems used to gather, process, and communicate information about their operating environments to their host and surrounding agents (Ferrari and Wettergren, 2021). The broad range of sensing modalities available on real and virtual robots plays a pivotal role in allowing many types of HAT collaborations to be synthesized inside the facility. RealTHASC hosts a unique array of proprioceptive sensors, measuring the ego state of the robot, and exteroceptive sensors, measuring the state of the operating environment. Husarion ROSbots, powered by ARM processors, are used as the UGV robot agents operating in W. These real robots are equipped to localize using dead reckoning (Kao, 1991) enabled through rotary encoders and inertial measurement units (IMUs). These sensors are subject to drift and may be used to simulate navigation in GPS-denied scenarios. The facility has also been equipped with an OptiTrack motion capture system, which provides robot localization to within 10 mm accuracy using the reflective markers mounted on the robots. As shown in Figure 7, this information is streamed to the robot base control stations in real time, which allows the robots to use either the motion capture system or dead reckoning for localization. Since the real lab workspace includes multiple robots, obstacles, and humans, the view of the motion capture cameras observing the reflective markers can sometimes be blocked, leading to loss of localization. In order to account for such cases, a localization strategy is designed that autonomously switches to dead-reckoning localization, initialized at the last known pose of the robot, and switches back to motion capture-based localization when the cameras regain sight of the markers.
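A minimal Python sketch of this switching strategy is given below; the occlusion timeout, the pose message format, and the mocap/odometry callback interfaces are illustrative assumptions rather than the facility's actual implementation.

```python
import math
import time

class LocalizationManager:
    """Sketch of the switching strategy described above: use motion capture
    fixes when markers are visible, and fall back to dead reckoning seeded at
    the last known pose when they are occluded."""

    def __init__(self, occlusion_timeout_s: float = 0.1):
        self.occlusion_timeout_s = occlusion_timeout_s   # assumed timeout
        self.last_mocap_time = -math.inf
        self.mocap_pose = None      # (x, y, theta) from the mocap bridge
        self.dr_pose = None         # dead-reckoning estimate

    def on_mocap_update(self, pose):
        """Called whenever the motion capture system reports a valid pose."""
        self.mocap_pose = pose
        self.dr_pose = pose         # re-anchor dead reckoning at the mocap fix
        self.last_mocap_time = time.time()

    def on_odometry_update(self, d_forward, d_theta):
        """Integrate encoder/IMU increments (body frame) into the estimate."""
        if self.dr_pose is None:
            return
        x, y, theta = self.dr_pose
        self.dr_pose = (x + d_forward * math.cos(theta),
                        y + d_forward * math.sin(theta),
                        theta + d_theta)

    def current_pose(self):
        """Return the mocap pose if recent, otherwise the dead-reckoned pose."""
        if time.time() - self.last_mocap_time < self.occlusion_timeout_s:
            return self.mocap_pose
        return self.dr_pose
```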
Various modalities for perception are developed by integrating sensor APIs from UnrealCV (Qiu and Yuille, 2016), traditional computer vision algorithms, and recent advances in sensor modeling. This integration enables the simulation of three-channel, 8-bit data streams including RGB cameras, online panoptic segmentation, and surface normal estimation. One-channel (16-bit) images of ground truth depth are also acquired in real time using predefined depth cameras and stereo RGB cameras in UnrealCV. An online processing pipeline is defined to process RGB data streams from UnrealCV to generate 16-bit (two 8-bit channels) images of dense optical flow using the Farneback estimation algorithm and 8-bit (one-channel) grayscale images using OpenCV (Bradski, 2000). This work also incorporates the image-based side-scan sonar simulation by Shin et al. (2022a) to simulate underwater sensing in the virtual environments created in UE™. This simulation incorporates visual approximations of acoustic effects, such as backscatter and acoustic shadow, to provide a realistic rendering of sonar sensors. Novel modular C++ programmable robot agents are created in UE™ by interfacing the aforementioned sensors with robot avatars and virtual robots to enable robot perception in the simulation environment. A separate Python script has been created to enable these sensors to be used as static sensors monitoring the environment or to be programmed to move along predefined trajectories for data collection. Illustrative examples of the types of sensing modalities available in the simulation environment are shown in Figure 6. All of the real robots (without avatars) sense the real workspace and are equipped with an RPLIDAR A2 laser range scanner, an Orbbec Astra RGBD camera, and Time-of-Flight (TOF) sensors. The output of the sensors mounted on the real and virtual robots is communicated over ROS to robot planners (Section 3.5) running onboard for active perception tasks facilitating inter-agent interaction.
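The grayscale and optical-flow portion of this pipeline can be sketched with standard OpenCV calls, as below; the Farneback parameters and the two-channel 8-bit quantization range are illustrative assumptions, not the facility's exact settings.

```python
import cv2
import numpy as np

def process_rgb_pair(prev_rgb: np.ndarray, curr_rgb: np.ndarray):
    """Convert the latest RGB frame to 8-bit grayscale and compute dense
    Farneback optical flow against the previous frame, packed into two 8-bit
    channels. Parameter values and the +/-32 px range are assumptions."""
    prev_gray = cv2.cvtColor(prev_rgb, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_rgb, cv2.COLOR_BGR2GRAY)

    # Dense optical flow (H x W x 2, float32) via Farneback's algorithm.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Quantize (dx, dy) into two 8-bit channels for streaming or storage.
    flow_u8 = np.clip((flow + 32.0) * (255.0 / 64.0), 0, 255).astype(np.uint8)
    return curr_gray, flow_u8
```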

Communication between the real and virtual workspace
The RealTHASC testbed hosts communication channels for message passing to facilitate inter-agent communication in and across the real and virtual workspaces. An overview of the message-passing framework is shown in Figure 7. An Alienware Aurora R13 system acts as the head substation, which receives communications from the OptiTrack motion capture cameras, hosts the virtual environment in UE™, and supports the VR tracking and associated hardware. As a result of the interactions between the agents in the virtual environment, the head substation generates desired waypoints for the robots based on the cooperation strategy employed. The waypoints intended for the robots in W are communicated to their respective base control stations over a local area network (LAN). The planning and control framework running on these stations, described in Section 3.5, converts these waypoints to control commands transmitted to the real robots over 2.4 GHz Wi-Fi networks. These Wi-Fi networks are set up on distinct custom channels to avoid interference between simultaneously transmitted control commands.
The base control stations communicate with each other over a LAN connection using the User Datagram Protocol (UDP) to simulate inter-robot communication in real time with low latency. As the robots move in the real workspace, the OptiTrack motion capture cameras stream data packets comprising the marker data to the head substation through a bridge at a frequency of 120 Hz. Motive, the proprietary OptiTrack data processing software, then uses this marker data to infer the localization of user-specified rigid bodies defined as collections of markers. This inferred robot localization is streamed over UDP channels to the real robots through their base control stations for planning purposes and to the robot avatars in U for kinematically coupling them with their real-world counterparts. This kinematic coupling between real robots with XR and their robot avatars is accomplished in real time using the proprietary NatNet SDK to stream rigid-body localization data at 120 Hz with a latency of 10 ms. The overall system time delay from sending a desired waypoint to detecting the corresponding effect in the robot state is approximately 20 ms.
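The pose-streaming pattern between the head substation and a base control station can be sketched as follows; the endpoint address, port, JSON message schema, and robot identifier are assumptions for illustration only, not the facility's actual configuration.

```python
import json
import socket
import time

# Hypothetical base-station endpoint and streaming rate.
BASE_STATION = ("192.168.1.20", 9870)
RATE_HZ = 120.0

def stream_pose(get_pose):
    """Send the latest rigid-body pose (x, y, theta) over UDP at ~120 Hz.
    `get_pose` is a callback returning the most recent mocap estimate."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    period = 1.0 / RATE_HZ
    while True:
        x, y, theta = get_pose()
        msg = json.dumps({"id": "rosbot_1", "x": x, "y": y, "theta": theta,
                          "stamp": time.time()})
        sock.sendto(msg.encode("utf-8"), BASE_STATION)
        time.sleep(period)
```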

Real-time robot planning and control
The ability to readily test and simulate collaborative planning and control algorithms for HAT research is demonstrated by implementing a waypoint-following policy on real and virtual robots in this testbed. For a robot i ∈ R operating in W (real robots and real robots with XR), its base control station either receives a waypoint, denoted by q_p,i, i ∈ R, directly in W, or receives a waypoint in U and maps it to a desired waypoint in W using the transformation T^-1. Subsequently, the robot executes the waypoint-following policy to determine appropriate control commands based on the received waypoint information. For virtual robots, however, the waypoint q_p,i, i ∈ P, is used directly by the policy. For brevity, agent indices are omitted in the remainder of this section.

FIGURE 7
RealTHASC cyber-physical interfaces and communication pipelines enable seamless integration of real agents and applicable avatars in the virtual workspace.
Assuming q_p(k) = [p_p θ_p]^T is the desired waypoint for a robot in W at time-step k, the desired position and desired orientation are then p_p = [x_p y_p]^T and θ_p, respectively. The chosen waypoint-following implementation is based on a move-then-turn policy for each robot in the real and virtual workspace. At state q = [x y θ]^T, the robot first turns to point towards the desired waypoint position p_p, moves towards it, and then rotates to reach the desired orientation θ_p. This policy outputs the control command u = [v ω]^T, comprised of linear velocity v and angular velocity ω, computed using user-defined parameters k_x, k_y, k_θ ∈ R+, for which larger values represent a faster response to errors in robot pose. Simultaneously, the motion capture system continuously tracks the motion of these robots in W and records their states. As mentioned in Section 3.4, this localization information q̂_i, i ∈ R, an estimate of q_i, is then streamed to robot i ∈ R via its base control station and to its robot avatar i ∈ P in UE™. The robot avatar is then moved to the pose obtained by applying the transformation T to q̂_i, as shown in Figure 4. This control loop used to couple the real robot with its avatar runs in real time at a frequency of 120 Hz. The planner continuously streams waypoints at the desired run-rate frequency for various robot tasks, explored further in Section 4, while the control policy outputs the appropriate command for the most recent waypoint.
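Since the closed-form expressions for v and ω are given by the paper's equations (not reproduced here), the following Python sketch shows only one plausible realization of a move-then-turn policy consistent with the description above; the gains, tolerances, and the use of a single position gain in place of k_x and k_y are assumptions.

```python
import math

def wrap_angle(a: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def move_then_turn(q, q_p, k_pos=0.8, k_theta=2.0, pos_tol=0.05, ang_tol=0.05):
    """Plausible move-then-turn waypoint-following policy (not the paper's
    exact control law). q = (x, y, theta) is the current pose and
    q_p = (x_p, y_p, theta_p) the desired waypoint; returns u = (v, omega)."""
    x, y, theta = q
    x_p, y_p, theta_p = q_p
    dx, dy = x_p - x, y_p - y
    dist = math.hypot(dx, dy)

    if dist > pos_tol:
        bearing = math.atan2(dy, dx)
        heading_err = wrap_angle(bearing - theta)
        if abs(heading_err) > ang_tol:            # phase 1: turn toward waypoint
            return 0.0, k_theta * heading_err
        return k_pos * dist, k_theta * heading_err  # phase 2: drive toward it
    # phase 3: rotate in place to the desired final orientation
    return 0.0, k_theta * wrap_angle(theta_p - theta)
```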

Experiments
To convey the functionalities and capabilities of the RealTHASC facility, three experiments are conducted that each highlight different types of agent interactions across the real and virtual workspace. The first experiment focuses on human-robot interaction, the second experiment tests multi-robot teaming in and across real and virtual workspaces, and the third experiment showcases human-robot collaboration using synthetic sonar sensor simulation.

Human-robot perception and control
This experiment is designed to showcase interaction-based control of virtual robots and robot avatars using gesture commands from a human teammate. This demonstration takes place in the industrial city environment, built in UE™, hosting the following actors: a human avatar, a robot avatar, and a virtual robot. As shown in Figure 3, all agents perceive the virtual workspace using simulated RGB cameras. Human avatars communicate with the robot agents using gestures, as shown in Figure 8, to command the next waypoint. These gestures are detected by a real-time human-pose detection algorithm, OpenPose (Wei et al., 2016; Cao et al., 2017), implemented on each of the robot agents. Three distinct pose commands are communicated to the robot, which then moves in three different directions: left, forward, and right. These predefined commands can also be communicated to the robots as audio cues by using the Google Audio speech-to-text interface (Google LLC, 2022) running on the head substation. The real human with XR may utter any of the three predefined commands into the internal microphone of the VR headset, which is then transcribed to text. Based on the pose commands received as either visual or audio cues, the planner generates desired waypoints in the commanded direction. These waypoints are then streamed to the base control station of the real robot with XR (i.e., the real robot coupled with the robot avatar) over LAN and to the virtual robot in the environment, as described in Section 3.4. The desired orientation at each waypoint turns the robots to face the human avatar in order to perceive the next gesture command. The waypoint-following policy, described in Section 3.5, is then used to calculate the control commands for each of the robot agents to reach their desired waypoints. This experiment is summarized by the schematic in Figure 9 for a virtual robot in P and a real robot with XR in R ∩ P.

In this experiment, the virtual robot and the robot avatar are placed alongside each other at a fixed distance and orientation needed to perceive the human avatar in UE™. With each pose command, the robot agents move a distance of 0.5 m in the commanded direction. A total of ten pose commands are issued to each agent, and the resulting position changes are plotted in Figure 10. The results show that both the virtual robot and the robot avatar are able to correctly identify and react to all ten predefined gesture commands. Both the virtual robot and the robot avatar successfully traverse the distance with the proper heading directions, as shown in Figure 10. Comparison between the trajectories of the robot avatar and the virtual robot indicates that the robot avatar successfully incorporates the dynamics of its real-world counterpart and hence exhibits errors induced by real-world physics, such as the effects of friction and slip, unlike the virtual robot. This experiment demonstrates how the RealTHASC facility is able to successfully simulate proximate visual interactions and incorporate real-world dynamics while providing a safe medium for human-robot collaboration.
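A minimal sketch of the planner step in this experiment is given below: a detected gesture label is mapped to a waypoint 0.5 m away in the commanded direction, with the desired orientation chosen to face the human avatar. The frame conventions and the function interface are assumptions for illustration.

```python
import math

STEP = 0.5  # commanded step length (m), as used in the experiment

def waypoint_from_gesture(gesture: str, q, human_xy):
    """Map a detected gesture ('left', 'forward', 'right') to the next
    waypoint, expressed relative to the robot's current heading, and orient
    the robot to face the human avatar. q = (x, y, theta)."""
    x, y, theta = q
    headings = {"forward": 0.0, "left": math.pi / 2, "right": -math.pi / 2}
    if gesture not in headings:
        return None                              # ignore unrecognized commands
    direction = theta + headings[gesture]
    x_p = x + STEP * math.cos(direction)
    y_p = y + STEP * math.sin(direction)
    theta_p = math.atan2(human_xy[1] - y_p, human_xy[0] - x_p)  # face the human
    return (x_p, y_p, theta_p)
```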

Multi-robot interaction for formation control
The second experiment is designed to illustrate closed-loop interaction and control between multiple robot agents listed in Figure 1: a virtual robot, a real robot with XR sensing in U, and a real robot sensing in W. The purpose of this experiment is to demonstrate that the facility is able to bridge the data-transfer gap between agents existing in simulation and the real world. In this experiment, the robot team is tasked with a leader-follower-based formation control objective. A virtual robot and a robot avatar are placed into the industrial city environment created in UE™, while a real robot and a real robot with XR (i.e., the real robot coupled with the robot avatar) operate in the physical lab workspace. The virtual robot is designated as the leader robot, which independently moves along a pre-specified path. The real robot with XR determines its waypoints using the localization information of the virtual robot, as communicated to its avatar, while the real robot does so, in turn, by using the localization of the real robot with XR. A formation control policy is implemented onboard each robot agent to calculate the desired waypoints and ensure that the robot team maintains a desired formation.

FIGURE 8
First-person perspective from a virtual autonomous UGV inside the industrial city, equipped with a virtual RGB camera and implementing OpenPose for keypoint detection (top row). The robot avatar is able to recognize and interpret the manual commands provided by a real human in W (bottom row), namely, (A) left, (B) right, and (C) forward, by virtue of the human avatar created in real time using VR body tracking.

FIGURE 9
Human-robot collaboration is achieved by a human avatar, teleoperated by a real human with XR, commanding heading directions to the virtual robot and the robot avatar operating in U, using pose commands generated by the keypoint detection shown in Figure 8.

The formation control experiment is illustrated in Figure 11, which features a multi-robot team comprising a virtual robot in P, a real robot with XR in R ∩ P, and a real robot ℓ ∈ R. The virtual robot, in the role of the leader, moves along an elliptical trajectory computed offline. The state of the leader is streamed by means of a socket connection to the base control station of the real robot with XR in W. This base control station calculates the desired waypoint in real time, based on the state of the leader, to maintain an isosceles triangle formation. Simultaneously, the state of this real robot with XR is also streamed over the inter-robot LAN connection, using socket programming, to the base control station of the real robot, which calculates the desired waypoint for this robot to maintain the formation. This allows for decentralized formation control of a multi-robot team in and across W and U. All robot agents use the policy defined in Section 3.5 to reach the desired waypoints obtained online. It is important to note that in this experiment, only the state of the real robot with XR is streamed back to the virtual environment, since it is the only robot with a virtual avatar.
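The decentralized waypoint computation can be sketched as below: each follower derives its waypoint from the streamed state of the agent it tracks by applying a fixed body-frame offset, so the chained offsets yield the triangle formation. The offset lengths and the exact geometry are illustrative assumptions.

```python
import math

def follower_waypoint(tracked_pose, back=0.8, lateral=0.5):
    """Compute a follower's waypoint as a fixed offset behind and to the side
    of the agent it tracks, expressed in that agent's body frame and rotated
    into the world frame. Offset lengths are placeholders."""
    x_t, y_t, th_t = tracked_pose
    x_p = x_t - back * math.cos(th_t) - lateral * math.sin(th_t)
    y_p = y_t - back * math.sin(th_t) + lateral * math.cos(th_t)
    return (x_p, y_p, th_t)              # match the tracked agent's heading

# Example: the first follower tracks the leader; the second follower tracks
# the first follower, producing the triangle through chained offsets.
leader_pose = (1.0, 0.0, 0.0)
wp_follower1 = follower_waypoint(leader_pose, back=0.8, lateral=0.5)
wp_follower2 = follower_waypoint(wp_follower1, back=0.0, lateral=-1.0)
```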
The trajectories of the robot agents are plotted in Figure 12A.
The virtual robot in UE™ follows the elliptical trajectory as designed, and the successful coupling between the real robot with XR and its virtual avatar can be observed. The robot team maintains the desired isosceles triangle formation throughout the experiment, as illustrated at several instants in Figure 12A.
Since the path of the leader and the desired formation are predefined, the desired trajectories for all of the robots are determined offline, and the error between the robots' positions during the experiment and these trajectories is recorded. High positional accuracy is achieved by both follower agents, as the largest positional error remains within 0.10 m, as shown in Figure 12B. This performance plot also shows that the second follower (the real robot) consistently experiences lower positional accuracy than the first follower (the real robot with XR).

FIGURE 10
Trajectory results of the pose estimation experiment for human-robot interaction. A total of 10 pose commands are given to both the virtual robot and the robot avatar.

FIGURE 11
Leveraging the communication pipeline of RealTHASC, a virtual robot, a real robot with XR and its avatar, and a real robot ℓ coordinate amongst themselves to maintain the isosceles triangle formation.
This is attributed to the aggregation of errors due to the decentralized nature of the coordination amongst the agents. These results successfully demonstrate the capability of this testbed to establish communication between various agents existing in the real and virtual workspaces and, as a result, enable real-time interaction between real and simulated agents.

Human-robot collaboration in underwater multi-target classification
This section aims to showcase a probable extension and direct application of the RealTHASC framework to intelligent vehicle systems commonly used in the industrial and defense sectors, such as AUVs. Human operators collaborating with AUVs can help integrate their expertise, domain knowledge, and situational awareness, thus making these systems more robust, adaptive, and efficient. While testing underwater human-robot collaboration in the development phase can prove to be resource-intensive, testing underwater perception capability completely in simulation leads to inaccuracies when the simulation fails to incorporate environmental factors. Thus, the aim of this experiment is to augment the capabilities of the RealTHASC facility for testing collaborative multi-target classification in human-robot teams to overcome these challenges.
The undersea virtual environment hosted by RealTHASC consists of various seabed conditions, as shown in Shin et al. (2022a). In this experiment, three different seabed conditions, namely sand ripples, mud, and rocks, are implemented. These seabed conditions are acquired from the UE™ Marketplace and modified using the Sculpt mode in the UE™ editor to replicate the environmental conditions of the operation site. The synthetic sonar images are generated by setting a camera actor defined in UE™ to face downwards and adding a directional light from the side to mimic the acoustic highlight and shadow patterns induced by the orientation of the vehicle equipped with imaging sonar. Once the camera actor renders RGB images, the rendered images are post-processed to convert RGB values into intensity values and to add realistic acoustic noise. This post-processing is conducted using MATLAB, which converts the input RGB image into a grayscale image and adds either Gaussian, speckle, or Poisson noise. Examples of an output RGB image from UE™ and the post-processed synthetic sonar images are presented in Figure 13.
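The post-processing step can be sketched in Python (the paper performs it in MATLAB): the rendered RGB frame is converted to intensity values and corrupted with Gaussian or speckle noise as described above. Poisson noise is omitted here for brevity, and the scaling and clipping choices are assumptions.

```python
import cv2
import numpy as np

def synth_sonar_from_rgb(rgb: np.ndarray, noise: str = "speckle",
                         variance: float = 0.1) -> np.ndarray:
    """Convert a UE-rendered RGB frame to intensity values and add Gaussian or
    speckle noise, mimicking the MATLAB post-processing described above."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    if noise == "gaussian":
        gray = gray + np.random.normal(0.0, np.sqrt(variance), gray.shape)
    elif noise == "speckle":
        # Multiplicative speckle model: I_out = I + I * n, with n ~ N(0, variance).
        gray = gray + gray * np.random.normal(0.0, np.sqrt(variance), gray.shape)
    return (np.clip(gray, 0.0, 1.0) * 255.0).astype(np.uint8)
```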
The synthetic images generated from this photorealistic simulation are then used to train an automatic target recognition (ATR) algorithm to classify targets in the images. In this experiment, three types of objects are used: cylinders, cubes, and spheres. A total of 1850 synthetic images are generated for the geometric targets: 650 for the cylinder, 600 for the cube, and 600 for the sphere. These images are generated with each type of object viewed from various aspect angles to capture the acoustic highlight-shadow patterns, and speckle noise with a variance of 0.1 is used in the post-processing. The ATR algorithm presented in Zhu et al. (2017), which uses a pretrained AlexNet to extract feature vectors followed by a support vector machine (SVM) trained to perform the classification, is implemented in this experiment. A total of 80 images are used for training, to avoid overfitting, and the remaining synthetic images are used for testing. Specifically, 21 images of the cylinder object, 27 images of the cube object, and 32 images of the sphere object are used for training the ATR algorithm. As a side note, the transfer-learning performance of this ATR approach has been tested on images generated from a high-fidelity physics-based sonar simulation (Sammelmann et al., 1997) and presented in Shin et al. (2022a).
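A sketch of this ATR pipeline, assuming a PyTorch AlexNet backbone and a scikit-learn SVM, is shown below; the preprocessing constants, the choice of feature layer, and the linear kernel are assumptions rather than details reported by Zhu et al. (2017).

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import SVC

# Pretrained AlexNet used as a fixed feature extractor; an SVM performs the
# final classification, following the structure described above.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def extract_features(gray_u8):
    """Return a 4096-D feature vector from one grayscale sonar image."""
    rgb = gray_u8[..., None].repeat(3, axis=-1)        # replicate to 3 channels
    x = preprocess(rgb).unsqueeze(0)
    with torch.no_grad():
        feats = alexnet.features(x)
        feats = alexnet.avgpool(feats).flatten(1)
        feats = alexnet.classifier[:5](feats)          # stop before the last layers
    return feats.squeeze(0).numpy()

def train_atr(train_images, train_labels):
    """Fit the SVM on AlexNet features; labels are 'cylinder'/'cube'/'sphere'."""
    feats = [extract_features(img) for img in train_images]
    clf = SVC(kernel="linear", probability=True)
    clf.fit(feats, train_labels)
    return clf
```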
Underwater multi-target classification algorithms require multiple sonar images taken from different views to achieve a satisfactory confidence level before each object's classification is declared (Chang et al., 2018). This classification confidence level (CL) is updated based on a sensor model represented in a Bayesian network whenever a new sonar image of each object is obtained. Leveraging RealTHASC alongside the aforementioned ATR approach, a human-robot collaboration experiment is proposed as a probable extension that may be used to test this classification framework (Shin et al., 2022b). The experiment is set up such that the AUV equipped with a side-scan sonar sensor in the virtual environment uses this target classification approach while constantly communicating with a remote human operator. The human operator is notified when the CL is higher than a user-chosen threshold (ε_CL), which is designed to be lower than the threshold required to declare target classification. This setting allows the human operator to review the sonar images and the corresponding output from the ATR algorithm before the system declares a wrong classification. Moreover, the human operator does not need to go over every false alarm. The framework of the proposed experiment is illustrated in Figure 14. This demonstrates how the RealTHASC facility can also be used for human-robot interaction applications in remote environments.
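The confidence-level logic of the proposed extension can be sketched as a recursive Bayes update over the three target classes, with the operator notified whenever the CL crosses ε_CL but has not yet reached the declaration threshold; the threshold values, the uniform prior, and the likelihood interface are assumptions rather than the paper's exact Bayesian network.

```python
import numpy as np

CLASSES = ("cylinder", "cube", "sphere")
EPS_CL = 0.7        # operator-notification threshold epsilon_CL (assumed)
DECLARE_CL = 0.9    # classification-declaration threshold (assumed)

def update_confidence(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """One recursive Bayes update of the class belief, where `likelihood`
    holds P(observation | class) from an assumed sensor/confusion model."""
    posterior = prior * likelihood
    return posterior / posterior.sum()

def classify_with_operator(likelihood_stream, notify_operator):
    """Fuse successive sonar views until the confidence level (CL) reaches the
    declaration threshold, notifying the human operator whenever the CL
    exceeds epsilon_CL before a declaration is made."""
    belief = np.full(len(CLASSES), 1.0 / len(CLASSES))   # uniform prior (assumed)
    for likelihood in likelihood_stream:
        belief = update_confidence(belief, np.asarray(likelihood, dtype=float))
        cl = float(belief.max())
        label = CLASSES[int(belief.argmax())]
        if cl >= DECLARE_CL:
            return label, cl                    # declare the classification
        if cl >= EPS_CL:
            notify_operator(label, cl, belief)  # human reviews the ATR output
    return None, float(belief.max())
```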

Conclusion
This paper presents RealTHASC, a multimodal cyber-physical XR facility that leverages state-of-the-art robotics, visualization tools, motion capture, and virtual reality technology to enable a novel experimental testbed interfacing physical and virtual worlds. Unreal Engine™ is used to create photorealistic simulated environments that facilitate interactions amongst human-autonomy teams (HATs), comprising real agents, virtual agents, and agent avatars, tasked with achieving various objectives. The agent avatars operating in the virtual environment are teleoperated by the real agents with XR operating in a physical environment, thus sharing real-world dynamics, while the avatars grant simulated perception for planning and decision-making. Communication pipelines seamlessly interface the real and virtual workspaces in order to enable real-time collaboration amongst the various agents in the HATs. The results of the three experiments demonstrate the capability of this framework to host highly flexible environments with interactive agents spanning a combination of both the real and virtual worlds. The first experiment focuses on establishing human-robot perception and effectively demonstrates closed-loop control of both virtual robots and real robots with coupled virtual avatars. Using body gestures or voice commands, the human operators effectively communicate commands to the robot agents and control the trajectory of each agent in real time. With perception and control successfully established between the robots and humans in this testbed, the second experiment demonstrates that the developed facility is able to establish decentralized communication between varying robot agents. By implementing a leader-follower formation control scenario on robot teams, this experiment effectively conveys the versatility of RealTHASC in hosting real-time communication between the simulated and physical worlds and extends its reach to multi-robot experiments. Finally, the third experiment shows the highly programmable nature of the sensors and virtual environments supported by the facility and their compatibility with the proposed human-AUV collaboration framework. Apart from such pre-deployment testing of collaboration algorithms, the RealTHASC facility can also be used as a closed-loop interaction interface to facilitate downstream tasks, such as online learning and data collection, for safety-critical applications including social navigation. Future work will extend the capabilities of this facility to include new interfaces for human operators, such as haptic feedback devices, and will leverage this facility to study 1) AI-supported teamwork in collaborative virtual environments, 2) decentralized AI-supported multi-agent planning and perception, 3) integration of emerging neuromorphic and insect-scale technologies, and 4) distributed sensing and control for very large networks of agents.

FIGURE 2
Configurations of a human and robot agent in the real workspace (left) and their corresponding avatars in the virtual workspace (right).

FIGURE 3
Visual perception of the industrial city virtual environment in which the avatars of (A) human and (B) robot agents interact in real time.

FIGURE 5
Virtual worlds, inclusive of autonomous robots, sensors, and artificial intelligence algorithms, are developed exclusively for RealTHASC using UE™ 4. Examples of these synthetic environments include (A) an industrial city monitored by an AAV equipped with an onboard RGBD camera and (B) an underwater environment scanned by an AUV equipped with a side-scan sonar. © 2022 IEEE. Reprinted, with permission, from Shin et al. (2022a).

FIGURE 6
Examples of simulated sensing modalities and computer vision algorithms in RealTHASC. © 2022 IEEE. Reprinted, with permission, from Shin et al. (2022a).
FIGURE 12 (A) Trajectory results of the leader-follower formation control experiment. The virtual robot is the leader, the real robot with XR is the first follower, and the real robot is the second follower. (B) The position error of each robot follower over the duration of the experiment.

FIGURE 13
Synthetic sonar images from the simulation model presented in Shin et al. (2022a), (A) before and (B) after post-processing with Gaussian white noise with mean 0 and variance 0.05. © 2022 IEEE. Reprinted, with permission, from Shin et al. (2022a).

FIGURE 14
Example demonstrating a human collaborating with an AUV equipped with automatic target recognition (ATR) algorithms and a Bayesian network used for multi-target classification from images generated by the synthetic sonar simulator. © 2022 IEEE. Reprinted, with permission, from Shin et al. (2022a).

TABLE 1
Description of the hardware and software components used to build the RealTHASC facility.
Hardware:
- 10x OptiTrack PrimeX 22 cameras: multi-camera network that localizes and tracks the physical robots using mounted reflective markers
- 2x Husarion ROSbot 2.0: UGVs used for the physical robot experiments
- Meta Quest 2: VR platform that allows human users to see, hear, and interact with the virtual worlds
- Dell Alienware Aurora R13: primary desktop computer acting as the head substation, hosting the simulation environment and the connections from the laboratory environment
- 2x Alienware x15 R2 gaming laptops: base control stations that receive waypoints (x, y, θ) from the head substation and communicate control commands to the ROSbots