Multi-Sensory Urban Search-and-Rescue Robotics: Improving the Operator’s Omni-Directional Perception

de Barros, Paulo G.; Lindeman, Robert W.

doi:10.3389/frobt.2014.00014

ORIGINAL RESEARCH article

Front. Robot. AI, 02 December 2014

Sec. Virtual Environments

Volume 1 - 2014 | https://doi.org/10.3389/frobt.2014.00014

Paulo G. de Barros

Robert W. Lindeman*

Human Interaction in Virtual Environments Laboratory, Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA, USA

The area of human–robot interaction deals with problems related not only to robots interacting with human beings but also to human beings interacting with and controlling robots. This article focuses on the latter and evaluates multi-sensory (vision, hearing, touch, and smell) feedback interfaces as a means to improve robot-operator cognition and performance. The paper summarizes three previously reported empirical studies on multi-sensory feedback using simulated robots. It also reports the results of a new study that used a physical robot to validate the results of these previous studies and to evaluate the merits and flaws of a multi-sensory interface as its sensorial complexity was gradually increased. The human senses were selected based on their response time to feedback and the ease with which their feedback mechanisms can be adapted to different types of robot-sensed data. The results show that, if well-designed, multi-sensory feedback interfaces can indeed improve the robot operator’s data perception and performance. They shed some light on the benefits and challenges multi-sensory feedback interfaces bring, specifically in teleoperated robotics and urban search-and-rescue. They add to our current understanding of these kinds of interfaces and provide a few insights to assist the continuation of research in the area.

Introduction

Human beings perform tasks effectively in the real world using their highly advanced senses. Through evolution and repetition, they are able to effortlessly take in, filter, fuse, and make sense of huge amounts of high-fidelity visual, auditory, touch, smell, and taste stimuli. Furthermore, due to their versatile nature, human beings are able to adapt to input/output (I/O) mechanisms when using tools and machines, even if interfaces are sub-optimally designed.

While robotic systems are assuming an ever-increasing role in our lives, current human–robot interaction (HRI) interfaces for teleoperated robotic systems seldom take advantage of the high-bandwidth, multi-sensory capacity offered by human operators. Instead, they present all information to the eyes alone using visual displays. Although our visual sensory system is highly evolved, its capacity is not limitless, and its overuse may demand excessive mental effort from the robot operator and restrict his ability to efficiently and effectively perform the tasks he has been assigned.

The reasons for the predominance of visual-only HRI interfaces include (a) the ease with which information can be displayed on computer monitors, (b) a lack of understanding within the interface design community of the salient aspects of displays for other sensory modalities, (c) a lack of methods for evaluating multi-sensory interface effectiveness, and (d) interface cost.

In an attempt to address the abovementioned knowledge gaps, this article presents and discusses the results of four user studies involving multi-sensory feedback interfaces in the performance of an urban search-and-rescue (USAR) robot teleoperation task (de Barros et al., 2011; de Barros and Lindeman, 2012, 2013; de Barros, 2014). In these studies, virtual and real robots were used and the vision, hearing, touch, and smell senses were exposed to feedback from the robot interface.

The results obtained confirm the effectiveness of multi-sensory interfaces in off-loading visual information to other senses and improving the user’s spatial perception and task performance. Although the task and visual interface used in the studies are USAR-specific, the benefits obtained by the use of multi-sensory interfaces could be extended to other types of robotic and computer systems in general. Additionally, the evaluation methodology that evolved along these studies brings together separate but related metrics from the virtual reality (VR), HRI, and human–computer interaction (HCI) communities and is proposed as a starting point for future evaluations of this kind of interface.

Related Work

Most urban search-and-rescue (USAR) robot interfaces nowadays display all data visually. Nevertheless, there has been an evolution in their design over the course of the past decades. Such evolution can be simplistically divided into three stages or eras:

(1) Mono-out pre-fusion era (up to 2004): data are spread across a single visual display in multiple windows that could potentially overlap (Yanco et al., 2004). Only a few attempts were made to fuse information into a single display (Johnson et al., 2003).

(2) Mono-out fusion era (2005–2009): data are presented on a single window with multiple overlapping panels (Yanco et al., 2007). The fusion makes the overlapping intuitive and non-obtrusive, and facilitates the perception of such data (Nielsen et al., 2007).

(3) Mono-in mono-out fusion era (2010 to present): not only output is fused in this era but also input, whose interactions are done within the visual display through touch. Because the input area is closer to the user’s visual point of focus, it can be handled or disambiguated more effectively and efficiently (Micire et al., 2011).

As much as these interfaces have improved, little effort has been put into using more than one sense for both input and output. This is the motivation of this article: to push USAR robot interfaces to the next era of Multi-in Multi-out data fusion, where I/O is fused, uses multiple senses, and leads to transparent and intuitive system interactions. The focus of our current research work is not on input, but rather on output. Future work will look at the input side.

Multi-Sensory Feedback Techniques

Visual-feedback techniques generally involve LCD or CRT monitors for displaying data to the operator. But what and how data are displayed varies for each application. Examples of display techniques are 3D mapping (Thrun et al., 2004), stereo and probabilistic vision (Zelek and Asmar, 2003), and point clouds (Suarez and Murphy, 2012).

Audio feedback can be used to display robot data in analog (e.g., direct sound stream) or symbolic (e.g., speech synthesis and sound icons) forms (Gröhn et al., 2005). It has been shown that its use can improve realism of virtual scenes (Blom and Beckhaus, 2010), user situation awareness (SA) (Kaber et al., 2006), search (Gröhn et al., 2005), and remote vehicle-control performance (Nehme and Cummings, 2006).

Touch feedback can be divided into kinesthetic and tactile feedback. The focus of this work is on the latter because this interface is often less cumbersome, easier to deploy in field applications, such as USAR, and more easily re-mapped to different robot-sensed data. Tactile cues have been used as display devices on various parts of the body such as the forehead, tongue, palms, wrist, elbows, chest, abdomen, back, thighs, knees, and foot sole (Lindeman, 2003; Zelek and Asmar, 2003). Vibro-tactile feedback has been associated with improved reaction time (Van Erp and Van Veen, 2004), completion time (Lindeman et al., 2005), and task effectiveness, and has proven useful for providing directional cues (Arrabito et al., 2009), alerts (Elliott et al., 2009), and 3D information (Bloomfield and Badler, 2007).

Olfactory (smell) feedback has been explored in VR and different technologies have been devised for providing it to users. The most common ones are projection-based devices using wind (Noguchi et al., 2009), air puffs (Yanagida et al., 2004), or close-to-nose tube-delivery devices (Narumi et al., 2011). Effects of smell on human cognition and performance have also been measured in the past (Moss et al., 2003; Herz, 2009). No research has been found that applies smell feedback as an aid to robot teleoperation tasks.

For palatal (taste) feedback, researchers have come up with different devices for displaying taste (Narumi et al., 2011) or the sensation of eating (Iwata et al., 2004) and drinking (Hashimoto et al., 2006). Although not explored in this work, the sense of taste could be associated with chemical or temperature data collected from air or soil by a remote robot and aid in route planning or data resampling decisions.

Even though a large amount of research has been done on evaluating these types of feedback individually, few studies have evaluated the concurrent use of more than two senses for feedback, especially in the area of robot teleoperation. The studies presented in this work evaluate the effect of multi-sensory feedback with virtual and real robots in a USAR task scenario.

User Studies

Four multi-sensory feedback studies are presented in this section. The first three studies use a simulated robot, while the fourth uses a physical one. The task subjects are asked to perform is the same in all studies: to search for red objects (circles or spheres) in a debris-filled environment. Subjects were asked to find as many objects as possible, as fast as possible, while trying to avoid collisions with the robot as much as possible. Subjects were unaware of the total number of objects hidden. In the context of the AAAI Rescue Robotics Competition, the environments for both simulated and real robots are rated at the competition’s yellow level, where the robot traverses the entire world by moving around on the same ground level with some debris spread across the floor (Jacoff et al., 2003).

Even though the number of treatments subjects were exposed to varied according to the study design (between- versus within-subjects), the experimental procedures for each treatment were the same and can be summarized by the following seven steps:

1. An Institutional Review Board (IRB) approved consent form was read and filled in.

2. Instructions were given about the robot, and the task to be completed.

3. The robot interface would be explained, followed by a short training session that was accompanied by Q&A.

4. The subject would take part in the task for a specific treatment interface.

5. A post-treatment questionnaire would be filled in.

6. If the study had a within-subjects design, steps 3 through 5 would be repeated for the subject for each remaining treatment.

7. A final post-study questionnaire would be filled in.

In all studies, a post-treatment questionnaire asked subjects to report the number of spheres found and their location by sketching a map of the environment. They were provided with the pictures taken with the robot camera during their traversal of the environment to help them in sketching. The pictures were presented with a resolution of 800 × 640 pixels on a Web page during the sketching task.

The first study (de Barros et al., 2011) compared the display of robot collision-proximity data through visual and/or vibro-tactile displays. The second study (de Barros and Lindeman, 2012) explored the pros and cons of two vibro-tactile data display representations. The third study (de Barros and Lindeman, 2013) further enhanced the visual-tactile interface from study #2 with audio and redundant visual feedback, and measured the effects of such enhancements to the interface. The fourth and last study (de Barros, 2014) attempted not only to validate, with a real robot, the results previously obtained via simulation, but also to evaluate the addition of smell feedback on top of the other three previously evaluated types of sensory feedback.

Robot Interface

All studies had common features in terms of interface feedback. These common features are detailed in this section. The enhancements performed on this interface by each study are detailed in the section related to each study.

The visual interface design used as a starting point the interface proposed by Nielsen et al. (2007). The operator was presented with a third-person view of a 3D virtual representation of the robot, called its avatar. The virtual robot and its avatar had the approximate size of a standard search robot (0.51 m × 0.46 m × 0.25 m). Data collected by the robot sensors were visually presented, including a video feed from a pan-tilt camera mounted on the robot, and sensor data, such as location of object surfaces near the robot, collision locations around the robot, and carbon monoxide (CO) levels in the air. Depending on the experiment, such data could originate from a virtual or real remote environment. The visual interface was viewed through a standard LCD screen in a window with resolution of 1024 × 768.

The robot camera had a field-of-view of 60°. A panel located in front of the robot avatar presented data from this camera. The camera, and hence the panel, could be rotated about both the vertical and horizontal axes relative to the front of the robot. The camera-panel rotations occurred relative to the robot avatar and matched the remote robot camera rotations controlled by operator input.

For the first three studies, a map blueprint of the environment was gradually projected on the ground in the form of blue lines as the robot explored the environment. These blue lines represented the locations of object surfaces near the robot as detected by the robot sensors. In all experiments, a timer was presented in the top right hand corner of the screen. It was triggered once the training session finished and the study task was started.

The belt used for providing vibro-tactile feedback, the TactaBelt (Figure 1A; Lindeman, 2003), was also the same in all studies. The TactaBelt consisted of eight pager motors, also called tactors, arranged in a ring around the robot operator’s torso. The motors were spaced evenly and the forward direction was represented by the motor at the front of the torso. All subjects wore the TactaBelt, even if the interface was not active during the experiment for some of them.
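The evenly spaced ring of eight tactors suggests a simple way of selecting the tactor that corresponds to a direction around the robot. The following Python sketch illustrates one possible mapping. It is not the TactaBelt's actual driver code: the function name, the clockwise index order, and the use of degrees are assumptions made here for illustration. Only the number of tactors and the front-facing reference tactor come from the description above.

```python
NUM_TACTORS = 8  # from the text: eight tactors evenly spaced around the torso


def tactor_for_bearing(bearing_deg: float, num_tactors: int = NUM_TACTORS) -> int:
    """Return the index of the tactor closest to a bearing around the robot.

    Assumptions (not from the source): index 0 is the front tactor,
    indices increase clockwise, and the bearing is given in degrees
    relative to the robot's forward direction.
    """
    spacing = 360.0 / num_tactors          # 45 degrees between tactors
    wrapped = bearing_deg % 360.0          # fold negative or large angles into [0, 360)
    return int(round(wrapped / spacing)) % num_tactors


# Hypothetical usage: an obstacle directly to the right of the robot.
print(tactor_for_bearing(90.0))   # tactor on the operator's right side
print(tactor_for_bearing(-90.0))  # tactor on the operator's left side
```

With eight tactors, each one covers a 45° sector centered on its own direction, so bearings slightly left or right of straight ahead still map to the front tactor.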

Figure 1. Hardware interface used in addition to a standard LCD monitor: (A) TactaBelt, and (B) PlayStation 2 dual-shock controller.

Additionally, the virtual and physical robots were controlled using a PlayStation 2 gamepad (Figure 1B). Both the virtual and physical robots used differential drive, which meant the robot could rotate in place or while moving. The gamepad could also be used to take pictures using the robot camera.

In all studies, subjects were asked to sketch a map of the environment when the search task was completed. The map had to indicate the location of the objects found. These maps were scored from 1 (poor) to 5 (excellent) using evaluation criteria similar to those of Billinghurst and Weghorst (1995).

Data Variables and Analysis

The main dependent variables (DVs) used in these studies to determine the impact of interfaces in terms of performance and SA were the number of robot collisions (local SA impact), the time taken to perform the task (performance impact), the number of objects found (performance impact), and the quality of the reporting of object locations and understanding of the environment (global SA impact). SA (Endsley and Garland, 2000) is interpreted in this research work as the user’s awareness of a subset of the current state of the robotic system, and its surrounding local and remote environment, that is relevant to the task at hand. Other variables related to subjects’ health and workload were also gradually added as the methodology evolved along the studies. These will be described in the sections summarizing each study.

The demographic information was collected in questionnaire form. It initially asked about subject gender, age, and how often they played video games and used or worked with robots, among other questions, but further information was collected as the studies progressed and the study methodology evolved. For experience-related questions, such as the last two mentioned above, a numerical scale of four values was used as follows: “daily” (1), “weekly” (2), “seldom” (3), or “never” (4).

Subjects also took a spatial aptitude test in studies #2, #3, and #4 to ensure results were not biased by subjects’ spatial abilities.

The results for all four studies were analyzed using a single-factor ANOVA with a significance level of α = 0.05 over the interface treatments presented in each study. Results close to significance (α = 0.1) were described as trends. When a statistically significant difference (SSD) among more than two interface treatments was found, a Tukey test (HSD, 95% confidence level) was performed to reveal the groups that differed from each other. In some cases, single-factor ANOVAs were also applied to compare groups in a pair-wise fashion. For questionnaire ratings, Friedman tests compared all groups together, while Wilcoxon tests compared them pair-wise. If a dependent variable (DV) is not mentioned in the data analysis of a study, it means that it did not lead to SSDs among independent variable (IV) groups. Partial eta-squared (η²) results were also calculated using group or pair-wise ANOVAs.
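To make the quantities reported in the following sections concrete, the sketch below computes the F statistic, its degrees of freedom, and η² for a single-factor, independent-samples ANOVA in plain Python. It is an illustration only and not the analysis code used in the studies. The function name and the sample data are hypothetical, and the p-value, which requires the F distribution, is left out. For a single-factor design, partial η² coincides with η² = SS_between / (SS_between + SS_within).

```python
def one_way_anova(groups):
    """Single-factor, independent-samples ANOVA.

    Returns (F, df_between, df_within, eta_squared).
    `groups` is a list of lists, one list of measurements per treatment.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2
        for group in groups
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Hypothetical collision counts for three interface treatments (made-up data).
f_stat, df_b, df_w, eta_sq = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta_sq:.2f}")
```

In this setting, η² can also be recovered from a reported F value as F · df_between / (F · df_between + df_within), which is convenient when only the test statistic and its degrees of freedom are available.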

If the study had a between-subjects design, independent-samples ANOVAs were used. If the study had a within-subjects design (studies #2 and #3), repeated-measures ANOVAs were used and data normalization across interface treatments was performed on a per-subject basis to reduce the amount of data variation due to different levels of subject experience. An example of such per-subject normalization is the following. If subject A, for a DV X, had the following results (Trial 1, Trial 2, Trial 3) = (10, 20, 30), these values would be converted into (10/60, 20/60, 30/60) ≈ (0.167, 0.333, 0.5). In the within-subjects studies, treatment and scenario order were partially balanced using a Latin square.
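The per-subject normalization and the ordering of treatments can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, and the cyclic construction is only one of several ways to build a Latin square (the text states only that a Latin square was used for partial balancing).

```python
def normalize_per_subject(values):
    """Divide each of a subject's results by the sum of that subject's results."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize when all values sum to zero")
    return [v / total for v in values]


def cyclic_latin_square(treatments):
    """Build a cyclic Latin square: row i is the treatment list rotated by i.

    Each treatment appears exactly once in every row (subject order)
    and once in every column (position in the session).
    """
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


# The example from the text: subject A's results for DV X over three trials.
print([round(v, 3) for v in normalize_per_subject([10, 20, 30])])

# Hypothetical treatment labels, one ordering per row.
for order in cyclic_latin_square(["A", "B", "C"]):
    print(order)
```

After normalization, each subject's values sum to 1, so differences between treatments are expressed relative to that subject's own overall level rather than in absolute units.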

More details on the data collection, data analysis, equipment, and materials preparation for each of the studies can be found in de Barros (2014).

Study 1: Evaluating Visual and Vibro-Tactile Feedback

This first study aimed at evaluating the impact on SA and performance when part of the data transmitted by the robot was displayed through a body-worn vibro-tactile display (TactaBelt) used to display imminent robot collisions. The use of the vibro-tactile feedback for robot collision proximity was compared with the use of no feedback, the use of visual feedback, and the use of both types of feedback in a search task (de Barros et al., 2011).

Robot interface

In order to compare visual and vibro-tactile feedback for collision-proximity feedback (CPF), the interface design (Figure 2) for study #1 had a ring surrounding the robot avatar. This ring indicated imminent collisions near the robot, similar to the Sensory EgoSphere proposed by Johnson et al. (2003). The brighter the red color in the ring, the closer the robot was to a collision point. The same type of feedback was also provided as vibration through the TactaBelt. The more intensely a tactor in the TactaBelt vibrated, the closer the robot was to colliding in that direction, similar to the feedback technique proposed by Cassinelli et al. (2006). Both visual and vibro-tactile feedback interfaces were only activated when an object was within a distance d from the robot (d ≤ 1.25 m).
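The relation between obstacle distance and feedback strength can be illustrated with a short Python sketch. The 1.25 m activation distance comes from the description above. The linear ramp, the normalized 0 to 1 intensity range, and all names are assumptions made for illustration, since the exact mapping used in the study is not specified here.

```python
ACTIVATION_DISTANCE_M = 1.25  # from the text: feedback is active only when d <= 1.25 m


def cpf_intensity(distance_m: float, max_distance_m: float = ACTIVATION_DISTANCE_M) -> float:
    """Map an obstacle distance to a feedback intensity in [0, 1].

    Assumption (not from the source): intensity grows linearly from 0 at
    the activation distance to 1 at contact. The same value could drive
    either the ring's red brightness or a tactor's vibration strength.
    """
    if distance_m > max_distance_m:
        return 0.0                      # obstacle too far away: no feedback
    clamped = max(distance_m, 0.0)      # guard against negative sensor readings
    return 1.0 - clamped / max_distance_m


def belt_intensities(distances_m):
    """Compute one intensity per direction from a list of per-direction distances."""
    return [cpf_intensity(d) for d in distances_m]


# Hypothetical distances (in meters) for eight directions around the robot.
print(belt_intensities([2.0, 1.25, 1.0, 0.625, 0.25, 0.0, 3.0, 1.5]))
```

Using the same intensity function for both displays keeps the visual ring and the vibro-tactile belt redundant, so that differences between treatments reflect the sensory channel rather than the encoding.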

Figure 2. Study #1 visual interface components.

Hypotheses

Previous results obtained by other research groups have shown improvement in performance when using vibro-tactile displays (Bloomfield and Badler, 2007; Blom and Beckhaus, 2010) and enhanced interfaces (Johnson et al., 2003). Based on these results, study #1 hypothesized that

H1.1. Subjects using either the vibro-tactile or the graphical ring feedback interface should have an increase in navigational performance and SA.

H1.2. Subjects who are using both the vibro-tactile and the graphical ring feedback interfaces should have an even larger increase in navigational performance and SA.

Methodology

This user study had a between-subjects design. The IV was the type of CPF interface, which divided subjects into four groups or treatments: the first group (“None”) operated the robot without using any CPF interface. The second (“Ring”) received this feedback from the graphical ring. The third (“Vibro-tactile”) received this feedback from the TactaBelt. The fourth (“Both”) received the CPF from both the graphical ring and the TactaBelt. A virtual training room (15 m × 15 m) and the room where the real task took place (8 m × 10 m) are presented in Figures 3A and 3B, respectively. In the real task room, objects such as doorways, barrels, and tables were represented at their real-world size.

Figure 3. Environments used during the training session (A) and real experiment (B) for study #1.

Results

A total of 13 female and 14 male university students participated in the study (age: M = 20 years and 6 months, SD = 5 years and 3 months). The results with SSD are presented in Table 1. The black lines mark groups of interfaces with statistically equal results. If no line is shown, all results were statistically equal. Mean values with a “^•” or “*”s detail the SSD magnitude among interface treatments.

Table 1. DV results for different interface treatments in study #1.

A comparison of the number of collisions between groups showed SSDs between groups (“None,” “Ring”), F(1, 11) = 6.69, p = 0.02, η² = 0.378, and (“Ring,” “Vibro-tactile”), F(1, 11) = 5.08, p = 0.04, η² = 0.462. In both comparisons, the “Ring” interface led to a higher number of collisions than the interface it was compared against. For the number of spheres found per minute, an SSD indicated a lower number of spheres found for group “Ring” compared to group “Both,” F(1, 11) = 11.17, p = 0.006, η² = 0.504. These differences did not occur for either of the two treatments that included vibro-tactile feedback (“Both” and “Vibro-tactile”). When comparing map quality with the type of CPF interface used, an SSD was found between groups “None” and “Both,” F(1, 12) = 5.65, p = 0.03, η² = 0.32. A trend toward significance was also found between groups “Vibro-tactile” and “Both,” F(1, 12) = 4.08, p = 0.07, η² = 0.254. Although the results could not confirm either of the hypotheses, they appear to show that, when used together, the CPF interfaces may have helped improve the robot operator’s global SA.

Conclusion

This study has shown that the use of redundant multi-sensory feedback, specifically visual and vibro-tactile feedback, can be beneficial to the robot operator when either type of feedback is insufficient to bring the operator to his optimal level of performance and SA. In other words, one type of feedback can help minimize the other’s deficiencies and bring about a better HRI feedback interface. Nevertheless, it is still unclear how the form with which data are displayed through a specific sense impacts subject performance and SA. Study #2 provides an initial investigation on this topic for the vibro-tactile type of feedback.

Study 2: Evaluating Vibro-Tactile Display Techniques

The first study compared visual and vibro-tactile data displays, which outperformed the control case only when presented together, not when presented separately. This second study attempts to reassess the result obtained by the vibro-tactile-only interface by exploring different vibro-tactile modes without the interference of the graphical ring (de Barros and Lindeman, 2012).