Supervised Autonomy for Exploration and Mobile Manipulation in Rough Terrain with a Centaur-Like Robot

Planetary exploration scenarios illustrate the need for autonomous robots that are capable of operating in unknown environments without direct human interaction. At the DARPA Robotics Challenge, we demonstrated that our centaur-like mobile manipulation robot Momaro can solve complex tasks when teleoperated. Motivated by the DLR SpaceBot Cup 2015, where robots were required to explore a Mars-like environment, find and transport objects, take a soil sample, and perform assembly tasks, we developed autonomous capabilities for Momaro. Our robot perceives and maps previously unknown, uneven terrain using a 3D laser scanner. Based on the generated height map, we assess drivability, plan navigation paths, and execute them using the omnidirectional drive. Using its four legs, the robot adapts to the slope of the terrain. Momaro perceives objects with cameras, estimates their pose, and manipulates them with its two arms autonomously. For specifying missions, monitoring mission progress, on-the-fly reconfiguration, and teleoperation, we developed a ground station with suitable operator interfaces. To handle network communication interruptions and latencies between robot and ground station, we implemented a robust network layer for the ROS middleware. With the developed system, our team NimbRo Explorer solved all tasks of the DLR SpaceBot Camp 2015. We also discuss the lessons learned from this demonstration.


INTRODUCTION
In planetary exploration scenarios, robots are needed that are capable of operating autonomously in unknown environments and highly unstructured and unpredictable situations. Since human workers cannot be deployed due to economic or safety constraints, autonomous robots have to robustly solve complex tasks.

For details on Momaro's actuators, we refer to Schwarz et al. (2016). Using similar actuators for every DOF simplifies maintenance and repairs. For example, at the SpaceBot Camp we repaired a damaged shoulder using a knee actuator, since the knees were hardly used in this demonstration. Fortunately, we had acquired a spare actuator in time. Details can be found in Section 11.

Momaro is relatively lightweight (58 kg) and compact (base footprint 80 cm × 70 cm). During development and deployment, this is a strong advantage over heavier robots, which require large crews and special equipment to transport and operate. In contrast, Momaro can be carried by two people. In addition, it can be transported in standard suitcases by detaching the legs and torso.

Although the cameras are less central here than they are in teleoperated scenarios (Schwarz et al., 2016), the robot also carries seven color cameras: three panoramic cameras and one downward-facing wide-angle camera mounted on the head, one camera mounted in each hand, and one wide-angle camera below the base. In a supervised autonomy scenario, these cameras are mainly used for monitoring the autonomous operation.

To support the multitude of robots and applications in our group, we have a set of common modules,

implemented as Git repositories. These modules (blue and green in Fig. 4) are used across projects as needed. On top of the shared modules, we have a repository for the specific application (e.g., DLR SpaceBot Camp 2015, yellow in Fig. 4), containing all configuration and code required exclusively by this application. The collection of repositories is managed by the wstool ROS utility.
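As an illustration, such a workspace is described by a .rosinstall file that wstool uses to clone and update all module repositories. The entries below are a minimal sketch with hypothetical repository names, not our actual module list:

```yaml
# .rosinstall -- one entry per module repository (illustrative names)
- git:
    local-name: core_module_example        # shared module (hypothetical)
    uri: https://github.com/example/core_module_example.git
    version: master
- git:
    local-name: spacebot_application       # project-specific code (hypothetical)
    uri: https://github.com/example/spacebot_application.git
    version: master
```

Running wstool update in the workspace then fetches all listed repositories at the pinned versions.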

Protection against unintended regressions during the development process is best gained through unit tests. The project-specific code is hard to test, though, since it is very volatile on the one hand, and testing would often require full-scale integration tests using a simulator on the other. Such integration tests have not been developed yet. In contrast, the core modules are very stable and can easily be augmented with unit tests.

Unit tests in all repositories are executed nightly on a Jenkins server, which builds the entire workspace from scratch, gathers any compilation errors and warnings, and reports test results.
MAPPING AND LOCALIZATION
For autonomous navigation during a mission, our system continuously builds a map of the environment and localizes within this map. To this end, 3D scans of the environment are aggregated in a robot-centric local multiresolution map. The 6D sensor motion is estimated by registering the 3D scan to the map using our efficient surfel-based registration method (Droeschel et al., 2014a). In order to obtain an allocentric map of the environment, the local maps acquired at different view poses are connected in a pose graph.

Before assembling 3D point clouds from measurements of the 2D laser scanner, we filter out so-called jump edges. Jump edges arise at transitions between two objects and result in spurious measurements. These measurements can be detected by comparing the angles between neighboring measurements and are removed from the raw measurements of the laser scanner. The remaining measurements are then assembled into a 3D point cloud after a full rotation of the scanner. During assembly, the raw measurements are undistorted to account for motion of the sensor during rotation.

We estimate the motion of the robot during a full rotation of the sensor from wheel odometry and IMU measurements.

After a full rotation of the laser, the newly acquired 3D scan is registered to the so-far accumulated map to compensate for drift of the estimated motion. For aligning a 3D scan to the map, we use our surfel-based registration method (Droeschel et al., 2014a).

In addition to edges between the previous node and the current node, we add spatial constraints between close-by nodes in the graph that are not in temporal sequence. By adding edges between close-by nodes in the graph, the pose graph optimization can compensate for accumulated drift.

Since caves and other overhanging structures are the exception on most planetary surfaces, the 2.5D height map generated in Section 5.5 suffices for autonomous navigation planning.
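To make the height map representation concrete, the following is a minimal sketch of extracting a 2.5D height map from an aggregated point cloud by keeping the maximum z value per grid cell. The function name, grid resolution, and map size are illustrative assumptions, not values from our implementation:

```python
import numpy as np

def height_map(points, resolution=0.05, size=20.0):
    """Project a point cloud (N x 3 array) onto a 2.5D height grid.

    Each cell stores the maximum z value of the points falling into it;
    cells without measurements stay NaN. Resolution/size are assumptions.
    """
    cells = int(size / resolution)
    H = np.full((cells, cells), np.nan)
    # Shift coordinates so the map is centered on the robot.
    ix = ((points[:, 0] + size / 2) / resolution).astype(int)
    iy = ((points[:, 1] + size / 2) / resolution).astype(int)
    valid = (ix >= 0) & (ix < cells) & (iy >= 0) & (iy < cells)
    for x, y, z in zip(ix[valid], iy[valid], points[valid, 2]):
        if np.isnan(H[y, x]) or z > H[y, x]:
            H[y, x] = z
    return H
```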

The 2.5D height map H is transformed into a multi-scale height difference map. For each cell (x, y) in the horizontal plane, we calculate local height differences D_l at multiple scales l. We compute D_l(x, y) as the maximum difference to the center cell (x, y) in a local l-window:

D_l(x, y) = max_{|u − x| ≤ l, |v − y| ≤ l} |H(x, y) − H(u, v)|.

During the SpaceBot Camp, we used the standard ROS navfn planner. Afterwards, we replaced it with a custom A* planner to fully consider gradual costs, which the ROS planner was not designed to do. We transform the height difference map into a cost map that can be used for path planning.

The λ_1, λ_3, and λ_6 parameter values for the drivability computation were empirically determined as 2.2, 3.6, and 2.5, respectively. The resulting drivability values are smoothed with local averages to produce costs D̄ that increase gradually close to obstacles. Figure 6 shows a planned path on the height map acquired during our mission at the SpaceBot Camp.

The found global path needs to be executed on a local scale. To this end, we use the standard ROS dwa_local_planner package, which is based on the Dynamic Window Approach (Fox et al., 1997).

The dwa_local_planner accounts for the robot footprint, so cost inflation is not needed.
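The cost computation described above can be sketched as follows, assuming the λ subscripts denote window radii of 1, 3, and 6 cells, a maximum combination over scales, clipping at an obstacle threshold, and a 5-cell averaging window; these details, like the function name, are illustrative assumptions rather than the exact formulas of our system:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter, uniform_filter

def cost_map(H, scales=(1, 3, 6), weights=(2.2, 3.6, 2.5)):
    """Navigation costs from a 2.5D height map H (2D array, no NaN cells)."""
    cost = np.zeros_like(H)
    for l, w in zip(scales, weights):
        win = 2 * l + 1
        # D_l: maximum absolute height difference to the center cell
        # within the l-window (cf. the equation above).
        D_l = np.maximum(maximum_filter(H, size=win) - H,
                         H - minimum_filter(H, size=win))
        cost = np.maximum(cost, w * D_l)   # assumed combination over scales
    cost = np.clip(cost, 0.0, 1.0)         # 1.0 = untraversable obstacle
    # Local averaging makes costs increase gradually close to obstacles.
    return uniform_filter(cost, size=5)
```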

In order to prevent oscillations due to imperfect execution of the planned trajectories, we made some modifications to the planner. The dwa_local_planner plans trajectories to reach the given goal pose (x, y, θ) first in 2D (x, y) and then rotates in place to reach θ (this is called "latching" behavior). Separate Cartesian and angular tolerances determine when the planner starts turning and when it reports navigation success. We modified the planner to keep the current "latching" state even when a new global plan is received (every 4 s), as long as the goal pose does not change significantly. We also wrote a simple custom recovery behavior that first warns the operator crew that the robot is stuck and then executes a fixed driving primitive after a timeout.

The commanded base velocity, consisting of a linear velocity v and a rotational velocity ω, is first transformed into the local velocity at each wheel i:

v^(i) = v + ω × r^(i) + ṙ^(i),

where r^(i) is the current position of wheel i relative to the base. The kinematic velocity component ṙ^(i) allows simultaneous leg movement while driving. Each wheel first rotates to the yaw angle of v^(i) and then moves with the velocity ||v^(i)||.

To prevent the robot from pitching over on the high-incline areas in the arena, we implemented a pitch control mechanism. The pitch angle of the robot is continuously measured using the IMU and compensated by adjusting the leg lengths accordingly. Since the incline is directly measured, K_s = 1 and K_sz = 1. We found K_p = 0.8 to sufficiently stabilize the robot.
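A minimal sketch of this per-wheel velocity transform follows; the function name and the 2D cross-product convention are illustrative assumptions:

```python
import math

def wheel_command(v, omega, r, r_dot):
    """Local velocity for one wheel of an omnidirectional base.

    v      -- commanded linear base velocity (vx, vy)
    omega  -- commanded rotational base velocity (yaw rate)
    r      -- current wheel position relative to the base (x, y)
    r_dot  -- wheel velocity induced by simultaneous leg movement (x, y)
    """
    # v_i = v + omega x r_i + r_dot_i (2D cross product with scalar omega)
    vx = v[0] - omega * r[1] + r_dot[0]
    vy = v[1] + omega * r[0] + r_dot[1]
    # The wheel first rotates to the yaw angle of v_i, then drives with |v_i|.
    yaw = math.atan2(vy, vx)
    speed = math.hypot(vx, vy)
    return yaw, speed
```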

OBJECT PERCEPTION
For approaching objects and adapting motion primitives to detected objects, we use RGB images and RGB-D point clouds from the wide-angle camera and the ASUS Xtion camera mounted on the sensor head. We differentiate between object detection (i.e., determining an approximate object position) and object pose estimation (i.e., determining the precise object pose).

When approaching an object, object detection is initially performed with the downward-facing wide-angle camera mounted on the sensor head (see Fig. 7). Using a connected component algorithm, we obtain object candidate clusters of same-colored pixels. An approximate pinhole camera model calculates the view ray for each cluster. Finally, the object position is approximated by the intersection of the view ray with the local ground plane. The calculated object position is precise enough to allow approaching the object until it is in the range of other sensors.
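The following is a minimal sketch of this view-ray approach under a pinhole model, assuming a known camera pose and a flat local ground plane at z = 0 in the world frame; the function name and parameter layout are illustrative:

```python
import numpy as np

def object_position(pixel, K, R, t):
    """Approximate object position as the intersection of a camera view
    ray with the local ground plane z = 0.

    pixel -- (u, v) centroid of the color cluster
    K     -- 3x3 pinhole intrinsics; R, t -- camera pose in the world frame
    """
    # Back-project the pixel to a ray direction in camera coordinates.
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray_world = R @ ray_cam        # rotate the ray into the world frame
    origin = t                     # camera center in the world frame
    s = -origin[2] / ray_world[2]  # scale so that the ray reaches z = 0
    return origin + s * ray_world
```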

As soon as the object is in range of the head-mounted ASUS Xtion camera, the connected component algorithm can also take Cartesian distance into account. We use the PCL implementation of the connected component algorithm for organized point clouds. Since the depth measurements allow us to directly compute the cluster centroid position, and the camera is easier to calibrate, we can approach objects much more precisely using the RGB-D camera.

When the object is close enough, we use registration of a CAD model to obtain a precise object pose. One difficulty is that a border of the object which is close to the image border is not an actual object border, but is caused by the camera view frustum. In practice, this problem particularly occurs with the large base station object (see Fig. 7).

The ICP pose is then normalized with respect to the symmetry axes/planes of the individual object class. For example, the cup is symmetrical around the Z axis, so the X axis is rotated such that it points in the robot's forward direction (see Fig. 7).
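A minimal sketch of this normalization for a Z-symmetric object follows, assuming the pose is given as a rotation matrix in the robot frame with the robot's forward direction along x; the function name is hypothetical:

```python
import numpy as np

def normalize_z_symmetric(R):
    """Normalize the rotation of an object that is symmetric about its
    Z axis, so that its X axis points in the robot's forward direction."""
    z = R[:, 2]                      # symmetry axis stays fixed
    forward = np.array([1.0, 0.0, 0.0])
    # Project the forward direction into the plane orthogonal to z.
    x = forward - np.dot(forward, z) * z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)               # complete the right-handed frame
    return np.column_stack((x, y, z))
```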

We use straightforward kinematic control for Momaro (see Fig. 9). Both arms and the torso yaw joint are considered independently.

A goal configuration is specified by telemanipulation (see Section 10) or by predefined keyframe sequences. Figure 9 shows how a motion, designed relative to a reference object, is adapted to a perceived object pose to account for imprecise approach of the object.

Note that, in principle, ROS offers built-in network transparency. Since this functionality heavily relies on the TCP protocol for topic discovery and subscription, even when the "UDPROS" transport is chosen, it is unsuitable for unreliable and high-latency networks.

Most high-bandwidth data from the robot is of streaming type. The key feature here is that lost messages do not lead to system failures, since new data will be immediately available, replacing the lost messages.

In this particular application, it would not even make sense to repeat lost messages, because of the high latency. In the uplink direction, i.e., commands from the operator crew to the robot, this includes, e.g., direct joystick commands.

Consequently, we use the nimbro_network UDP transport for streaming data (red in Fig. 10). The transport link between robot and field computer uses the FEC capability of nimbro_network with 25% additional recovery packets to compensate for WiFi packet loss without introducing new latency.

Other data is transmitted at low frequency or is critical for system operation. Here, a message loss might be costly (e.g., SLAM maps are only generated on every scanner rotation) or might even lead to system failure (e.g., loss of a ROS action state transition). Therefore, the TCP transport is used for this kind of data.
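As a toy illustration of forward error correction at this redundancy level, the sketch below adds one XOR parity packet for every four data packets (25% overhead), which lets the receiver recover any single lost packet in a group without retransmission. The helper names are hypothetical and the actual coding scheme used by nimbro_network may differ:

```python
from functools import reduce

def add_parity(packets):
    """Group packets in fours and append one XOR parity packet per group
    (25% overhead). packets: list of equal-length byte strings."""
    out = []
    for i in range(0, len(packets), 4):
        group = packets[i:i + 4]
        parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), group)
        out.extend(group + [parity])
    return out

def recover(group, parity):
    """Recover a single missing packet (None entry) from its group:
    XOR of the surviving packets and the parity packet."""
    known = [p for p in group if p is not None] + [parity]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), known)
```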

MISSION CONTROL INTERFACES
For the operator crew, situational awareness is most important. Our system shows camera images, 3D visualization, and diagnosis information on a central ground station with four monitors (see Fig. 11).
In order to cope with the degraded communication link, the system needs to be as autonomous as possible, while retaining the ability to interrupt, reconfigure, or replace autonomous behavior by manual intervention.

To this end, our system provides three levels of control to the operator crew. On the highest level, entire missions can be specified and executed. The intermediate level allows configuration and triggering of individual autonomous behaviors, such as grasping an object. On the lowest level, the operators can directly control the base velocity using a joystick or move individual DOF of the robot.

The last aspect of our control paradigm is remote debugging. Operators need to be able to directly introspect, debug, and manipulate the software on the robot in order to prevent relatively simple problems from escalating to mission failures. We describe the developed operator interfaces in the following.

The mission can be configured and monitored using our Mission GUI (see Fig. 11). During the mission, execution can be stopped at any time, mission updates can be performed, and the execution resumed.

Missions can also be spliced, in the sense that the currently performed action is carried out and then execution switches to a new mission.

In the case of a failure of the mission control level, or if the operator judges that the system will not be able to carry out the mission autonomously, the execution can be interrupted and the task in question can be carried out using the lower control levels. Afterwards, the mission can be resumed starting after the completed task.

If all autonomous behaviors fail, the operators can also directly teleoperate the robot. For manipulation, our operators can choose between on-screen teleoperation using 6D interactive markers in either Cartesian or joint space, or immersive 3D telemanipulation (see Fig. 11) using an Oculus Rift HMD and 6D magnetic trackers (see Rodehutskors et al. (2015) for details).

For navigation, the operator can use a joystick to directly control the base velocity. Teleoperation speed is of course limited by the high feedback latency, so this method is only used if the navigation planners get stuck. Finally, several macros can be used to influence the robot posture or to recover from servo failures such as overheating.

To be able to react to software problems or mechanical failures, operators first need to be aware of the problem. Our system addresses this concern by

• providing direct access to the remote ROS log,
• showing the state of all ROS processes, and
• transmitting and displaying 3D visualization data from the autonomous behaviors.

Once aware of the problem, the operators can interact with the system through ROS service calls over our nimbro_network solution, parameter changes, or ROS node restarts through rosmon. In extreme cases, it is even possible to push small Git code patches over the network and trigger re-compilation on the robot.

At the SpaceBot Camp, the teams were provided with a coarse map of the environment that had to be refined by the robot's mapping system. As detailed in Section 9, the communication link to the operator crew was severely constrained, both in latency (2 s per direction) and in availability.

While preparing our run, we found that the battery slot in the base station had a significant resistance due to a built-in clamping mechanism. Thanks to our flexible motion design workflow, we were able to alter the motion so that Momaro would execute small up- and downward motions while pushing, to find the best angle to overcome the resistance.

The insertion of the battery requires high precision. To account for inaccuracies in both battery and station pose, we temporarily place the battery on top of the station. After grasping the battery again, we can be sure that any offset in height is compensated.

Furthermore, we found it error-prone to grasp the battery at its very end, which is necessary to fully insert it into the slot.

Overall, we solved all tasks of the SpaceBot Camp with supervised autonomy. Our team was the only one to demonstrate all tasks, including the optional soil sample extraction. Figure 14 gives an overview of the sequence of performed tasks. A video of our performance can be found online. While overall the mission was successful, we experienced a number of problems, which will be discussed in detail.

In our run, Momaro failed to take the soil sample in the first attempt. During the vigorous scooping motion, the scoop turned inside the hand (cf. Fig. 2, Fig. 13). We found the problem to be a malfunctioning finger actuator in the hand holding the scoop. Since we were confident that Momaro would be able to solve the task, we repeated the scooping motion, which succeeded. Momaro then autonomously navigated up the ramp to perform the assembly tasks at the base station (Fig. 14). Although the operators paused autonomous navigation at one point on the slope to assess the situation, no intervention was necessary and navigation resumed immediately.

After finishing the course in 20:25 minutes, we used the remaining time to show some of Momaro's advanced manipulation capabilities by removing debris from the terrain with Momaro and our intuitive teleoperation interface (Fig. 13).

LESSONS LEARNED
Our successful participation in the SpaceBot Camp was an extremely valuable experience, identifying strong and weak points of our system in a competitive benchmark within the German robotics community. One weak point was the reliance of our autonomous behaviors on fixed assumptions, e.g., fixed timeouts on certain actions. Unfortunately, one of these timeouts resulted in an early abort of the battery approach in our run, which had to be corrected by operator action. A more intelligent system, tracking the progress of the current task, would have noticed that the approach was still progressing and would have continued it. In the future, we will investigate such resilient progress monitoring methods in more detail.

CONCLUSION
In this article, we presented the mobile manipulation robot Momaro and its ground station. We provided details on the soft- and hardware architecture of the integrated robot system and motivated design choices.

The feasibility, flexibility, usefulness, and robustness of our design were evaluated with great success at the DLR SpaceBot Camp 2015.

Novelties include an autonomous hybrid mobile base combining wheeled locomotion with active stabilization, in combination with fully autonomous object perception and manipulation in rough terrain.

For situational awareness, Momaro is equipped with a multitude of sensors, such as a continuously rotating 3D laser scanner, an IMU, an RGB-D camera, and a total of seven color cameras. Although our system was built with comprehensive autonomy in mind, all aspects from direct control to mission specification can be teleoperated through intuitive operator interfaces. Developed for the constraints posed by the SpaceBot Camp, our system also copes well with degraded network communication between the robot and the monitoring station.

The robot localizes by fusing wheel odometry and IMU measurements with pose observations obtained by a SLAM approach using laser scanner data. Autonomous navigation in rough terrain is tackled by planning cost-optimal paths in a 2D map of the environment. High-level autonomous missions are specified as augmented waypoints on the 2.5D height map generated from SLAM data. For object manipulation, the robot detects objects with its RGB-D camera and executes grasps using parametrized motion primitives.

In the future, shared autonomy could be improved by automatic failure detection, such that the robot itself notices problems and asks the operators for help only when needed.