Handover Control for Human-Robot and Robot-Robot Collaboration

Modern scenarios in robotics involve human-robot collaboration or robot-robot cooperation in unstructured environments. In human-robot collaboration, the objective is to relieve humans from repetitive and wearing tasks. This is the case of a retail store, where the robot could help a clerk to refill a shelf or an elderly customer to pick an item from an uncomfortable location. In robot-robot cooperation, automated logistics scenarios, such as warehouses, distribution centers and supermarkets, often require repetitive and sequential pick and place tasks that can be executed more efficiently by exchanging objects between robots, provided that they are endowed with object handover ability. Use of a robot for passing objects is justified only if the handover operation is sufficiently intuitive for the involved humans, fluid and natural, with a speed comparable to that typical of a human-human object exchange. The approach proposed in this paper strongly relies on visual and haptic perception combined with suitable algorithms for controlling both robot motion, to allow the robot to adapt to human behavior, and grip force, to ensure a safe handover. The control strategy combines model-based reactive control methods with an event-driven state machine encoding a human-inspired behavior during a handover task, which involves both linear and torsional loads, without requiring explicit learning from human demonstration. Experiments in a supermarket-like environment with humans and robots communicating only through haptic cues demonstrate the relevance of force/tactile feedback in accomplishing handover operations in a collaborative task.


INTRODUCTION
Recent studies testify an increasing use of robots in retail environments with a number of objectives: to relieve staff from repetitive tasks with low added value; to reallocate clerks to customer-facing activities; to gather in-store data for real-time inventory to reduce out-ofstock losses (Retail Analytics Council, 2020). Commercial solutions already exist for automated inventory management, e.g., the Bossa Nova 2020, Tally 3.0 by Simberobotics, LoweBot by Fellow Robots or Stockbot by PAL robotics (Bogue, 2019). However, other in-store logistic processes, like product pre-sorting and shelf re-stocking are still difficult to automate, even though there is a strong interest from retailers, due to their high costs (Kuhn and Sternbeck, 2013). The most time consuming task is the shelf replenishment and 50% of such time is devoted to find the correct slot on the shelf. Only few scientific contributions exist on this logistic automation problem (Ricardez et al., 2020;Sakai et al., 2020), mainly related to the Future Convenience Store robotic challenge launched by the World Robotic Summit 1 . This is because the re-stocking task involves complex manipulation operations that are still a challenge for a robot. Dexterous manipulation of objects with unknown physical properties is very difficult, especially with simple grippers like parallel jaws, that are the most widespread devices for material handling. Adopting a fixed grasp with a parallel gripper can be very limiting when objects have to be handled in cluttered and narrow spaces. Changing the grasp configuration without re-grasping the object requires the ability to re-orient the object in hand, that is a pivoting motion about the grasping axis of the parallel gripper. Recent papers demonstrated that such in-hand manipulation abilities allow a robot to autonomously refill a supermarket shelf (Costanzo et al., 2020b;Costanzo et al., 2020c;Costanzo et al., 2021). Nevertheless, the whole restocking process requires execution of additional tasks that cannot be executed by robots, like opening cartons, removing packaging of multiple items before placing them on the shelf, or handling exceptions, like liquid spills. All these operations demand for complex cognitive and/or manipulative skills that are still a human prerogative. Therefore, with the current technology a step toward automation of retail logistics can be taken through execution of such processes in a collaborative way, where robots and humans are able to exchange objects, the so-called handover operation. Availability of a robot able to receive objects from the human clerk (H2R operation) can significantly alleviate the human workload since the robot can easily reach lower or higher shelf layers, that are outside the ergonomic golden zone. The dual robot-to-human operation (R2H) can be very useful in stores to help impaired or elderly people to retrieve products from the shelves. Analogously, in fulfilment centers of large on-line stores, robots can retrieve products from shelves and pass them to a human operator who takes care of their packaging. In a more futuristic scenario where robots can fully replace human clerks, the robot-to-robot (R2R) handover operation can be envisaged during the whole in-store logistic process.
The handover task is commonly defined in the robotics community as a joint action between two agents, the giver and the receiver (Ortenzi et al., 2020). It is usually divided into two phases, the pre-handover and the physical handover. In each phase, a number of aspects should be considered. For a detailed review of each aspect, the reader is referred to the survey by Ortenzi et al. (2020); in the following only a short review is presented with the aim to frame the contribution of the present paper within the literature.
The communication mechanism is crucial for initiating the action and to coordinate it as soon as it has started (Strabala et al., 2013). We address only the communication problem during the physical handover phase, and we adopt haptic cues as the sole communication mean between the two agents, even in the R2R handover the robot controllers do not share any further communication channel. The cues are a simple code based on the interaction force perceived by the agents through tactile sensing at fingertips. The code assumes that robots are able to measure all components of the force vector, thus the grippers are equipped with force/tactile sensors built in our laboratory (see the experimental setup in Figure 1).
Grasp planning is another fundamental problem to tackle for an effective handover execution. Ideally, the giver should grasp the object in such a way the receiver does not need to perform complex manipulation actions after the handover before using the object, as discussed, e.g., in Aleotti et al. (2014). In a R2H context, the robot should be capable of dexterous operations to hand the object over in a proper pose. Even parallel grippers are able to perform in-hand manipulation actions as demonstrated by Costanzo et al. (2020a). Both object and gripper pivoting maneuvers are used in this paper to achieve this objective, for instance, choosing an object orientation so as to avoid high torsional loads at receiver fingertips in case a precision grasp is needed to take the object. This can likely happen in a collaborative packaging task where the human operator has to arrange in a box the items passed by the robot. In the H2R context, as proved by many studies on human behavior, e.g. (Sartori et al., 2011), the giver grasps the object to hand over by considering the final goal of the receiver. If a desired orientation is required, the robot should apply a grasp force suitable to hold also the torsional load. To the best of our knowledge no other research deals with handover of objects subject to torsional load.
Still in the H2R operation, visual perception of the object location by the robot is challenging since it has to be performed with the object already grasped by the human, hence it is partially occluded. We adopt a texture-based approach that tracks 3D key points on the object surface detected by an eye-in-hand RGB-D camera. Hence, the robot directly tracks the object rather than the human hand, and no additional markers are needed, differently from the marker-based solutions proposed by Medina et al. (2016) and by Pan et al. (2018). Among the marker-less approaches, recent works are (Nemlekar et al., 2019;Yang et al., 2020;Rosenberger et al., 2021). Each of these works on H2R handover, and our paper, use object and/or human tracking with a different aim. Nemlekar et al. (2019) focus on reaching the handover location by tracking the human skeleton and predicting the so-called Object Transfer Point, Yang et al. (2020) focus on the choice of the grasp orientation based on the human grasp type, while Rosenberger et al. (2021) focus on the choice of a grasp that ensures the safety of the human using eye-in-hand vision. Our approach, instead, aims at reaching a grasp selected beforehand based on the task that the robot has to perform after the handover, i.e. placing the object on the shelf possibly using in-hand manipulation which requires specific constraints on the grasp. Differently from the solution by Yang et al. (2020), in our work the grasp motion is not planned but directly executed in a closed-loop fashion, this allows the robot to achieve a speed comparable to the human one while automatically reacting to the giver motion. The approach proposed by Rosenberger et al. (2021) is computationally demanding and it is still not suitable for real-time closed-loop control. In contrast, our approach requires an object database containing the grasp poses (this is a fair assumption in a supermarket scenario) but enables a fast closed-loop control and can track the object even if the human changes the object pose during the handover execution. The main limit of the solution by Nemlekar et al. (2019) is the need to keep the entire skeleton of the human body within the field of view of the camera to allow the tracking algorithm to work properly; this limit has been overcome by Rosenberger et al. (2021) who can track only parts of the human body as well as our approach which tracks the object only.
A fast visual servoing loop is used in this paper to control the robot receiver motion in such a way the handover is fluid, that is a primary requirement as discussed by Medina et al. (2016). This way, during the H2R operation the handover location is chosen by the giver and the receiver moves toward it in real-time. For instance, this is helpful if the human has a limited workspace and the robot has to reach an handover location compliant with the human constraints. When the robot approaches the object, the initial error between the actual and desired robot location could be high, in the classical algorithm by Marchand et al. (2005) this issue limits the dynamic performance of the visual servoing controller. This paper improves the dynamic performance of the visual controller in terms of speed by using a time-varying reference on the feature space to have a low tracking error and higher control gains.
During the physical handover phase the object load is shared by the giver and the receiver. A number of studies on the forces exchanged by humans during this phase have been carried out. For instance, Mason and MacKenzie (2005) found that the grip force of both giver and receiver is modulated during the object exchange. The giver decreases its grip force, while the receiver increases it until the load is transferred. Another relevant aspect, highlighted by Chan et al. (2012), consists in the post-unloading phase, when the giver still applies a grasping force even though its sensed load is almost zero. This means that the giver is the agent responsible of the object safety during the passing. The main contribution of the present paper focuses on these two aspects by proposing grip force modulation algorithms, based on the sole load perception including both linear and torsional one, for enabling a safe load sharing and transferring. The building blocks are the slipping avoidance algorithm originally proposed by Costanzo et al. (2020b) and the in-hand pivoting abilities devised by Costanzo et al. (2020a). In the H2R operation the robot gripper is controlled using the slipping avoidance modality so as the grip force is automatically computed based on the sensed load and slipping velocity estimated through the nonlinear observer proposed by Cavallo et al. (2020). However, the slipping avoidance strategy alone revealed ineffective, a communication channel has to be established between the giver and the receiver. We adopt haptic cues applied by the robot to the giver to communicate that the contact with the object is established and its readiness to hold the load. The load transfer takes place safely and effectively without any knowledge of the object inertial properties and center of gravity position with respect to grasping points, but only friction parameters have to be known in advance, mainly related to object surface material. Differently from the approach by Medina et al. (2016), no model learning phase is required.
In the R2H operation the grip release strategy is of paramount importance. Most of the works adopt a simplistic approach, namely, the robot releases the object as soon as a pull by the receiver is detected. However, taking the release decision based on the pulling force only might cause object falls if the receiver is not sharing the load. On the other hand, works deciding the grip release on the load sharing only might be wrong as well since without any pulling force cue there is no clear knowledge of the receiver intention. In this paper, we adopt a strategy based on both indicators, the pulling force and the load share. This greatly enhances the reliability of the R2H handover.
In the R2R operation reliability of the object passing is crucial. Assuming the adoption of parallel grippers only, if the receiver grasps the object in a location too far from the center of gravity, the grip force might exceed the gripper force capability to keep the initial object orientation. In this case the giver should foresee this situation and avoid the grip release. It should communicate to the FIGURE 1 | Left picture: Experimental setup for the robot-to-human and human-to-robot handover, the frame G (in RGB convention) is the grasp frame, the magenta arrow represents the pulling direction y pull (defined in Section 3); Right picture: Experimental setup for the robot-to-robot handover. The robot on the left is the receiver and is equipped with an eye-in-hand camera to track the object to take. Both robots are endowed with force/tactile sensors.
Frontiers in Robotics and AI | www.frontiersin.org May 2021 | Volume 8 | Article 672995 receiver the need to change the grasp location. We present here a first method to address the problem, based on the anticipatory detection of the slipping velocity. This feature is enabled, in this paper, by a smart use of the slipping detection algorithm by the giver and the controlled sliding by the receiver. Remarkably, in our approach the two robots are independent agents and the controllers are not coordinated but they communicate via haptic cues only. The rest of the paper is organized as follows. Section 2 describes the reactive control algorithms for visual servoing, slipping avoidance and controlled sliding. Section 3 presents the finite state machines (FSM) handling the whole handover task in the three scenarios R2H, H2R and R2R. Experimental results obtained in an in-store logistics collaborative scenario are described in Section 4, while conclusions are drawn in Section 5.

REACTIVE CONTROLLERS
In the handover execution, we assume that the receiver, whoever it is, has to adapt to the giver behavior. For instance, in all operation types, the giver shows the object to the receiver in a zone within its field of view, and the receiver has to reach such location even if the giver moves before giving the object. Even in the R2R case this should be achieved because we assume there is no communication between the two control units, apart from the haptic cues during the physical handover. Moreover, any handover task is characterized by a large degree of uncertainty affecting the handover location, object mass and grasp location. We address these issues by resorting to sensor-based reactive control. The first controller is based on visual feedback and is a fast visual servoing algorithm aimed at tracking the object motion carried by the giver. The second controller is based on force/tactile feedback and aims at modulating the grip force so as to avoid object slippage while transferring the load from the giver to the receiver. The two controllers are activated in different times and the scheduling logic is encoded in the state machines described in Section 3.

Visual Servoing Controller
The objective of the visual servoing controller is to generate a velocity of the receiver hand so as to grasp the object to exchange. This feature is certainly needed in the H2R operation, but it can be useful even in the R2R operation in case the base frames of the two robotic agents are not calibrated, as in our experiments.
During the H2R handover, the human giver presents an item to the robot by putting it within the field of view of the camera. Since the human is not able to keep the object fixed or because he/she would like to move the object before passing it, the handover location is time-variant. We address this problem by using a visual servoing controller that tracks the object and controls the robot velocity toward the object location.
The controller is based on the ViSP library (Marchand et al., 2005) and uses the images acquired from an RGB-D camera (RealSense D435i in Figure 1) to adjust the robot pose with respect to the object and correctly grasp it. We acquire offline a target image that corresponds to the desired grasp configuration, then the algorithm moves the robot to align the current image with the target one. The target image implicitly defines the endeffector pose relative to the object, i.e. the grasp pose. The grasp pose of each object is calibrated in a preliminary phase using the gripper with two calibration fingers specifically designed to achieve the desired accuracy of the grasping location. In this paper, for each object there is only one grasp pose, but the same procedure can be repeated to define multiple grasp poses as proposed by Costanzo et al. (2021) for a pick-and-place task. The images are synthetically represented by 3D feature points matched between the target image and the current image by means of a keypoint matching algorithm available in the ViSP library.
The visual servoing algorithm minimizes the error between the target features s * and the actual ones s: where π i and π e are the intrinsic and extrinsic parameters of the camera, respectively, and I(t) is the actual image. The vectors s, s * ∈ R 3N f , where N f is the number of features, contain the 3D locations of the actual and target features p i and p * i , respectively, thus The algorithm minimizes the error (Eq. 1) by controlling the camera linear and angular velocity with the following law where v c is the 6D camera twist, L is the interaction matrix, and λ > 0 is the control gain (Marchand et al., 2005). Commonly, s * (t) s * is constant. This means applying a step reference to the control algorithm. To ensure stability, the higher initial error imposes to use low gains. This issue is typically overcome by using an adaptive gain which is lower when the error is high and increases as the error declines. Such a solution implies a slower motion in the initial phase. To speed-up the motion, while ensuring stability, we propose to use a high constant gain while applying a time-varying reference s * (t) obtained by interpolating each feature between the initial value s(I(0)) and the target one s * in a give time t f . The interpolation is obtained by applying the same time-varying rigid body transformation to each feature, i.e., beingp the vector of homogeneous coordinates of the point p. T(t) is a time-varying homogeneous transformation matrix such that T(0) I and T(t f )p i (0) p * i . Thus, T(t) is obtained by a linear interpolation between T(0) and T(t f ) in translation and Frontiers in Robotics and AI | www.frontiersin.org May 2021 | Volume 8 | Article 672995 4 quaternion rotation. This choice implies that the initial error e(0) 0 and e(t) has a bell-like shape.
The homogeneous transformation matrix T(t f ) can be found by solving the following problem We adopt the Ceres 2 solver to solve this optimization problem and, to ensure that T is a homogeneous transformation matrix, we parametrize it with a position vector and a unit quaternion.
The keypoint matching algorithm of the ViSP library is used to find only the initial features s(I(0)) and the corresponding target features s * , then the keypoint tracking algorithm of the library tracks the actual feature in real-time yielding s(I(t)). The matching algorithm is based on the texture information of the object and, if the object has the same texture (such as a drawings or text) in two different locations, the matching algorithm could wrongly match two keypoints of these two locations (an example is shown in the top picture of Figure 2). When this happens there is one or more i such that the residual in Eq. 6 has a value higher than the required accuracy ε v , i.e., If condition (Eq. 7) is detected, our algorithm does not remove all the features that meet the condition, but it removes only the feature with the highest residual and reiterates the optimization algorithm by solving (Eq. 6) again. This is helpful because in the next iteration, without the removed feature, the optimization algorithm will find a solution with a lower residual on the other features and the condition (Eq. 7) may not be satisfied anymore; in this case the algorithm is stopped. In the end, all the mismatched features are removed (see the bottom picture of Figure 2). The accuracy achieved by the algorithm is relevant to the successful execution of the task by the robot after the handover. For instance, if the robot needs to execute pivoting maneuver, i.e. let the object rotate in-hand so as to reach a vertical orientation, the grasping point needs to be above the CoG otherwise the object would not be vertical at the end of the maneuver. Based on our experience, we estimate that the grasping point should be in a 2 mm ball located on the vertical line above the CoG. For this reason, we set ε v 0.002 m m. We verified that the algorithm is able to achieve such accuracy on the object in Figure 2 which is challenging for the matching algorithm because it has two different faces with very similar textures.

Grip Force Control
The control algorithm exploits the data provided by force/tactile sensors mounted on the fingertips, which can measure the 6D contact wrench (Costanzo et al., 2019). Each fingertip is a soft hemisphere with a stiffness and a curvature radius smaller than those of the handled object, hence the contact area is approximately a circle with radius ρ and the pressure distribution is axisymmetric. Under these assumptions, the grip force f n is related to the maximum tangential and torsional frictional loads by the relationships where µ is the dry friction coefficient, depending on the object surface material, ξ is a parameter depending on the pressure distribution, while δ and γ are parameters depending on the soft pad material only. It is well-known that the radius is related to the grip force as ρ δf c n (Xydas and Kao, 1999), where the two parameters can be estimated through an experimental procedure described by Costanzo et al. (2020b). According to the Limit Surface (LS) concept (Howe and Cutkosky, 1996), a sliding cannot occur if the external load (f t , τ n ) applied to the object is internal to the LS (the blue area of Figure 3). The LS can be enlarged by increasing the grip force f n . This way, given the measured load (f t , τ n ), it is possible to compute the minimum grip force to hold the object f n LS . The method for this computation can be found in (Cavallo et al., 2020). However this grip force can balance constant loads only, while in dynamic conditions, i.e., with time-varying loads and in case of non negligible accelerations, an additional grip force is needed. This can be computed, based on a dynamic model of the instantaneous rotational motion of the object about the Center of Rotation (CoR), via the nonlinear observer where ζ is the internal LuGre state (Canudas de Wit et al., 1995) and ω is the estimated slipping velocity. The function g(f n , c) cf tLS + τ nLS represents the maximum dry friction torque about the CoR, it depends on the normal load and on the CoR position c, which can be estimated via an algorithm that can be found in (Cavallo et al., 2020). The term σ 0 ζ is the actual dry friction being σ 0 the stiffness of the microasperities of the contact surfaces; y τ n − cf t is the generalized torque measured at the fingertip about the CoR axis and l is the observer gain. As soon as y > g(f n , c), a slipping velocity ω builds up. The viscous friction function σ 1 depends on the area of the contact surface, i.e., ) where β a is the viscous friction coefficient per area unit (Shkulipa et al., 2005), that can be estimated with the same exploration procedure used to estimate the dry friction coefficient μ, i.e., by rubbing the object at constant velocity. The methods described above are used to design a grasp controller for computing the grip force to avoid object slippage, the so-called slipping avoidance (SA) control modality, used in the handover strategy described in Section 3. The grip force is the superposition of two components, namely, where f n LS and f n d are the so-called static and dynamic components, respectively. f n d is aimed at counteracting timevarying external disturbances, such as the inertial forces that arise during the giver release and object transportation phases. It is computed by a linear control action on the estimated slipping velocity ω with the aim to regulate it to zero. The two actions can be activated separately depending on the specific phase of the handover task. When both actions are present the control mode is called dynamic mode, when the dynamic action is switched off, the control mode is called static mode. The activation strategy of the two modes is described in detail later on in the paper. Moreover, to ensure that a contact is always present so as the tactile sensors are able to measure, when the SA is active, the grip force is lower bounded by a minimum value depending on the accuracy and signal to noise ratio of the force/tactile sensor.
In the pre-handover phase, another control modality can be useful to change the relative orientation between the gripper and the object in a configuration that is more suitable for the exchange. This modality is called pivoting and its aim is to bring and keep the object in a vertical position, i.e., with zero frictional torque. Again, the LS method can show that the grasp force to achieve this objective coincides with the Coulomb friction law, i.e., f n f t /μ. Thus, the grasp force is brought from its current value to this limit by applying the first order linear filter in the discrete time k where the value α sets the time constant of the filter and f t (k) is the measured tangential force. The value of α changes the velocity of the pivoting maneuver and should be chosen depending on the bandwidth of the available force-controlled gripper. The interested reader can find more details about the algorithm in (Costanzo et al., 2020b;Cavallo et al., 2020).

HANDOVER STRATEGY
The assumptions underpinning the proposed approach and the requirements of the application tackled in this paper are summarized as follows: • We do not investigate the signaling problem, we assume that the receiver already agreed on getting the object from the giver • The receiver already knows which object the giver is passing; • Both the agents have the capability to withstand the weight of the object; • The objects to exchange are texture-rich and with a cylinderlike or parallelepiped-like shape; • A simple handover protocol is known to the agents based on haptic cues; • We use parallel grippers; • The handover location is decided by the giver; • We are not allowed to sensorize, e.g., using ARTags, neither the objects or the human; • We are allowed to use only sensors on board the robot, no external sensor can be used; • In the R2R case no communication between the robot control units is allowed.
Under these assumptions, which seem reasonable in the instore logistics settings where performance and cost should be well balanced, with only visual and tactile information we perform natural handover operations.
For each handover operation, H2R, R2H and R2R, we use different strategies. Each one corresponds to a FSM represented in Figure 4 and described hereafter.

H2R Operation
In the H2R operation, the human holds an item and the robot has to grasp it from the human hand. The strategy is represented in the top-left FSM diagram of Figure 4. In the Init state the robot activates the visual servoing algorithm, described in Section 2.1, to reach the handled object in the grasp configuration corresponding to the target image. During this phase, the human can move and the robotic receiver tracks the object by complying with the dynamic behavior of the giver. When the time varying feature trajectory (Eq. 5) terminates and the visual servoing error norm is below the desired accuracy ε v , the grasp location is considered reached and the gripper is commanded to close the fingers. As soon as contact is established with the minimal grip force, the visual servoing controller is disabled and the SA is activated in dynamic mode (Grasp state). Note that, during the gripper closing (before the SA activation), the visual algorithm is still active because the human operator can move the object, which makes the handover operation fluid. As soon as the SA is activated, the robot is commanded to move backward (Move back state), this gives a haptic cue to the human that feels a tangential force in the direction of the robot displacement, this cue means that the robot has securely handled the object and the human can release it.

R2H Operation
In the R2H operation, a robot intends to pass an held object to a human. The algorithm is described by the top-right FSM diagram of Figure 4. During the Init state the system is in the pre-handover phase, the gripper is controlled in SA dynamic mode, the robot feels the object weight by means of the sensorized fingertips; the weight is represented by the force component f w z that the sensor measures along the z-axis of the world frame (assumed aligned and opposed to the gravity). Note that, before activating the SA mode, the gripper is first controlled with a pivoting mode in order to present the object to the receiver with a vertical orientation. This expedient allows the receiver to grasp the object with no torsional load, thus requiring a minimal effort in terms of grip force. Once the weight f w z,i is acquired, the robot waits for the human to initiate the physical handover phase (Wait state). During the handover, the object is held by both the robot and the human and the load is shared by the two agents. In the Wait state, the slipping avoidance algorithm is in dynamic mode (Eq. 12) and it counteracts any disturbance, i.e., any external force or torque applied to the object. When the human receiver starts grasping the object holding part of the load, the robot feels less weight and it enters into the Sharing state. This is done by comparing the initial measured weight f w z,i with the actual force component f w z , by checking the condition where the scale factor ] s establishes the amount of weight the human has to withstand before the robot enters the sharing state. In other words, when condition (Eq. 14) is satisfied, the robot assumes that the human has the intention to share the object weight and thus it believes that the handover can take place safely. Note that the sign of the inequality takes into account the direction of the z-axis, opposed to the gravity, thus the measured force f w z,i is negative. In the Sharing state the system is in the physical handover phase and the human is helping the robot to hold the object. According to the feedback given by different subjects during several trials, we decided to switch off the SA dynamic control action (static mode) because the subjects complained for an annoying vibration felt during the load sharing caused by the controller reaction to the slight trembling of the receiver. In this state, the FSM takes into account the possibility that the human does not have the intention to take the object anymore. This event is detected, similarly to (Eq. 14), by checking the condition where the scale factor ] w > ] s establishes the amount of weight the robot has to withstand again before returning in the wait for the Sharing state. In addition, to avoid useless switches between the Waiting and the Sharing states, the parameters ] s and ] w are selected such that (] w − ] s ) f w z,i is greater than the noise level affecting the measured force, being f w z,i the weight of the lightest object. When the condition (Eq. 15) is verified, the FSM returns into the Wait state. Still in the Sharing state, the FSM waits for a haptic cue informing the robot that the human is pulling the object. Once again, this is done by measuring the forces at the fingertips. When the cue is detected, the robot opens the gripper and the handover is complete. The haptic cue is detected if the following condition holds where f pull is the measured external force along the pulling direction, defined as the projection of the y axis of the grasp frame on the xy-plane of the world frame (see the frame in the left picture of Figure 1). The first part of the condition means that the receiver is pulling the object upwards, while the second part means that the receiver is pulling the object out of the gripper (note that in case of a pushing force, f pull is negative). The parameters ψ z and ψ p are thresholds selected by testing the algorithm using various threshold values with 10 human volunteers (5 male and five female) that gave us feedback on their sensation during the pulling phase. It is important to underline that the condition (Eq. 16) is checked only in the Sharing state, this implies that the giver releases the object only if the receiver shares a sufficient amount of weight and he/she pulls the object. If a pulling force is applied without any load sharing, the robot keeps the object firmly with the SA grasp control.

R2R Operation
The handover strategy for the R2R case is executed by two FSMs, one for the giver and one for the receiver, depicted in the bottom diagram of Figure 4. One might think to reuse the same strategy devised for the H2R case for the robot receiver and the same strategy devised for the R2H for the robot giver. However, this approach cannot handle the case the selected grasp configuration to be achieved via the visual servoing is not compatible with the maximum grip force of the receiving gripper. This might happen if the grasp pose is such that the gravity torsional load requires a grip force higher than the gripper capacity, i.e. the grasp location is too far from the center of gravity. When the giver is a human, this exception is easily handled by the giver exploiting his/her manipulation dexterity and the ability to coordinate visual and tactile feedback to anticipate the failure event. Therefore, in the R2R case we use the tactile feedback of the giver, in charge of the object safety, to anticipate the slipping event that would happen if the receiver grip force, required by the slipping avoidance, exceeded the gripper limit. Exploiting the a priori knowledge contained in the soft contact model and the tactile perception Frontiers in Robotics and AI | www.frontiersin.org May 2021 | Volume 8 | Article 672995 8 data only, the giver is able to foresee this failure event and reacts to avoid it, still using haptic communication only, as detailed in the following.
The giver in the Init state is grasping an object with the grasp control in SA mode, then it records the grip force f n,i necessary to hold the object in the current configuration. In this pre-handover phase, the giver brings the object inside the field of view of the receiver camera, so that this starts tracking the object to grasp it. The giver is now waiting for a haptic cue provided by the receiver that has the intention to take the object (Wait state). The receiver executes a short backward and forward motion by first pulling the object to generate a pulling force f pull,g detected by the giver fingers, and then it returns to the initial position, so as to avoid keeping an external load that the giver grip force counteracts. The detecting condition of the cue is where the threshold ψ p is the same used in the R2H case. The giver is now in the release state and it starts opening the gripper by setting zero grasp force with a rate of 0.3 N/s, value tuned in combination with the response time of the slipping velocity observer. While the grip force is decreasing, the robot checks if the grip force reached zero and the following condition where ω g (t) is the object velocity with respect to the giver fingertips estimated by the observer in (Eq. 9), (Eq. 10) with the measured torque y on the giver gripper. ψ ϑ is a threshold that establish the maximum rotational slippage that can be accepted for the execution of task following the handover operation. Such a condition is aimed at detecting a slipping event of the commonly held object. Hence, to avoid actual slippage it is sufficient to use in the observer a value of the friction coefficient µ slightly lower than the actual one. This corresponds to consider a lower maximum friction torsional moment g(f n , c) and, in turn, a contracted LS, i.e., a virtual LS instead of the real one. This causes an estimated virtual slipping velocity that anticipates the slipping event, which would happen by continuing to reduce the grip force. This means that the actual load would cause a slippage at the giver contact surfaces which indirectly indicates that the receiver cannot safely withstand the load. This way ϑ g takes on the meaning of a virtual sliding angle. If the first condition on the grip force is verified before condition (Eq. 18), then the task of the giver ends, since the object is fully held by the receiver because no significant slippage occurred during the gripper opening. On the other hand, if condition (Eq. 18) holds with a non zero grip force, it means that a significant (virtual) slippage is happening and thus the receiver is trying to hold the object in a wrong configuration. Then, the giver grasp control is set to SA mode again, the robot giver sends a haptic cue (short forward and backward motion) to inform the receiver of the likely failure and, finally, it goes back to the Wait state with a grasp control that keeps the grip force to the initial one f n,i . From this state, different recovery strategies can be devised, e.g., re-plan the receiver grasp. In this paper, where we focus on the possibility to catch an exception only using tactile feedback and a model, we assume to know a new grasp configuration for the receiver compatible with the gripper capability and the receiver will move toward this new grasp location. The robot receiver in the Init state, with the grasp control in SA mode, sends a haptic cue to the giver, again with the short backward and forward motion. Then, it comes to a Wait state where the following conditions are checked: where f pull,r and ω r are the pulling force and the slipping velocity detected by the receiver, respectively, ψ ω is the threshold on the slipping velocity necessary to detect the slipping event. If the first condition is verified, the receiver detects the cue sent by the giver informing it of the possible failure, thus it enters the Regrasp state to handle this exception. In our case we move the receiver toward the giver to grasp the object closer to the center of gravity. This way, a lower grip force is necessary to hold the object in its orientation, hence the receiver can try again to take the object from the giver re-starting from the Init state. If the second condition, alternative to the first one, holds, then the handover is successful and the task ends.

EXPERIMENTS
The handover algorithms described so far have been tested in a number of experiments carried out on the setup described in Section 4.1, testing all three handover operations H2R, R2H and R2R.

Experimental Setup
The experimental setup is composed of two robots (Figure 1), a Kuka LBR iiwa seven equipped with a WSG-50 gripper, and a Yaskawa Motoman SIA5F equipped with a WSG-32 gripper. The finger motion of both grippers is velocity controlled via a ROS interface that communicates with the grippers through a LUA script running on the gripper MCU. Because of the LUA interpreter, the velocity commands can be sent to the grippers only with a rate of 50 Hz, this limits the performance of the grasp force controller. Each gripper is equipped with two SUNTouch sensorized fingertips (Costanzo et al., 2019) that are able to measure the 6D contact wrench. The method to extract the external wrench applied to the object from the single finger contact wrench vectors is explained in (Costanzo et al., 2020b). The sensors communicate with the ROS network via a serial interface at a sampling rate of 500 Hz. The mean error on the measured force is 0.2 N, hence we set the lower bound of the SA algorithm to a conservative value of 0.5 N. The grip force is controlled via a low-level force loop that acts on the finger velocity to track a reference grip force with a typical response time of 0.15 s, hence the time constant of the filter in (Eq. 13) is set to 0.33 s corresponding to the value of α reported in Table 1 considering the sampling rate of 500 Hz of the digital implementation. The iiwa robot is also equipped with an Intel D435i RGB-D camera, mounted in an eye-in-hand configuration, used for the visual servoing phase. Two sets of experiments have been carried out. The first concerns H2R and R2H handover operations performed in a typical pick-andplace task in a collaborative shelf replenishment scenario of a retail store. The setting consists of shelf layers with a clearance of 15 cm. Each shelf has different facings to place the objects detached by 5 cm height separators. A human operator hands an object over to the iiwa robot (H2R), which then places it on a shelf layer. The placing motion is autonomously planned off-line using the method proposed by Costanzo et al. (2021) and the solution found to accomplish the task might include pivoting operations (pivoting the object inside the fingers or rotating the gripper about the grasping axis while keeping the object fixed). The method allows the robot to plan off-line even if the starting robot configuration is likely different at run time since the object is passed by a human and it is not picked from a fixed location. After the place task, the robot is asked to re-pick the same object and hands it over to the human (R2H). This sequence has been selected simply because it allows us to test both handover operations with a human.
The second set of experiments concerns the R2R handover operation. The SIA5F robot is commanded to pick an object in a given position and to pass it to the iiwa robot. The SIA5F is controlled according to the R2R Giver FSM, while the iiwa is controlled via the R2R Receiver one. The two robots communicate only via haptic feedback through the protocol defined in the FSMs. The R2R experiments have been carried out with two different objects.
The quantitative results reported in the next sections refer to experiments carried out with three different objects, a plastic bottle full of liquid (Object A), the same bottle but empty (Object B), and a cardboard box (Object C). Moreover, in the Supplementary Video S1, there are an extra R2R experiment performed with a resin block (Object D) and an extra H2R experiment carried out with a cardboard cube (Object E). Object D has been used since it is fully rigid and is made of the same material used in the calibration of the tactile sensors, hence the experiment runs in ideal conditions highlighting details of the physical handover phase. Object E is challenging to handover, since it is small compared to the robot fingers and the human should present it in a way depending on the pick-andplace task, in particular on the placing action the robot has to do. Table 1 shows all the objects used in the experiments with their friction parameters together with all parameters necessary to run the algorithms that do not depend on the specific object. Note that different friction parameters have been used for iiwa and SIA5F robots since the fingers are different. Figure 5 shows the snapshots of the first experiment execution. In the beginning, the human brings the object A inside the field of view of the robot camera and the robot goes toward it by using the visual servoing algorithm (t 0 s). Before the grasp, the human rotates the object (t 2 s) by changing its pose during the robot motion to reach the object. We intentionally introduced this dynamic change to show the capability of the algorithm to reach the target grasp point (t 6 s) even in case of rapid motions of the human giving the object to the robot. This feature alleviates the cognitive burden of the giver, who does not need to focus on staying firm in the handover location waiting for the robot. Figure 6 shows the visual servoing error during this phase, the error starts from zero with a bell-like shape thanks to the time-varying reference features s * (t) described in Section 2.1. This strategy permits a fast (but still smooth) convergence of the algorithm. Between t 3 s and t 4 s the error variation corresponds to the object rotation by the human, after that the error decreases and then, around t 5 s the human moves again just before the robot grasp and the error increases, but still the grasp is successful. The error does not decrease again because the contact has been established and the physical handover phase begins. Figure 7 shows the forces and torsional moment as well as the estimated sliding velocity during the whole experiment, while the top plot of Figure 8 shows a detail of the H2R physical handover phase. The first grasp force peak depends on the impact velocity between the fingers and the object, then at t 6 s the SA mode is activated and the receiver automatically chooses the grasp force. Here the load sharing phase starts, and both the agents hold the object weight. The weight of object A is 1 | Parameters of the algorithm (SI units). The friction model parameter μ, δ, and c appear in Eq. 8 and β a in Eq. 11. Note that: object mass is not used in the algorithms, it is reported for the reader's convenience in the interpretation of the results; the parameter δ is object independent for rigid objects, for partially deformable objects it has to be estimated for each object; object B is the same as object A but the bottle is empty; the parameter α is reported for a sampling rate of 500 Hz. 290 g, but in the sharing phase (around t 6.5 s) the robot feels a tangential load f t corresponding to less than half of the weight (107 g), and, in turn, the slipping avoidance algorithm applies a grasp force of about 1.5 N, demonstrating how sensitive is the grip force modulation implemented by the SA controller. At about t 7 s the robot moves back to give a haptic cue to the  Frontiers in Robotics and AI | www.frontiersin.org May 2021 | Volume 8 | Article 672995 11 H2R the robot feels a tangential force of about 1 N which corresponds to the object weight of 103 g. In turn, the grasping force f n is automatically set by the slipping avoidance algorithm to a value of 2.4 N which depends on the friction coefficient µ. The bottom plot of Figure 10 shows the detail of the R2H phase. Once again, from t 44 to t 54 s, the R2H FSM is in the Wait state and counteracts the disturbances on the measured force and torque. At t 55 s the human holds the object and the FSM enters in the Sharing state. Then, the human decides to keep the object giving the haptic cue to the robot by pulling it. As soon as the condition (Eq. 16) is verified, at t 56.2 s, the robot releases the object.

H2R and R2H Experimental Results
The third experiment of this subsection involves object C and it is reported in Figure 11. The H2R and R2H handover phases ( Figure 12) are very similar to those of the previous experiments and we will not discuss them further. This experiment is presented to show the particular pick-and-place task, because the handover grasp configuration is not compatible with the place location due to the tight clearance between two shelves. As shown in the Supplementary Video S1, to execute the placing maneuver the robot uses the gripper pivoting ability, again autonomously planned, i.e., the object remains fixed while the gripper rotates about the grasping axis. This happens from t 20 to t 25 s (see the estimated sliding velocity in the zoom of the bottom plot of Figure 11). After the placing the robot retreats. Then, it begins the new pick-and-place task to make the R2H operation. The robot picks the object again and, during its motion, the object impacts the facing separator. This is caught by the slipping avoidance algorithm that estimates a velocity peak at t 36.4 s and, in turn, increases the grasping force to counteract the generated torsional moment τ n , as evident in the top plot of Figure 11. The rest of the task phases are the same of the previous experiments. Frontiers in Robotics and AI | www.frontiersin.org May 2021 | Volume 8 | Article 672995 13 methods, based on physics models of the soft contact, have been presented in the framework of an in-store logistic collaborative scenario with a set of requirements and assumptions, and, in particular, using parallel grippers. Nevertheless, satisfactory results encourage us to investigate possible generalization to more complex robotic grippers, which can enlarge the set of objects that can be handled. We presented not only classical experiments of human-to-robot handovers and vice versa, but also a preliminary algorithm for robot-to-robot handover, that is envisaged useful in a future where robots collaborate with each other with simple communication channels, like haptic cues. The current limitation is the assumption of objects with specific shapes and the knowledge of the location of the re-grasp point. Overcoming this limit requires methods to re-plan or learn the re-grasping strategy.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.