- Collaborative Robotics and Intelligent Systems (CoRIS) Institute, Oregon State University, Corvallis, OR, United States
When a passively compliant hand grasps an object, slip events are often accompanied by flexion or extension of the finger or finger joints. This paper investigates whether a combination of orientation change and slip-induced vibration at the fingertip, as sensed by an inertial measurement unit (IMU), can be used as a slip indicator. Using a tendon-driven hand, which achieves passive compliance through underactuation, we performed 195 manipulation trials involving both slip and non-slip conditions. We then labeled this data automatically using motion-tracking data, and trained a convolutional neural network (CNN) to detect the slip events. Our results show that slip can be successfully detected from IMU data, even in the presence of other disturbances. This remains the case when deploying the trained network on data from a different gripper performing a new manipulation task on a previously unseen object.
1 Introduction
Sliding between any two surfaces frequently results in frictional vibration (Ibrahim, 1994; Akay, 2002; Wang et al., 2020) – a fact which our bodies appear to exploit for our sense of slip. Evidence suggests that the fast-acting mechanoreceptors in our skin, specifically the FA I receptors that sense vibrations under 40 Hz, play a dominant role in grip modulation (Macefield et al., 1996). This has motivated vibration-based slip detection methods for robotics, which have been studied since as early as 1989 (Howe and Cutkosky, 1989; Romano et al., 2011). However, such methods often struggle to distinguish between vibrations induced by slips (i.e., instances of sliding between the fingers and a grasped object) and those induced by non-slip disturbances such as contact events or environmental sliding (i.e., sliding between a stably grasped object and the environment) (Romeo and Zollo, 2020). Motivated by the observation that true slip events tend to be accompanied by a pose reconfiguration in passively compliant hands, in this paper we investigate whether joint flexion and extension in passively compliant hands can be used to discriminate between sliding-induced vibration caused by slip and sliding-induced vibration caused by environmental sliding.
Passive compliance is the ability of a structure to deflect in response to an experienced force without active actuation. Passively compliant grippers usually exhibit passive compliance about their primary joint axes, resulting in a situation where the final configuration of the gripper depends not only on actuation input, but also on the contact forces experienced by the fingers. The fingers of a passively compliant gripper may also be able to deflect in other directions or change their geometry. Passively compliant robotic grippers have several benefits when compared to their rigid counterparts. The shape-conforming ability of these devices allows them to grasp a wide variety of objects without complex control schemes (Shintake et al., 2018). Similarly, their ability to deflect in response to external forces makes them an increasingly popular choice for applications which require gentle manipulation, such as crop harvest (Navas et al., 2021) and medical applications (Tai et al., 2016). This same tendency to deflect in response to external disturbances results in a behavior in which the fingers of a compliant gripper tend to change configuration in response to the same physical disturbance that causes the slip between the grasped object and the gripper (Figure 1). Combining the slip-induced vibration response (as captured by an accelerometer) with this fingertip deflection (as captured by a gyroscope) may therefore serve as a slip indicator.
Figure 1. Slip occurs as a result of some applied force on the object. In compliant hands, which adapt their shape depending on the contact force at the fingers, this is accompanied by a change in joint angle (see Section 4.1).
For this study, we trained a convolutional neural network (CNN) to detect slip from data gathered from fingertip-mounted inertial measurement units (IMUs) (see Figure 2). We began by conducting a series of manipulation trials using a two-finger, tendon-driven hand. These manipulation trials included both true slip and environmental sliding conditions for five different grasped objects. We then automatically labeled this data as slip or non-slip using motion tracking data. We trained the CNN on a subset of this data and generated predictions on unseen examples using two classification thresholds. The network was able to achieve either an F1 score of 0.70 or a recall of 25.2% at a precision of 93.1% depending on the threshold. We additionally demonstrate the ability of the network to generalize to additional unseen objects and to a three-finger, tendon-driven gripper performing a forceful manipulation task. The specific contributions are as follows:
1. We provide an open-source dataset containing the motion tracking data, sensor data, kinematic data, and machine learning features and labels used in this paper. The dataset is available at https://zenodo.org/records/15886336.
2. We present a machine learning classifier that demonstrates the feasibility of using fingertip IMUs to detect slip in compliant hands.
3. We deploy the machine learning classifier on a different gripper design performing a practical manipulation task (fruit picking) on a previously unseen object.
Figure 2. During the manipulation trials, the two finger gripper is placed in a motion tracking testbed. The accelerometer and gyroscope data from the fingertips is concatenated into feature matrices and preconditioned for machine learning. The motion tracking is used to reconstruct the scene geometry, which is in turn used to assign a binary slip label at each of a series of discrete timesteps. These feature matrices and binary labels are then fed into the CNN, the architecture of which is illustrated on the right.
2 Background
2.1 Discriminating slip from sliding
Slip sensing is a fundamental and much-studied aspect of robotic grasping. Romeo and Zollo (2020) provide a review of both current and classical slip detection methods. However, few papers address slip in the context of environmental contact, sliding, and robot motion. The most comprehensive work on discriminating between true slip and other disturbances appears to be Heyneman and Cutkosky (2016), which presented a method for classifying events as either slip (which Heyneman and Cutkosky call ‘object/hand’ slip) or environmental sliding (which they call ‘object/world’ slip, as in between the grasped object and something in the world around it). The authors achieved this classification by using dense arrays of tactile sensors, and were able to classify slip with a reported accuracy in the range of 49.38%–100%, depending on the sensor type and data pre-processing method. Later work examined the use of IMU arrays for the same purpose. Massalim and Kappassov (2019) calculated the cross-correlation of multiple IMU signals during slip events or environmental sliding and reported a difference in mean correlation between the two cases. However, that paper did not provide a complete slip detection method based on this statistical difference. Our work seeks to build on these results by creating and evaluating a predictive model.
2.2 Sensors for slip
Methods that use dense arrays of tactile sensors do show promise for contact-rich tasks. Such sensors provide enough information to reconstruct the contact area between the finger and the object and track its changes over time (Narita et al., 2020; Ruomin et al., 2021). Sensors commonly used for this purpose include the GelSight sensor (Yuan et al., 2017) and the BioTac sensor (SynTouch LLC, Los Angeles, California, United States). However, the cost and design constraints associated with these sensors can be problematic for some applications. For example, GelSight and BioTac sensors are both based on fingertip deformation. If one of these deformation sensors is not isolated from the rest of a soft finger, the stretching and buckling of the finger material can be misinterpreted as contact at the sensor. There have been efforts to re-design these deformation-measuring sensors to be more suited to soft hands (Liu and Adelson, 2022), but the viability of emerging designs remains an active research area.
Another slip detection approach is to use individual or small arrays of force and pressure sensors. Recently, two slip detection studies have been reported that take this direction. Both use grippers designed for robotic harvest of delicate crops, the type of task for which design constraints like size and flexibility may severely limit sensor choice. In Liu et al. (2024), researchers used a curvature and contact force sensor in each finger of a gripper to detect slip as weight was added to a grasped artificial fruit. This approach also incorporated vibration sensing by including the frequency of the changes in contact force measurements as part of the slip detection process. In Zhou et al. (2022), researchers used an arrangement of commercially available flexible pressure sensors embedded in a fin-ray gripper to detect sliding between the hand and an object. The researchers mounted the gripper palm-up and stably grasped an apple and apple tree leaf. They then slid the apple tree leaf relative to both the gripper and object, and used the resulting data to train a Long Short-Term Memory network. Our work seeks to extend these papers by examining a wider range of manipulation scenarios, which include gripper motion and sliding between object and environment, as well as a variety of grasped objects. Furthermore, we use IMUs as our slip sensor, which many researchers have already embedded in their soft and compliant hands, generally to reconstruct pose (Yu et al., 2023; Lin et al., 2024; Kieliba et al., 2018; Santaera et al., 2015).
3 Data collection
Our goal was to create a predictive model that could accurately identify slips during robotic manipulation, even in the presence of environmental sliding. We therefore aimed to create a dataset rich in examples of slip, sliding, and both combined. Motivated by a desire to make the results as applicable as possible, we performed a series of manipulation trials with a physical hand and objects. We designed four different manipulation scenarios (each combining environmental sliding and slip in different ways) for these trials, described in Section 3.2. Each manipulation type was performed several times using a range of grasped objects and initial grasps. This was done to improve the model's generalization across factors such as surface friction, object shape, and grasp type.
3.1 Manipulation trials
With our custom-made gripper (Section 3.3), we conducted a series of manipulation trials using five objects from the YCB Object Set (Calli et al., 2015): a Pringles chips can, a Jello pudding box, a plastic cup, a Rubik’s cube, and a tomato soup can. The objects were chosen based on their suitability for the gripper (e.g., size), rigidity, and ease of geometric representation (i.e., symmetry). For each object, we designed a close-fitting cap with indentations which we used to mount motion tracking markers in a fixed pattern (Figure 3, left).
Figure 3. The object set (left) consists of a Pringles chips can, a Jello pudding box, a plastic cup, a Rubik’s cube, and a tomato soup can. Each object is equipped with a 3D printed retroreflective marker mounting cap. The custom two-finger gripper (right) is a modified Yale Openhand, equipped with fingertip IMUs, data collection hardware, and mounting points for motion tracking markers (gray spheres on fingers).
For each trial, the gripper and a single object were placed in a motion tracking testbed with eight OptiTrack motion-capture cameras (NaturalPoint, Inc., Corvallis, Oregon, United States). We used the accompanying Motive software to log the data from these cameras. Markers were affixed to both the gripper and object in order to track their motion. The three-dimensional position of each marker was recorded at a rate of 120 Hz using Motive. The marker positions and identities were sufficient to reconstruct the 6-dimensional pose of the gripper base, each individual link of the gripper fingers, and the object (see Section 4.1).
Each manipulation trial began with an experimenter actuating the two finger motors to achieve a stable grasp on the object. A variety of grasps were used for each object, including both power and pinch grasps. A subsample of example grasps can be seen in Figure 4. Once a stable grasp was achieved, the experimenter began a recording in Motive. The experimenter then ran a Python script that connects to the hand via serial connection and logs the data from the fingertip IMUs to a .csv file. This script also triggers a 0.5 s motion of the timing arm on the hand (see Section 3.3), allowing for time synchronization between the sensor data and the motion tracking data during postprocessing.
Figure 4. For each combination of grasped object and manipulation procedure, a variety of initial grasps were used. These included symmetrical and asymmetrical configurations, as well as both power and pinch grasps. Motion tracking data was used to reconstruct these grasps for label generation purposes. Three of these reconstructions are illustrated here for each object.
The experimenter then executed one of four procedures, representing different scenarios that may occur during autonomous robotic manipulation of an object (see Figure 5 for an illustration). Two of the scenarios (labeled A and B below) represent manipulations where there is a significant amount of environmental sliding, but only a few minor slips. The grasp remains stable throughout most of the manipulation. In the case that the hand reconfigures, the object primarily rolls without slipping relative to the fingers. These two scenarios provide examples of environmental sliding both with and without accompanying slip. Sliding between object and environment is frequently problematic for vibration-based slip detection methods, which is why we included it in our dataset.
Figure 5. Illustrations of the four manipulation scenarios. In scenario (A), the object shifts downward and to the right, and the fingers of the gripper reconfigure but stay in contact. In scenario (B), the gripper rotates and translates while its configuration remains the same and the object remains stably grasped. In scenario (C), the object is lost from the hand while moving in the plane. In scenario (D), the object is lost while moving out of the plane.
The other two scenarios (labeled C and D below) represent failed manipulations where the object is completely lost from the hand. The only difference between them is the direction of slip. In one case, it is in the plane of the hand, a direction in which the fingers are highly flexible. In the other, it is out of the plane, a direction in which the fingers are comparatively stiff. A complete description of each scenario and how we performed it is as follows:
3.2 Manipulation scenarios
• Scenario A: The first scenario is a grasp reconfiguration caused by an external disturbance. This scenario involves motion of the gripper’s fingers and the grasped object relative to the base of the hand, but not necessarily sliding between the fingers and the object. To replicate this scenario, a researcher manually applied disturbances to the fingers of the hand while attempting to induce as little slip as possible, though some incidental slip did occur.
• Scenario B: In contrast to Scenario A, the configuration of the fingers and object remains static with respect to the base of the gripper, while the gripper and object move about in the plane, sliding along the surface on which they rest. To replicate this scenario, the researcher held the base of the hand and slid it on a surface. This included changing the orientation of the hand.
• Scenario C: In this procedure, the researcher manually pulled the object away from the base of the gripper in the plane of the fingers until the grasp failed.
• Scenario D: Here the researcher manually pulled the object out of the plane of the fingers, again until the grasp failed.
The gripper was secured to the bed of the motion tracking arena using bolts for all procedures except procedure B. Procedures A and B were conducted for 60 s per trial, and procedures C and D until the grasp failed (average duration of 12.5 s). We performed a total of 200 manipulation trials in this manner, divided evenly across objects and procedures. Five trials had to be excluded due to data quality issues (missing or phantom markers, hardware communication failures, etc.), leaving 195 trials in the final dataset.
3.3 Hardware
We manufactured a two-finger, tendon-driven gripper for this study (Figure 3) based on the Yale Openhand (https://www.eng.yale.edu/grablab/openhand/) which is in turn based on the iRobot-Harvard-Yale (iHY) Hand (Odhner et al., 2014). We used heat set inserts to mount a commercially available circuit board (#3387, Adafruit Industries LLC, New York, NY) containing a single inertial measurement unit (LSM9DS1, STMicroelectronics NV, Geneva, Switzerland) to each distal fingertip. The accelerometer and gyroscope within the IMU were sampled at a rate of approximately 40 Hz using a multiplexer (PCA9546, Adafruit Industries LLC, New York, NY) to toggle between the two IMU breakout boards and an Arduino Mega to transfer the data over a serial connection to a laptop computer. We mounted 9 retroreflective markers to the gripper for motion tracking (see Figure 3), as well as a small motor-driven arm with an additional retroreflective marker. This arm was moved at the beginning of each grasping trial in order to temporally synchronize the motion tracking data with the data from the Arduino.
4 Automatic data labeling and dataset
We chose to approach slip detection as a binary classification problem, with the grasp labeled at each timestep as simply "slipping" or "not slipping". These slip labels needed to be extracted from the raw motion tracking data, which we did by reconstructing the geometry and kinematics of the scene at each timestep and applying a first-order contact model. We also constructed two-dimensional feature matrices from the fingertip-mounted IMU data from which to predict slip.
4.1 Geometric reconstruction
To label the data, we begin by reconstructing the geometry of the hand and object at each timestep. We use Motive to track the three-dimensional positions of the markers on the hand and object. Let
Figure 6. When an object is grasped, the elastomeric finger pads conform to the object (left). The reconstructed geometry is assumed to be rigid, which results in an overlap (right). The centroid of each overlapping region is treated as a contact point. The contact points are numbered based on the link of the gripper to which they belong.
For each time step we find contact regions between the fingers and object as follows. Since the objects and finger links are not perfectly rigid, there is a small deformation where the fingers contact the object. In our geometric model, this manifests as a small overlap between the non-deformable polyhedrons (Figure 6). Using Geometry3D, we calculate this overlapping intersection. Since the polyhedrons are all convex, this results in at most one contact region per finger polyhedron, for a maximum of six possible contact regions.
The contact region belongs equally to the two bodies in contact and can be thought of as two co-located regions: one on the object and one on the hand. Although these regions share the same location and shape, they do not necessarily share the same velocity. By looking at the difference between these velocities, we can determine whether slip is occurring.
4.2 Label generation
Since the
Figure 7. Each contact point is defined by a position along the link and a position on the object. At time
That is, if the speed of the relative motion between the two bodies at the contact point is above the cutoff value, the timestep is labeled as slip; otherwise it is labeled as non-slip.
We find
The first product (i.e., left product) starts with the location of each contact point from the previous time step, as expressed in the frame of the associated link (Equation 1) and the frame of the object (Equation 2), then uses the present poses of the links
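To make this labeling logic concrete, below is a minimal Python sketch of the threshold test described above. The homogeneous-transform bookkeeping, variable names, and the cutoff value are our own assumptions for illustration; they are not the exact equations or constants used in the paper.

```python
import numpy as np

DT = 1.0 / 120.0       # motion-capture timestep (120 Hz)
SPEED_CUTOFF = 0.005   # hypothetical slip-speed cutoff in m/s; the paper's value may differ

def slip_label(p_contact_prev_world, T_link_prev, T_link_now, T_obj_prev, T_obj_now):
    """Label one contact point at one timestep as slip (True) or non-slip (False).

    All poses are 4x4 homogeneous transforms from body frame to world frame.
    The previous contact point (world frame) is re-expressed in the link frame
    and in the object frame, carried forward by each body's present pose, and
    the difference between the two predictions gives the relative sliding motion.
    """
    p_h = np.append(p_contact_prev_world, 1.0)       # homogeneous coordinates

    p_in_link = np.linalg.inv(T_link_prev) @ p_h      # contact point in the link frame
    p_in_obj = np.linalg.inv(T_obj_prev) @ p_h        # contact point in the object frame

    p_now_link = T_link_now @ p_in_link               # where the link carried the point
    p_now_obj = T_obj_now @ p_in_obj                  # where the object carried the point

    rel_speed = np.linalg.norm((p_now_link - p_now_obj)[:3]) / DT
    return rel_speed > SPEED_CUTOFF
```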
4.3 Feature selection
To create the features, we downsample the motion data to a rate of 40 Hz. For each resulting timestep, we create a feature matrix from the previous 30 IMU samples from that point in time, equivalent to a 0.75 s window of data. Each of the two fingertip IMUs has three gyroscope and three accelerometer data channels, resulting in 12 total features per sample. This 12 × 30 feature matrix is paired with the slip label at that timestep. We then remove any examples from the dataset where neither the hand nor object was in motion for the entire 0.75 s data window, as determined by the motion tracking data. Since the hand and object are in contact with the environment throughout the entirety of each trial, this removal process leaves only data containing environmental sliding. Because sliding is always present, the overall vibrational activity on the IMUs is comparable for both classes, as seen in Figure 8.
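As an illustration of this windowing step, the sketch below assembles 12 × 30 feature matrices from two synchronized fingertip IMU streams. It is our own simplified version, not the released preprocessing code: the removal of static windows and the exact standardization scheme are omitted or simplified.

```python
import numpy as np

WINDOW = 30  # 30 samples at 40 Hz = 0.75 s

def build_features(imu_left, imu_right, labels):
    """imu_left / imu_right: arrays of shape (T, 6) with 3 accel + 3 gyro channels,
    already downsampled to 40 Hz. labels: length-T array of slip booleans.
    Returns (N, 12, 30) feature matrices and their N labels."""
    signals = np.hstack([imu_left, imu_right])          # (T, 12)
    X, y = [], []
    for t in range(WINDOW, signals.shape[0]):
        X.append(signals[t - WINDOW:t].T)                # (12, 30) window ending at time t
        y.append(labels[t])
    X, y = np.asarray(X), np.asarray(y)
    # standardize each of the 12 channels to zero mean and unit variance
    mean = X.mean(axis=(0, 2), keepdims=True)
    std = X.std(axis=(0, 2), keepdims=True)
    return (X - mean) / std, y
```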
Figure 8. Within each IMU channel, both slip and non-slip samples show similar levels of vibration. Here, vibration is measured by root-mean-squared (RMS) activity of the standardized signals.
Finally, we divide the data by randomly placing 20% in a testing dataset. We then divide the remaining trials into a variety of train-validate pairs depending on the goal. For performance evaluation, we randomly divide the trials with a 6:2 ratio of training to validation data. To evaluate the ability of the network to generalize to unseen objects, we generate five additional pairs of training and validation data using a leave-one-out technique. That is, we divide the data by moving all trials that were performed with a particular object to the validation set and use the remaining data for training. The final number of examples in each dataset is recorded in Table 2 and Table 3.
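The leave-one-out split can be expressed as a simple filter over trials, as in the sketch below. The field names (e.g., "object") are hypothetical; the released dataset may organize trials differently.

```python
def leave_one_out_split(trials, held_out_object):
    """trials: list of dicts with keys such as 'object', 'X', and 'y' (names assumed).
    Returns (training trials, validation trials), with every trial that used the
    held-out object reserved for validation."""
    train = [t for t in trials if t["object"] != held_out_object]
    val = [t for t in trials if t["object"] == held_out_object]
    return train, val

# e.g., generalization test with the soup can withheld from training:
# train_trials, val_trials = leave_one_out_split(all_trials, "tomato_soup_can")
```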
Table 3. The number of examples and class distribution in the leave-one-out datasets. These datasets contain the same examples as the combined training and validation data from Table 2.
5 Fruit picking dataset
To test the applicability of our trained network to real-world robotic manipulation tasks, we gathered additional data from a robot performing a fruit-picking task. We first recorded video and sensor measurements from 77 pick attempts on our physical proxy orchard system described in Velasquez et al. (2022). In our previous work, we demonstrated that models trained on in-hand data from a fruit picking task using the proxy orchard perform comparably in the orchard to those trained on data collected directly from the orchard itself. These pick attempts were performed with our custom three-fingered end effector (Dischinger et al., 2021) attached to an industrial manipulator arm (UR5e, Universal Robots, Odense, Denmark) (see Figure 9). Each distal link of the end effector is equipped with a 6-axis inertial measurement unit (MPU6055), which was sampled at a rate of 70 Hz. For each pick attempt, we replicated the relative starting pose of the gripper with respect to the apple from one of our field trials (Velasquez et al., 2022), then added a small random offset to the position and orientation to create a unique pick attempt. The gripper then closed, and the robot pulled back a fixed distance along a path coincident with the vector normal to the palm. We reviewed timestamped video of the experiments and recorded the start and end time of any visually-observable slip events. These times were used to label the grasp at every experiment time as either slipping or static (i.e., no visual change in the pose of the apple with respect to the gripper). That is, times after the slip started and before it concluded are labeled as slip and all other times are labeled as static.
Figure 9. We completed 77 apple picks on our physical orchard proxy (left); each pick attempt was videotaped and then manually inspected to identify if and when slip occurred. Each distal link has an IMU embedded under the fingerpad (right).
We formatted the fruit picking data in the style of the dataset on which our model was trained. We downsampled the IMU data to 40 Hz to match the sample rate used in our manipulation trials with the two finger gripper. Since the hand used in those trials had only two fingers, we constructed feature matrices for each possible pair of fingers on the three-finger gripper (fingers 1 & 2, 1 & 3, and 2 & 3). These were constructed using the same method described in Section 4. This resulted in three sets of feature matrices, one for each of the three finger pairs, and one set of shared labels. We pre-conditioned this data to have a mean of zero and standard deviation of one for each feature. A total of 1,179 examples (3.3% slip) were generated in this manner from across the 77 trials.
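The pairing of the three fingers into two-finger feature sets can be sketched as follows, reusing the build_features helper sketched in Section 4.3. The dictionary layout is an assumption made for illustration, not the authors' released code.

```python
from itertools import combinations

def finger_pair_features(imu_by_finger, labels):
    """imu_by_finger: dict mapping finger index (1, 2, 3) to a (T, 6) IMU array
    already downsampled to 40 Hz. Builds one feature set per finger pair using
    build_features (sketched earlier) and returns them with the shared labels."""
    feature_sets, shared_y = {}, None
    for a, b in combinations(sorted(imu_by_finger), 2):   # pairs (1, 2), (1, 3), (2, 3)
        X, shared_y = build_features(imu_by_finger[a], imu_by_finger[b], labels)
        feature_sets[(a, b)] = X
    return feature_sets, shared_y
```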
6 Models and experiments
We implemented a neural network architecture composed of two one-dimensional convolutional layers followed by a fully connected output layer. The convolutional layers have a stride of one over the dimension of the feature matrix that represents time. Each convolutional layer is followed by a Gaussian Error Linear Units (GELU) activation function. We apply dropout between the convolutional layers and the fully connected layer. This architecture is visualized in Figure 2.
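A minimal PyTorch sketch of this architecture is given below. The filter counts, kernel size, and dropout rate correspond to the hyperparameters reported in Section 6.1; the absence of padding and the treatment of the kernels as length-3 windows over time are our own assumptions.

```python
import torch
import torch.nn as nn

class SlipCNN(nn.Module):
    """Two 1-D convolutional layers over the time axis with GELU activations,
    dropout, and a fully connected output layer producing two class logits."""

    def __init__(self, in_channels=12, window=30,
                 filters1=7, filters2=4, kernel=3, dropout=0.4):
        super().__init__()
        self.conv1 = nn.Conv1d(in_channels, filters1, kernel_size=kernel)
        self.conv2 = nn.Conv1d(filters1, filters2, kernel_size=kernel)
        self.act = nn.GELU()
        self.drop = nn.Dropout(dropout)
        t_out = window - 2 * (kernel - 1)          # no padding, stride 1
        self.fc = nn.Linear(filters2 * t_out, 2)    # logits for non-slip / slip

    def forward(self, x):                           # x: (batch, 12, 30)
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        x = self.drop(torch.flatten(x, start_dim=1))
        return self.fc(x)
```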
6.1 Training and performance evaluation
We began by training and validating the model. First, using hyperparameter optimization, we determined the best:
• number of filters in each convolutional layer
• size of the filters in each convolutional layer
• dropout rate
• learning rate and learning rate decay
• weights for our weighted binary cross-entropy loss function
• weight decay for L2 regularization
Specifically, we used the Tree-Structured Parzen Estimators algorithm (Bergstra et al., 2011) in Hyperopt (Bergstra et al., 2013) to find a set of hyperparameters that maximized the F1 score of the model on the validation dataset (see Section 4). In the final model, the first convolutional layer applies seven 3 × 3 filters, the second convolutional layer applies four 3 × 3 filters, and the dropout rate is set to 0.4. Training-related hyperparameters are available upon request.
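For reference, a search of this kind could be set up with Hyperopt roughly as follows. The search-space bounds and the train_and_validate helper are placeholders of our own, not the values or code used in this study.

```python
from hyperopt import fmin, tpe, hp, Trials

# Illustrative search space; the ranges shown here are not the paper's.
space = {
    "n_filters1": hp.quniform("n_filters1", 2, 16, 1),
    "n_filters2": hp.quniform("n_filters2", 2, 16, 1),
    "kernel": hp.quniform("kernel", 2, 7, 1),
    "dropout": hp.uniform("dropout", 0.0, 0.6),
    "lr": hp.loguniform("lr", -9, -3),
    "pos_weight": hp.uniform("pos_weight", 1.0, 20.0),
    "weight_decay": hp.loguniform("weight_decay", -12, -4),
}

def objective(params):
    # train_and_validate is a hypothetical helper that trains the CNN with the
    # given hyperparameters and returns its F1 score on the validation set.
    f1 = train_and_validate(params)
    return -f1  # Hyperopt minimizes, so negate the F1 score

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=Trials())
```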
During training, each example’s class is predicted using the argmax function over the output logits. During validation, a softmax function is applied instead and the class is predicted based on a confidence threshold for the positive (slip) class. During validation we also benchmark our algorithm against two classical methods: a k-nearest neighbors (KNN) classifier and support vector classification (SVC).
We evaluate the overall performance of our trained network on our reserved testing data, with the consideration of two use cases: a conservative use case where false positives are considered more detrimental than false negatives and a balanced performance use case where both precision and recall are similarly important. We evaluated the true positive rate and false positive rate at a variety of thresholds and used the resulting receiver operating characteristic (ROC) curve to determine two thresholds for testing. The first threshold is the minimum threshold at which the network achieves 95% precision. This represents the conservative case. The second threshold is the one at which the maximum F1 score is achieved, representing the balanced case.
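The two operating points can be recovered from validation-set predictions roughly as sketched below, here using scikit-learn's precision-recall utilities rather than the authors' own threshold-selection code; the target precision and function names are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_thresholds(y_true, slip_confidence, target_precision=0.95):
    """Return (conservative, balanced) confidence thresholds.

    slip_confidence: softmax probability of the positive (slip) class on the
    validation set. The conservative threshold is the smallest threshold whose
    precision reaches the target; the balanced threshold maximizes F1."""
    precision, recall, thresholds = precision_recall_curve(y_true, slip_confidence)
    # precision/recall have one more entry than thresholds; drop the final point
    precision, recall = precision[:-1], recall[:-1]

    meets_target = np.where(precision >= target_precision)[0]
    conservative = thresholds[meets_target[0]] if meets_target.size else None

    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    balanced = thresholds[np.argmax(f1)]
    return conservative, balanced
```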
6.2 Object generalization
As a way to assess the potential of the model to generalize to unseen objects, we went on to train five additional networks on the five leave-one-out training datasets described in Table 3. We kept the same network architecture and training procedure as above. We evaluate the performance of these five networks on the corresponding single-object validation datasets and compare this with the original model performance.
6.3 Task generalization
Finally, using the trained network from Section 6.1, we generate predictions for each of the three datasets (one for each pair of IMUs) generated from the three-finger hand (see Section 5). We use these predictions to generate an ROC curve for the network as applied to each dataset. We then consider the performance if these datasets were used together in an ensemble system where a consensus vote is used to produce a single prediction. That is, we use the model to produce a prediction for each pair of fingers, then assign a positive prediction only if the model predicts slip for all finger pairs.
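The consensus vote reduces to a logical AND over the per-pair predictions, as in this short sketch; the per-pair probability arrays and the threshold are assumed inputs.

```python
import numpy as np

def consensus_slip(prob_by_pair, threshold):
    """prob_by_pair: dict mapping each finger pair to an array of slip
    probabilities over the same sequence of timesteps. A timestep is declared
    a slip only if every finger pair exceeds the threshold."""
    votes = [probs >= threshold for probs in prob_by_pair.values()]
    return np.logical_and.reduce(votes)
```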
7 Results & discussion
Our model had strong performance on the validation data, outperforming both of the classical methods (see Figure 10). However, this performance varied greatly across objects. Performance was highest when classifying examples generated by grasping either the Jello pudding box or the tomato soup can (Figure 11). Additionally, the model failed to achieve high precision when considering only examples generated by grasping the Pringles chips can. This is likely due to the very low percentage of positive examples in this subset of the data (see Table 3).
Figure 10. Our model outperforms the SVC and KNN algorithms on the validation dataset. All of the classifiers achieved a fit that performed better than chance.
Figure 11. Slip detection by object: The network’s performance on the soup can or pudding box is noticeably better than for the other objects.
From the validation data, we selected a confidence threshold of 0.98 to represent the conservative, or high-precision, case and a threshold of 0.55 for balanced performance (Figure 12). Testing on our reserved data at these two thresholds resulted in a precision of 93.1% and a recall of 25.2% for the conservative case. For the balanced case, the precision was 64.0% and the recall was 77.2%.
Figure 12. Precision recall curves and receiver operating characteristic (ROC) curves for the validation data. We used these to identify two thresholds for testing, one that maximizes the F1 score, for balanced performance, and one that achieves a high precision.
The ability of the model to generalize to unseen objects was mixed. The model performed better than chance for all unseen objects except the Rubik’s cube (Figure 13). However, in all cases, the model performed worse when predicting on unseen objects than on those that were present in the training data (Figure 14). Despite this, the model was still able to achieve an F1 score of 82.0% when the unseen object was the tomato soup can.
Figure 13. Object generalization: Comparing the predictive ability of the original network to the object-excluded one. As expected, the performance decreases, but is still better than random chance for all objects except the Rubik’s Cube.
Figure 14. The maximum achievable F1 score decreases when validating the network on an object that the network has not been trained on. Performance remains high for the tomato soup can.
The classifier also performed better than chance for nearly all confidence thresholds when tested on the fruit picking dataset (Figure 15). This was true for all three two-finger pairs, as well as for the voting system. Due to the low slip percentage (3.3%) in the dataset, the model struggled with precision. When the slip examples were upsampled to achieve class parity, an F1 score of 76.7% was achieved.
Figure 15. For the fruit picking dataset, we used a different inertial measurement unit, introduced a unique noise profile from robot motion, and used fingers that are arranged radially rather than in a flat plane. Yet, the network’s performance is similar to its performance on the original dataset.
Our results show that, for tendon-driven hands, slip can be discriminated from environmental sliding using only fingertip IMUs. The overall performance when detecting slip using IMUs was similar to the performance reported in the literature for arrays of capacitive sensors (Heyneman and Cutkosky, 2016). One possible limitation to performance is that the sampling rate for the hand used in this study was only 40 Hz, which means that frequencies over 20 Hz could not be detected. It is also unclear how much of the predictive capability of the neural network is attributable to the correlation across sensor channels compared to the proprioceptive information captured by the gyroscope. Additional research is needed to determine the individual effects of vibration, fingertip deformation, and proprioception for robotic slip perception.
8 Conclusion
This paper supports the feasibility of using fingertip-mounted IMUs, an increasingly common sensor configuration for compliant hands, to discriminate between slip and environmental sliding for such grippers. Our network was able to outperform chance when generalizing to most unseen objects and when generalizing to an unseen dataset with a different gripper, object, and motion. However, the ability of the network to generalize to unseen objects varied depending on the object in question. It is possible that the feature space for some sets of objects had a greater degree of overlap than for others. If this is the case, then feature engineering may be able to improve the generalizability.
Note also that in our primary dataset any amount of sliding between the hand and object is considered to be slip. In contrast, for many robotic manipulation tasks, only large, sustained, slip events that cause grasp loss are important. This is the type of slip labeled in our fruit-picking dataset. For such applications, the results presented here for the high-threshold case (precision = 93.1%, recall = 25.2%) may be sufficient for practical use.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: "Slip Labels and Inertial Data from Compliant Hand (SLID-CH)" https://zenodo.org/records/15886336.
Author contributions
MC: Writing – original draft, Visualization, Software, Writing – review and editing, Data curation, Conceptualization, Investigation. PV: Writing – review and editing, Data curation. CG: Writing – review and editing, Supervision. JD: Project administration, Funding acquisition, Conceptualization, Writing – review and editing.
Funding
The authors declare that financial support was received for the research and/or publication of this article. This work was supported by the AI Research Institutes program supported by NSF and USDA-NIFA under the AI Institute: Agricultural AI for Transforming Workforce and Decision Support (AgAID) Award No. 2021-67021-35344.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. (2011). Algorithms for hyper-parameter optimization. Adv. Neural Information Processing Systems 24.
Bergstra, J., Yamins, D., and Cox, D. (2013). “Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures,” in International conference on machine learning (Cambridge, MA: PMLR), 115–123.
Calli, B., Singh, A., Walsman, A., Srinivasa, S., Abbeel, P., and Dollar, A. M. (2015). “The ycb object and model set: towards common benchmarks for manipulation research,” in 2015 international conference on advanced robotics (ICAR) (IEEE), 510–517.
Dischinger, L. M., Cravetz, M., Dawes, J., Votzke, C., VanAtter, C., Johnston, M. L., et al. (2021). “Towards intelligent fruit picking with in-hand sensing,” in 2021 IEEE/RSJ int’l conf. on intelligent robots and systems (IROS).
Heyneman, B., and Cutkosky, M. R. (2016). Slip classification for dynamic tactile array sensors. Int. J. Robotics Res. 35, 404–421. doi:10.1177/0278364914564703
Howe, R. D., and Cutkosky, M. R. (1989). Sensing skin acceleration for slip and texture perception. Proc. 1989 Int. Conf. Robotics Automation, 145–150. doi:10.1109/robot.1989.99981
Ibrahim, R. A. (1994). Friction-induced vibration, chatter, squeal, and chaos—part i: mechanics of contact and friction. Appl. Mech. Rev. 47, 209–226. doi:10.1115/1.3111079
Kieliba, P., Veltink, P. H., Lisini Baldi, T., Prattichizzo, D., Santaera, G., Bicchi, A., et al. (2018). “Comparison of three hand pose reconstruction algorithms using inertial and magnetic measurement units,” in 2018 IEEE-RAS 18th international conference on humanoid robots (Humanoids), 1–9. doi:10.1109/HUMANOIDS.2018.8624929
Lin, S.-G., Chang, H.-T., and Chang, J.-Y. (2024). IMU-enabled robotic finger force estimation with modular soft actuators and virtual linkages. IEEE Trans. Magnetics 60, 1–5. doi:10.1109/TMAG.2024.3427630
Liu, S. Q., and Adelson, E. H. (2022). “Gelsight fin ray: incorporating tactile sensing into a soft compliant robotic gripper,” in 2022 IEEE 5th international conference on soft robotics (RoboSoft) (IEEE), 925–931.
Liu, Y., Zhang, J., Lou, Y., Zhang, B., Zhou, J., and Chen, J. (2024). Soft bionic gripper with tactile sensing and slip detection for damage-free grasping of fragile fruits and vegetables. Comput. Electron. Agric. 220, 108904. doi:10.1016/j.compag.2024.108904
Macefield, V. G., Häger-Ross, C., and Johansson, R. S. (1996). Control of grip force during restraint of an object held between finger and thumb: responses of cutaneous afferents from the digits. Exp. Brain Research 108, 155–171. doi:10.1007/bf00242913
Massalim, Y., and Kappassov, Z. (2019). “Array of accelerometers as a dynamic vibro-tactile sensing for assessing the slipping noise,” in 2019 IEEE/SICE international symposium on system integration (SII), 438–443. doi:10.1109/SII.2019.8700328
Narita, T., Nagakari, S., Conus, W., Tsuboi, T., and Nagasaka, K. (2020). Theoretical derivation and realization of adaptive grasping based on rotational incipient slip detection. Proc. 2020 Int. Conf. Robotics Automation, 531–537. doi:10.1109/icra40945.2020.9196615
Navas, E., Fernández, R., Sepúlveda, D., Armada, M., and Gonzalez-de Santos, P. (2021). Soft grippers for automatic crop harvesting: a review. Sensors 21, 2689. doi:10.3390/s21082689
Odhner, L. U., Jentoft, L. P., Claffee, M. R., Corson, N., Tenzer, Y., Ma, R. R., et al. (2014). A compliant, underactuated hand for robust manipulation. Int. J. Robotics Res. 33, 736–752. doi:10.1177/0278364913514466
Romano, J. M., Hsiao, K., Niemeyer, G., Chitta, S., and Kuchenbecker, K. J. (2011). Human-inspired robotic grasp control with tactile sensing. IEEE Trans. Robotics 27, 1067–1079. doi:10.1109/tro.2011.2162271
Romeo, R. A., and Zollo, L. (2020). Methods and sensors for slip detection in robotics: a survey. IEEE Access 8, 73027–73050. doi:10.1109/access.2020.2987849
Ruomin, S., Zhang, L., Tiemin, L., and Yao, J. (2021). Incipient slip detection method with vision-based tactile sensor based on distribution force and deformation. IEEE Sensors J. 21, 25973–25985. doi:10.1109/jsen.2021.3119060
Santaera, G., Luberto, E., Serio, A., Gabiccini, M., and Bicchi, A. (2015). “Low-cost, fast and accurate reconstruction of robotic and human postures via imu measurements,” in 2015 IEEE international conference on robotics and automation (ICRA), 2728–2735. doi:10.1109/ICRA.2015.7139569
Shintake, J., Cacucciolo, V., Floreano, D., and Shea, H. (2018). Soft robotic grippers. Adv. Materials 30, 1707035. doi:10.1002/adma.201707035
Tai, K., El-Sayed, A.-R., Shahriari, M., Biglarbegian, M., and Mahmud, S. (2016). State of the art robotic grippers and applications. Robotics 5, 11. doi:10.3390/robotics5020011
Velasquez, A., Swenson, N., Cravetz, M., Grimm, C., and Davidson, J. R. (2022). “Predicting fruit-pick success using a grasp classifier trained on a physical proxy,” in IEEE/RSJ international conference on intelligent robots and systems, 9225–9231. doi:10.1109/iros47612.2022.9981716
Wang, X., Huang, B., Wang, R., Mo, J., and Ouyang, H. (2020). Friction-induced stick-slip vibration and its experimental validation. Mech. Syst. Signal Process. 142, 106705. doi:10.1016/j.ymssp.2020.106705
Yu, M., Fang, G., Yang, J., Guo, Y., and Wang, C. C. L. (2023). Spring-IMU fusion based proprioception for feedback control of soft manipulators. IEEE/ASME Transactions on Mechatronics. doi:10.48550/arxiv.2309.14279
Yuan, W., Dong, S., and Adelson, E. H. (2017). Gelsight: high-resolution robot tactile sensors for estimating geometry and force. Sensors 17, 2762. doi:10.3390/s17122762
Keywords: compliant hand, slip detection, inertial measurement unit, grasping, contact-rich manipulation
Citation: Cravetz M, Vyas P, Grimm C and Davidson JR (2025) Slip detection for compliant robotic hands using inertial signals and deep learning. Front. Robot. AI 12:1698591. doi: 10.3389/frobt.2025.1698591
Received: 03 September 2025; Accepted: 26 November 2025;
Published: 18 December 2025.
Edited by:
Alan Frank Thomas Winfield, University of the West of England, United Kingdom
Reviewed by:
Yuri Gloumakov, University of Connecticut, United States
Jing Jin Shen, Nanjing University of Posts and Telecommunications, China
Copyright © 2025 Cravetz, Vyas, Grimm and Davidson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Miranda Cravetz, mcravetz.gen@gmail.com; Purva Vyas