AUTHOR=Papanagiotou Dimitris, Senteri Gavriela, Manitsaris Sotiris
TITLE=Egocentric Gesture Recognition Using 3D Convolutional Neural Networks for the Spatiotemporal Adaptation of Collaborative Robots
JOURNAL=Frontiers in Neurorobotics
VOLUME=15
YEAR=2021
URL=https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2021.703545
DOI=10.3389/fnbot.2021.703545
ISSN=1662-5218
ABSTRACT=Collaborative robots (cobots) are currently deployed in professional environments, working alongside human operators and helping to strike the right balance between mechanization and manual intervention that Industry 4.0 manufacturing processes require. To support human operators, cobots need to perceive the operators' activities continuously and follow them. This paper describes the contribution of gesture recognition and pose estimation to the smooth introduction of cobots into an industrial production line, with a view to performing actions in parallel with the human operators and enabling interaction between them. To achieve this, a collaboration protocol is established that uses Machine Learning algorithms, wearables, and sensors. The use case concerns LCD TV assembly on the production line of an appliance manufacturer, where both parts of the operation are currently performed manually. A human-robot interaction system with spatiotemporally adaptive cobot behavior is introduced, to which the first part of the above-mentioned operation is assigned, strengthening the production line. Gesture recognition, pose estimation, physical interaction, and sonic notification together form a multimodal human-robot interaction system. Five experiments were performed to compare the different possible types of interaction. Physical interaction was achieved using the force sensor of the cobot. Pose estimation through a skeleton-tracking algorithm provided the cobot with human pose information and made it spatially adjustable. Sonic notification was added to signal any unexpected incidents. A real-time gesture recognition module recognized and interpreted the gestures that constitute the TV assembly routine to be performed by the cobot. Finally, a combination of all the modalities was investigated. Gesture recognition was implemented with a Deep Learning architecture consisting of convolutional layers, trained on egocentric views. This constitutes an added value of the work, as it affords the potential of recognizing gestures independently of the operator's anthropometric characteristics and of the background. Common metrics from the literature were used to evaluate the proposed system, and the human operator's opinion was measured through a questionnaire concerning the operator's affective states during the collaboration.
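
The abstract describes a gesture-recognition module built from 3D convolutional layers trained on egocentric video. As a rough illustration only, the sketch below shows what such a spatiotemporal classifier can look like in PyTorch; the layer sizes, clip length, and five-class output are assumptions for demonstration, not the architecture published in the paper.

    # Illustrative sketch of a 3D-CNN gesture classifier of the general kind
    # the abstract describes. All hyperparameters here are placeholders.
    import torch
    import torch.nn as nn

    class Gesture3DCNN(nn.Module):
        def __init__(self, num_classes: int = 5):  # class count is assumed
            super().__init__()
            self.features = nn.Sequential(
                # Input: (batch, 3 RGB channels, frames, height, width)
                nn.Conv3d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool spatially only
                nn.Conv3d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),          # pool time and space
                nn.AdaptiveAvgPool3d(1),              # global average pooling
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            x = self.features(clip)
            return self.classifier(x.flatten(1))      # one logit per gesture

    # Example: classify a 16-frame, 112x112 egocentric video clip.
    model = Gesture3DCNN(num_classes=5)
    clip = torch.randn(1, 3, 16, 112, 112)
    logits = model(clip)  # shape: (1, 5)

The key property, consistent with the abstract, is that convolution extends over the time axis as well as the two spatial axes, so the network learns motion patterns from the clip rather than from single frames.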