
ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Industrial Robotics and Automation

Volume 12 - 2025 | doi: 10.3389/frobt.2025.1660244

This article is part of the Research Topic Innovations in Industry 4.0: Advancing Mobility and Manipulation in Robotics

Visuo-Tactile Feedback Policies for Terminal Assembly Facilitated by Reinforcement Learning

Provisionally accepted
Yuchao Li¹, Ziqi Jin¹, Jin Liu², Daolin Ma¹*
  • ¹ School of Ocean and Civil Engineering, Shanghai Jiao Tong University, Shanghai, China
  • ² School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China

The final, formatted version of the article will be published soon.

Industrial terminal assembly tasks are often repetitive and involve handling components with tight tolerances that are susceptible to damage. Learning an effective terminal assembly policy in the real world is challenging, because collisions between the parts and the environment can cause slippage or part breakage. In this paper, we propose a safe reinforcement learning approach for learning a visuo-tactile assembly policy that is robust to variations in grasp pose. Our method minimizes collisions between the terminal head and the terminal base by decomposing the assembly task into three distinct phases. In the first (grasp) phase, a vision-guided model is trained to pick the terminal head from an initial bin. In the second (align) phase, a tactile-based grasp pose estimation model is used to align the terminal head with the terminal base. In the final (assembly) phase, a visuo-tactile policy is learned to precisely insert the terminal head into the terminal base. To ensure safe training, the robot leverages human demonstrations and interventions. Experimental results on PLC terminal assembly show that the proposed method achieves 100% successful insertions across 100 different initial end-effector and grasp poses, whereas imitation learning and online RL policies achieve only 9% and 0%, respectively.
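The abstract describes the method only at the level of its three phases and its use of human demonstrations and interventions. The Python sketch below illustrates one way such a phase-decomposed episode with human takeover could be organized; it is not the authors' implementation. All names (`Phase`, `ReplayBuffers`, `select_action`, `run_episode`) are hypothetical, the grasp, align, policy, and environment functions are placeholders, returning to the grasp phase after a failed alignment is an assumed recovery rule, and copying intervention transitions into a demonstration buffer is an assumption modeled on common human-in-the-loop RL practice.

```python
# Hypothetical sketch (not the authors' code): phase logic, buffer scheme,
# and the toy environment below are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple

Obs = Tuple[float, ...]
Action = Tuple[float, ...]


class Phase(Enum):
    GRASP = auto()   # vision-guided pick of the terminal head from the bin
    ALIGN = auto()   # tactile grasp-pose estimate used to align head with base
    INSERT = auto()  # learned visuo-tactile insertion policy
    DONE = auto()


@dataclass
class Transition:
    obs: Obs
    action: Action
    reward: float
    human: bool  # True if the executed action came from a human intervention


@dataclass
class ReplayBuffers:
    demo: List[Transition] = field(default_factory=list)
    online: List[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        # Assumption: every executed transition is stored for online RL, and
        # human-corrected ones are also kept with the demonstrations.
        self.online.append(t)
        if t.human:
            self.demo.append(t)


def select_action(policy_action: Action, human_action: Optional[Action]) -> Tuple[Action, bool]:
    """Human override: an operator action, if present, replaces the policy's."""
    if human_action is not None:
        return human_action, True
    return policy_action, False


def run_episode(
    grasp: Callable[[], bool],
    align: Callable[[], bool],
    policy: Callable[[Obs], Action],
    human: Callable[[Obs], Optional[Action]],
    step: Callable[[Obs, Action], Tuple[Obs, float, bool]],
    obs0: Obs,
    buffers: ReplayBuffers,
    max_steps: int = 50,
) -> Phase:
    """Advance through GRASP -> ALIGN -> INSERT; return the phase reached."""
    phase, obs = Phase.GRASP, obs0
    for _ in range(max_steps):
        if phase is Phase.GRASP:
            phase = Phase.ALIGN if grasp() else Phase.GRASP
        elif phase is Phase.ALIGN:
            # Assumed recovery rule: a failed alignment triggers a re-grasp.
            phase = Phase.INSERT if align() else Phase.GRASP
        elif phase is Phase.INSERT:
            action, by_human = select_action(policy(obs), human(obs))
            next_obs, reward, done = step(obs, action)
            buffers.add(Transition(obs, action, reward, by_human))
            obs = next_obs
            if done:
                phase = Phase.DONE
        else:  # Phase.DONE: nothing left to do
            break
    return phase


if __name__ == "__main__":
    # Toy 1-D insertion: obs = (depth,), fully inserted at depth 1.0 (illustrative).
    def toy_step(obs: Obs, action: Action) -> Tuple[Obs, float, bool]:
        depth = obs[0] + action[0]
        done = depth >= 1.0
        return (depth,), (1.0 if done else 0.0), done

    buffers = ReplayBuffers()
    final = run_episode(
        grasp=lambda: True,
        align=lambda: True,
        policy=lambda obs: (0.25,),
        # Hypothetical operator who takes over once the head is half inserted.
        human=lambda obs: (0.5,) if obs[0] >= 0.5 else None,
        step=toy_step,
        obs0=(0.0,),
        buffers=buffers,
    )
    print(final, len(buffers.online), len(buffers.demo))
```

In the toy run, the hypothetical operator takes over once the head reaches half depth, so the episode ends in `Phase.DONE` with three insertion transitions, one of them human-labeled. Keeping grasping and alignment outside the learned policy confines learning to the contact-rich insertion step, which is one plausible way to limit collisions during training.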

Keywords: visual perception, tactile sensing, multi-modal fusion, terminal assembly, reinforcement learning

Received: 05 Jul 2025; Accepted: 07 Oct 2025.

Copyright: © 2025 Li, Jin, Liu and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Daolin Ma, daolinma@sjtu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.