ORIGINAL RESEARCH article
Front. Robot. AI
Sec. Industrial Robotics and Automation
Volume 12 - 2025 | doi: 10.3389/frobt.2025.1660244
This article is part of the Research TopicInnovations in Industry 4.0: Advancing Mobility and Manipulation in RoboticsView all 8 articles
Visuo-Tactile Feedback Policies for Terminal Assembly Facilitated by Reinforcement Learning
Provisionally accepted- 1School of Ocean and Civil Engineering, Shanghai Jiao Tong University, Shanghai, China
- 2Shanghai Jiao Tong University School of Mechanical Engineering, Shanghai, China
Select one of your emails
You have multiple emails registered with Frontiers:
Notify me on publication
Please enter your email address:
If you already have an account, please login
You don't have a Frontiers account ? You can register here
Industrial terminal assembly tasks are often repetitive and involve handling components with tight tolerances that are susceptible to damage. Learning an effective terminal assembly policy in real-world is challenging, as collisions between parts and the environment can lead to slippage or part breakage. In this paper, we propose a safe reinforcement learning approach to develop a visuo-tactile assembly policy that is robust to variations in grasp poses. Our method minimizes collisions between the terminal head and terminal base by decomposing the assembly task into three distinct phases. In the first grasp phase,a vision-guided model is trained to pick the terminal head from an initial bin. In the second align phase, a tactile-based grasp pose estimation model is employed to align the terminal head with the terminal base. In the final assembly phase, a visuo-tactile policy is learned to precisely insert the terminal head into the terminal base. To ensure safe training, the robot leverages human demonstrations and interventions. Experimental results on PLC terminal assembly demonstrate that the proposed method achieves 100% successful insertions across 100 different initial end-effector and grasp poses, while imitation learning and online-RL policy yield only 9% and 0%.
Keywords: Visual Perception, tactile sensing, Multi-modal fusion, Terminal Assembly, reinforcement learning
Received: 05 Jul 2025; Accepted: 07 Oct 2025.
Copyright: © 2025 Li, Jin, Liu and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Daolin Ma, daolinma@sjtu.edu.cn
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.