
ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Computational Intelligence in Robotics

Volume 12 - 2025 | doi: 10.3389/frobt.2025.1621033

This article is part of the Research Topic: Synergizing Large Language Models and Computational Intelligence for Advanced Robotic Systems.

Large Language Model-Driven Natural Language Interaction Control Framework for Single-Operator Bimanual Teleoperation

Provisionally accepted
  • 1Lancaster University, Lancaster, United Kingdom
  • 2Tsinghua University, Beijing, China
  • 3Dalian Jiaotong University, Dalian, China
  • 4South China University of Technology, Guangdong, China
  • 5Shanghai Jiaotong University, Shanghai, China

The final, formatted version of the article will be published soon.

Bimanual teleoperation imposes heavy cognitive and coordination demands on a single human operator tasked with simultaneously controlling two robotic arms. Although assigning each arm to a separate operator can distribute the workload, it often leads to ambiguity in decision authority and degrades overall efficiency. To overcome these challenges, we propose a novel bimanual teleoperation large language model assistant (BTLA) framework, an intelligent copilot that augments a single operator's motor control capabilities. Specifically, BTLA lets the operator directly control one robotic arm through conventional teleoperation while directing a second, assistive arm via simple voice commands, thereby commanding both arms simultaneously. By integrating the GPT-3.5-turbo model, BTLA interprets contextual voice instructions and autonomously selects among six predefined manipulation skills, including real-time mirroring, trajectory following, and autonomous object grasping. Experimental evaluations on bimanual object manipulation tasks demonstrate that BTLA increased task coverage by 76.1% and success rate by 240.8% relative to solo teleoperation, and outperformed dyadic (two-operator) control with a 19.4% gain in coverage and a 69.9% gain in success rate. Furthermore, NASA Task Load Index (NASA-TLX) assessments revealed a 38-52% reduction in operator mental workload, and 85% of participants rated the voice-based interaction as "natural" and "highly effective."
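
To illustrate the kind of voice-to-skill dispatch the abstract describes, the sketch below maps a spoken instruction to one of six predefined skills via GPT-3.5-turbo. It is a minimal sketch only: the paper's actual prompts, skill names beyond the three listed in the abstract (mirroring, trajectory following, grasping), and robot interface are not given here, so the remaining skill names, prompt wording, and fallback behavior are illustrative assumptions, written against the OpenAI Python SDK (v1+).

# Minimal sketch of an LLM-based skill dispatcher for the assistive arm,
# assuming the OpenAI Python SDK (>=1.0). Skill names beyond the three
# mentioned in the abstract, the prompt wording, and the fallback policy
# are illustrative placeholders, not the authors' implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SKILLS = [
    "mirror",   # real-time mirroring of the operator-controlled arm (from abstract)
    "follow",   # trajectory following (from abstract)
    "grasp",    # autonomous object grasping (from abstract)
    "hold",     # hypothetical: hold the current pose
    "release",  # hypothetical: open the gripper
    "home",     # hypothetical: return to a rest configuration
]

SYSTEM_PROMPT = (
    "You are a bimanual teleoperation copilot. Map the operator's spoken "
    f"instruction to exactly one skill from this list: {', '.join(SKILLS)}. "
    "Reply with the skill name only."
)

def select_skill(voice_transcript: str) -> str:
    """Ask GPT-3.5-turbo to choose one of the predefined skills."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": voice_transcript},
        ],
        temperature=0,  # deterministic mapping for control commands
    )
    skill = response.choices[0].message.content.strip().lower()
    return skill if skill in SKILLS else "hold"  # fall back to a safe skill

if __name__ == "__main__":
    # e.g. the operator says: "mirror what I'm doing with the other arm"
    print(select_skill("mirror what I'm doing with the other arm"))

Constraining the model's output to a fixed skill vocabulary, with a safe fallback when the reply does not match, is one straightforward way to keep an LLM in the loop of a real-time control system without letting free-form text reach the robot.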

Keywords: Human-robot collaboration, teleoperation, Bimanual manipulation, embodied AI, large language model (LLM)

Received: 30 Apr 2025; Accepted: 30 Jun 2025.

Copyright: © 2025 Fei, Xue, Lin, Du, Guo and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Ziwei Wang, Lancaster University, Lancaster, United Kingdom

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.