ORIGINAL RESEARCH article
Front. Robot. AI
Sec. Industrial Robotics and Automation
This article is part of the Research Topic: Flexible and Sustainable Robotic Process Automation for High-Mix Low-Volume Production
Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Manipulation
Provisionally accepted
- 1 University of Birmingham, Birmingham, United Kingdom
- 2 Aston University, Birmingham, United Kingdom
- 3 Queen Mary University of London, London, United Kingdom
This paper addresses intent-driven task planning for complex multi-action manipulation sequences in heterogeneous multi-robot cells. Given a perception back-end that outputs a structured object-level scene description and a human operator's natural-language intent, we generate a precedence-consistent, object-level robot-action sequence, which is then executed by passing each action to a lower-level motion-planning module. The pipeline integrates: (i) perception-to-text scene encoding; (ii) an ensemble of large language models (LLMs) that generates candidate action sequences from the operator's intent; (iii) an LLM-based verifier that enforces formatting and precedence constraints; and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms collaboratively dismantle an electric vehicle (EV) battery for recycling. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used full-sequence correctness and next-task correctness as metrics to evaluate and compare five LLM-based planners, including ablation analyses of pipeline components. We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX scores in experiments with human participants. Full-sequence correctness improves from 0.761 (single LLM) to 0.824 (6-LLM ensemble + verifier + deterministic filter), and next-object correctness improves from 0.866 to 0.894. Results of our case study indicate that the ensemble-with-verification approach reliably maps operator intent to safe multi-robot plans while maintaining low user effort.
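The ensemble-with-verification idea described above can be illustrated with a minimal sketch (not the authors' code; all function names, object labels, and the plan representation here are hypothetical): candidate action sequences from multiple LLMs are first passed through a deterministic consistency filter that rejects any plan referencing an object absent from the scene description, and a surviving sequence is then selected by majority vote.

```python
from collections import Counter

def consistency_filter(plans, scene_objects):
    """Reject candidate plans that reference an object absent from
    the perception back-end's scene description (a hallucination)."""
    valid = set(scene_objects)
    return [p for p in plans if all(obj in valid for (_action, obj) in p)]

def ensemble_vote(plans):
    """Pick the most frequent surviving action sequence, if any."""
    if not plans:
        return None
    counts = Counter(tuple(p) for p in plans)
    return list(counts.most_common(1)[0][0])

# Example: three LLM candidates, one hallucinating a "fuse_box".
scene = ["busbar", "module_1", "cable"]
candidates = [
    [("grasp", "busbar"), ("remove", "module_1")],
    [("grasp", "busbar"), ("remove", "module_1")],
    [("grasp", "fuse_box"), ("remove", "module_1")],  # rejected by filter
]
plan = ensemble_vote(consistency_filter(candidates, scene))
```

In the paper's full pipeline an LLM-based verifier additionally checks formatting and precedence constraints before this deterministic stage; the sketch shows only the object-consistency and voting steps.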
Keywords: human-robot interaction, intent recognition, large language models, multi-robot disassembly, task planning
Received: 17 Oct 2025; Accepted: 19 Jan 2026.
Copyright: © 2026 Erdogan, Contreras, Rastegarpanah, Chiou and Stolkin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Alireza Rastegarpanah
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.