ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Industrial Robotics and Automation

This article is part of the Research Topic: Flexible and Sustainable Robotic Process Automation for High-Mix Low-Volume Production

Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Manipulation

Provisionally accepted
  • 1 University of Birmingham, Birmingham, United Kingdom
  • 2 Aston University, Birmingham, United Kingdom
  • 3 Queen Mary University of London, London, United Kingdom

The final, formatted version of the article will be published soon.

This paper addresses intent-driven task planning for complex multi-action manipulation sequences in heterogeneous multi-robot cells. Given a perception back-end that outputs a structured object-level scene description, and a human operator's natural-language intent, we generate a precedence-consistent object-level robot-action sequence, which can then be executed by passing each action to a lower-level motion-planning module. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generates candidate action sequences from the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms work collaboratively to dismantle an electric vehicle battery for recycling. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used metrics of full-sequence correctness and next-task correctness to evaluate and compare five LLM-based planners, including ablation analyses of pipeline components. We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX workload in experiments with human participants. Full-sequence correctness improves from 0.761 (single LLM) to 0.824 (6-LLM ensemble + verifier + deterministic filter), and next-task correctness improves from 0.866 to 0.894. Results in our case study indicate that our ensemble-with-verification approach reliably maps operator intent to safe multi-robot plans while keeping user effort low.
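The pipeline is described here only in prose; the following minimal Python sketch illustrates how an ensemble-with-verification loop of this kind could be structured. It is an assumption-laden illustration, not the authors' implementation: the object names, the PRECEDENCE table, and query_llm (stubbed rather than calling a real model) are all hypothetical placeholders.

    from collections import Counter

    # Hypothetical scene description: object IDs reported by the perception back-end.
    SCENE_OBJECTS = {"battery_module_1", "busbar_1", "bolt_1", "bolt_2"}

    # Hypothetical precedence rules: an object may only be removed after its blockers.
    PRECEDENCE = {
        "battery_module_1": {"busbar_1", "bolt_1"},
        "busbar_1": {"bolt_2"},
    }

    def query_llm(model_name: str, scene: set, intent: str) -> list:
        """Placeholder for a real LLM call returning a candidate removal sequence.

        In the actual pipeline this would prompt `model_name` with a textual
        scene encoding plus the operator's intent; here the output is faked.
        """
        return ["bolt_2", "busbar_1", "bolt_1", "battery_module_1"]

    def verify(sequence: list) -> bool:
        """Check formatting and precedence: every blocker precedes its object."""
        seen = set()
        for obj in sequence:
            if not isinstance(obj, str):           # formatting check
                return False
            if PRECEDENCE.get(obj, set()) - seen:  # some blocker not yet removed
                return False
            seen.add(obj)
        return True

    def consistency_filter(sequence: list) -> bool:
        """Deterministically reject sequences naming objects absent from the scene."""
        return all(obj in SCENE_OBJECTS for obj in sequence)

    def ensemble_plan(models: list, scene: set, intent: str) -> list | None:
        """Keep candidates passing both checks, then take a majority vote."""
        candidates = [tuple(query_llm(m, scene, intent)) for m in models]
        valid = [c for c in candidates if verify(list(c)) and consistency_filter(list(c))]
        if not valid:
            return None  # fall back to re-prompting or operator clarification
        best, _ = Counter(valid).most_common(1)[0]
        return list(best)

    if __name__ == "__main__":
        plan = ensemble_plan([f"llm_{i}" for i in range(6)], SCENE_OBJECTS,
                             "remove the first battery module")
        print(plan)

One design point carried over from the abstract: because the consistency filter is deterministic, hallucinated object references are rejected by exact set membership against the perceived scene rather than by another model call.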

Keywords: human-robot interaction, intent recognition, large language models, multi-robot disassembly, task planning

Received: 17 Oct 2025; Accepted: 19 Jan 2026.

Copyright: © 2026 Erdogan, Contreras, Rastegarpanah, Chiou and Stolkin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Alireza Rastegarpanah

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.