ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Computational Intelligence in Robotics

Volume 12 - 2025 | doi: 10.3389/frobt.2025.1576348

This article is part of the Research TopicSynergizing Large Language Models and Computational Intelligence for Advanced Robotic SystemsView all articles

Enhancing Supermarket Robot Interaction: An Equitable Multi-Level LLM Conversational Interface for Handling Diverse Customer Intents

Provisionally accepted
  • Delft University of Technology, Delft, Netherlands

The final, formatted version of the article will be published soon.

This paper presents the design and evaluation of a comprehensive system to develop voicebased interfaces to support users in supermarkets. These interfaces enable shoppers to convey their needs through both generic and specific queries. Although customisable state-of-theart systems like GPTs from OpenAI are easily accessible and adaptable, featuring low-code deployment with options for functional integration, they still face challenges such as increased response times and limitations in strategic control for tailored use cases and cost optimization.Motivated by the goal of crafting equitable and efficient conversational agents with a touch of personalisation, this study advances on two fronts: 1) a comparative analysis of four popular off-the-shelf speech recognition technologies to identify the most accurate model for different genders (male/female) and languages (English/Dutch) and 2) the development and evaluation of a novel multi-LLM supermarket chatbot framework, comparing its performance with a specialized GPT model powered by the GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) and qualitative participant feedback. Our findings reveal that OpenAI's Whisper leads in speech recognition accuracy between genders and languages and that our proposed multi-LLM chatbot architecture, which outperformed the benchmarked GPT model in performance, user satisfaction, user-agent partnership, and self-image enhancement, achieved statistical significance in these four key areas out of the 13 evaluated aspects that all showed improvements. The paper concludes with a simple method for supermarket robot navigation by mapping the final chatbot response to the correct shelf numbers to which the robot can plan sequential visits. Later, this enables the effective use of low-level perception, motion planning, and control capabilities for product retrieval and collection. We hope that this work encourages more efforts to use multiple specialized smaller models instead of always relying on a single powerful model.

Keywords: Voice interface, Robotics, Large language models, speech recognition, Questionnaires

Received: 13 Feb 2025; Accepted: 16 Apr 2025.

Copyright: © 2025 Nandkumar and Peternel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Chandran Nandkumar, Delft University of Technology, Delft, Netherlands
Luka Peternel, Delft University of Technology, Delft, Netherlands

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.