ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Computational Intelligence in Robotics

This article is part of the Research Topic: Advanced Sensing, Learning and Control for Effective Human-Robot Interaction

Adaptive Querying for Reward Learning from Human Feedback

Provisionally accepted
Yashwanthi Anand*, Nnamdi Nwagwu, Kevin Sabbe, Naomi Talya Fitter, Sandhya Saisubramanian
  • Oregon State University, Corvallis, United States

The final, formatted version of the article will be published soon.

Learning from human feedback is a popular approach to train robots to adapt to user preferences and improve safety. Existing approaches typically consider a single querying (interaction) format when seeking human feedback and do not leverage multiple modes of user interaction with a robot. We examine how to learn a penalty function associated with unsafe behaviors using multiple forms of human feedback, by optimizing both the query state and the feedback format. Our proposed adaptive feedback selection is an iterative, two-phase approach that first selects critical states for querying, and then uses information gain to select a feedback format for querying across the sampled critical states. The feedback format selection also accounts for the cost and probability of receiving feedback in a certain format. Our experiments in simulation demonstrate the sample efficiency of our approach in learning to avoid undesirable behaviors. The results of our user study with a physical robot highlight the practicality and effectiveness of adaptive feedback selection in seeking informative, user-aligned feedback that accelerates learning. Experiment videos, code and appendices are available on our website: https://tinyurl.com/AFS-learning
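
The format-selection phase described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact objective, so everything below is an assumption made for illustration only: a discrete belief over candidate penalty hypotheses, a per-format answer likelihood, and a score of the form P(response) × expected information gain − cost. The names `expected_info_gain`, `select_format`, `p_response`, and `cost` are hypothetical and are not taken from the authors' code.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_info_gain(prior, likelihood):
    """Expected entropy reduction over hypotheses after one answer.

    prior[h]         : current belief over penalty hypotheses (assumed discrete)
    likelihood[h][a] : P(answer a | hypothesis h) for one feedback format
    """
    gain = entropy(prior)
    for a in range(len(likelihood[0])):
        # Marginal probability of observing answer a
        p_a = sum(prior[h] * likelihood[h][a] for h in range(len(prior)))
        if p_a == 0:
            continue
        # Bayesian posterior over hypotheses given answer a
        posterior = [prior[h] * likelihood[h][a] / p_a for h in range(len(prior))]
        gain -= p_a * entropy(posterior)
    return gain

def select_format(prior, formats):
    """Return the name of the format with the highest assumed score.

    Assumed score: P(response) * expected information gain - cost.
    """
    def score(f):
        return f["p_response"] * expected_info_gain(prior, f["likelihood"]) - f["cost"]
    return max(formats, key=score)["name"]

# Hypothetical example: two penalty hypotheses and two feedback formats.
prior = [0.5, 0.5]
formats = [
    # Noisy but cheap and likely to be answered
    {"name": "approval", "likelihood": [[0.9, 0.1], [0.1, 0.9]],
     "p_response": 0.9, "cost": 0.1},
    # Fully informative but costly and less likely to be answered
    {"name": "correction", "likelihood": [[1.0, 0.0], [0.0, 1.0]],
     "p_response": 0.5, "cost": 0.3},
]
chosen = select_format(prior, formats)
print(chosen)  # prints "approval"
```

In this toy setting the noisier format wins because its lower cost and higher response probability outweigh its smaller information gain, which is the kind of trade-off the abstract describes. The subtractive cost term is one possible design choice among several (a ratio of gain to cost would be another).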

Keywords: information gain, Interactive Imitation Learning, Learning from human feedback, Learning from multiple formats, Robot learning

Received: 28 Oct 2025; Accepted: 15 Dec 2025.

Copyright: © 2025 Anand, Nwagwu, Sabbe, Fitter and Saisubramanian. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yashwanthi Anand

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.