- Oregon State University, Corvallis, OR, United States
Learning from human feedback is a popular approach to train robots to adapt to user preferences and improve safety. Existing approaches typically consider a single querying (interaction) format when seeking human feedback and do not leverage multiple modes of user interaction with a robot. We examine how to learn a penalty function associated with unsafe behaviors using multiple forms of human feedback, by optimizing both the query state and the feedback format. Our proposed adaptive feedback selection is an iterative, two-phase approach that first selects critical states for querying, and then uses information gain to select a feedback format for querying across the sampled critical states. The feedback format selection also accounts for the cost and probability of receiving feedback in a certain format. Our experiments in simulation demonstrate the sample efficiency of our approach in learning to avoid undesirable behaviors. The results of our user study with a physical robot highlight the practicality and effectiveness of adaptive feedback selection in seeking informative, user-aligned feedback that accelerates learning. Experiment videos, code, and supplementary materials can be found on our website: https://tinyurl.com/AFS-learning.
1 Introduction
A key factor affecting an autonomous agent’s behavior is its reward function. Due to the complexity of real-world environments and the practical challenges of reward design, agents often operate with incomplete reward functions corresponding to underspecified objectives, which can lead to unintended and undesirable behaviors such as negative side effects (NSEs) (Amodei et al., 2016; Saisubramanian et al., 2021a; Srivastava et al., 2023). For example, a robot that optimizes the distance to transport an object to the goal may damage items along the way if its reward function does not model the undesirability of colliding with other objects in its path (Figure 1).
Figure 1. An illustration of adaptive feedback selection. The robot arm learns to move the blue object to the white bin, without colliding with other objects in the way, by querying the human in different formats across the state space.
Human feedback offers a natural way to provide the missing knowledge, and several prior works have examined learning from various forms of human feedback to improve robot performance, including avoiding side effects (Cui and Niekum, 2018; Cui et al., 2021b; Lakkaraju et al., 2017; Ng and Russell, 2000; Saran et al., 2021; Zhang et al., 2020). In many real-world settings, a human can provide feedback in multiple forms, ranging from binary signals indicating action approval to corrections of robot actions, each varying in the granularity of information revealed to the robot and the human effort required to provide it. For instance, a person supervising a household robot may occasionally be willing to provide detailed corrections when the robot encounters a fragile vase but may only want to give quick binary approvals during a routine motion. Ignoring this variability either limits what the robot can learn or burdens the user. To efficiently balance the trade-off between seeking feedback in a format that accelerates robot learning and reducing the human effort involved, it is beneficial to seek detailed feedback sparingly in certain states and complement it with feedback types that require less human effort in other states. Such an approach could also reduce the sampling biases associated with learning from any one format, thereby improving learning performance (Saisubramanian et al., 2022). In fact, a recent study indicates that users are generally willing to engage with the robot in more than one feedback format (Saisubramanian et al., 2021b). However, existing approaches rarely exploit this flexibility, and do not support gathering feedback in different formats in different regions of the state space (Cui et al., 2021a; Settles, 1995).
These practical considerations motivate the core question of this paper: “How can a robot identify when to query and in what format, while accounting for the cost and availability of different forms of feedback?” We present a framework for adaptive feedback selection (AFS) that enables a robot to seek feedback in multiple formats during its learning phase, such that its information gain is maximized. Rather than treating all states and feedback formats uniformly, AFS prioritizes human feedback in states where feedback is most valuable and chooses feedback types based on their expected cost and information gain. This design reduces user effort, accommodates different levels of feedback granularity, and focuses on states where learning improves safety. In the interest of clarity, the rest of this paper grounds the discussion of AFS as an approach for robots to learn to avoid negative side effects (NSEs) of their actions. NSEs refer to unintended and undesirable outcomes that arise as the agent performs its assigned task. In the object delivery example in Figure 1, the robot may inadvertently collide with other objects on the table, producing NSEs. Focusing on NSEs provides a well-defined and measurable setting, quantified by the number of NSE occurrences, to evaluate how AFS improves an agent’s learning efficiency and safety. However, note that AFS is a general technique that can be applied broadly to learn about various forms of undesirable behavior.
Minimizing NSEs using AFS involves four iterative steps (Figure 4): (1) states are partitioned into clusters, with each cluster’s weight proportional to the number of NSEs discovered in it; (2) a set of critical states (states where human feedback is crucial for learning an association between state features and NSEs, i.e., a predictive model of NSE severity) is formed by sampling from each cluster based on its weight; (3) a feedback format that maximizes the information gain in the critical states is identified, while accounting for the cost and uncertainty of receiving feedback, using the human feedback preference model; and (4) cluster weights and information gain are updated, and a new set of critical states is sampled to learn about NSEs, until the querying budget expires. The learned NSE information is mapped to a penalty function and used to augment the robot’s model to compute an NSE-minimizing policy for completing its task.
We evaluate AFS both in simulation and in a user study where participants interact with a robot arm. First, we evaluate the approach in three simulated proof-of-concept settings with simulated human feedback. Second, we conduct a pilot study where 12 human participants interact with and provide feedback to the agent in a simulated gridworld domain. Finally, we evaluate using a Kinova Gen3 7DoF arm and 30 human participants. Beyond performance and sample efficiency, our experiments also provide insights into how the querying process can influence user trust. Together, these complementary studies demonstrate both the practicality and effectiveness of AFS.
2 Background and related work
2.1 Markov Decision Processes (MDPs)
MDPs are a popular framework for modeling sequential decision-making problems. An MDP is defined by the tuple ⟨S, A, T, R⟩, where S is a set of states, A is a set of actions, T is the transition function, and R is the reward function.
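To make the tabular MDP notation concrete, the sketch below shows a minimal representation and value iteration over the tuple ⟨S, A, T, R⟩ with a discount factor. This is an illustrative stand-in, not the paper's implementation; the class and parameter names are assumptions.

```python
# Minimal sketch of a tabular MDP <S, A, T, R> with value iteration.
# TabularMDP and its parameters are illustrative names, not the paper's code.
import numpy as np

class TabularMDP:
    def __init__(self, T, R, gamma=0.95):
        self.T = T          # shape (|S|, |A|, |S|): T[s, a, s'] = P(s' | s, a)
        self.R = R          # shape (|S|, |A|): immediate reward
        self.gamma = gamma  # discount factor (assumed)

    def value_iteration(self, tol=1e-6):
        """Compute the optimal value function and a greedy policy."""
        n_s, n_a, _ = self.T.shape
        V = np.zeros(n_s)
        while True:
            Q = self.R + self.gamma * np.einsum("sap,p->sa", self.T, V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new
```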
2.2 Learning from human feedback
Learning from human feedback is a popular approach to train agents when reward functions are unavailable or incomplete (Abbeel and Ng, 2004; Ng and Russell, 2000; Ross et al., 2011; Najar and Chetouani, 2021), including to improve safety (Brown et al., 2020b; 2018; Hadfield-Menell et al., 2017; Ramakrishnan et al., 2020; Zhang et al., 2020; Saisubramanian et al., 2021a; Hassan et al., 2025). Feedback can take various forms such as demonstrations (Ramachandran and Amir, 2007; Saisubramanian et al., 2021a; Seo and Unhelkar, 2024; Zha et al., 2024), corrections (Cui et al., 2023; Bärmann et al., 2024), critiques (Cui and Niekum, 2018; Tarakli et al., 2024), ranking trajectories (Brown et al., 2020a; Xue et al., 2024; Feng et al., 2025), natural language instructions (Lou et al., 2024; Yang Y. et al., 2024; Hassan et al., 2025), or may be implicit in the form of facial expressions and gestures (Cui et al., 2021b; Strokina et al., 2022; Candon et al., 2023).
While the existing approaches for learning from feedback have shown success, they typically assume that a single feedback type is used to teach the agent. This assumption limits learning efficiency and adaptability. Some efforts combine demonstrations with preferences (Bıyık et al., 2022; Ibarz et al., 2018), showing that utilizing more than one format accelerates learning. Extending this idea, recent works integrate richer modalities such as language and vision with demonstrations. Yang Z. et al. (2024) learn a reward function from comparative language feedback, while Sontakke et al. (2023) show that a single demonstration or natural language description can help define a proxy reward when used with a vision-language model (VLM) pretrained on a large corpus of out-of-domain video demonstration and language pairs. Kim et al. (2023) use multimodal embeddings of visual observations and natural language descriptions to compute alignment-based rewards. A recent study further emphasizes that combining multiple feedback modalities can enhance learning outcomes (Beierling et al., 2025). Together, these works highlight that combining complementary feedback formats helps advance reward learning beyond using a fixed feedback format. Building on this insight, our approach uses multiple forms of human feedback for learning.
Learning from human feedback has also been used to model variations in human behavior. Huang et al. (2024) model heterogeneous human behaviors, capturing differences in feedback frequency, delay, strictness, and bias to improve robustness during the learning process, as optimal behaviors vary across users. Along the same line, the reward learning approach proposed by Ghosal et al. (2023) selects a single feedback format based on the user’s ability to provide feedback in that format, resulting in an interaction tailored to a user’s skill level. Collectively, these works reveal a shift towards adaptive and user-aware querying mechanisms that improve reward inference and learning efficiency, motivating our approach to dynamically select both when to query and in what feedback format.
3 Problem formulation
Setting: Consider a robot operating in a discrete environment modeled as a Markov Decision Process (MDP), using its acquired model
Assumption 1. Similar to (Saisubramanian et al., 2021a), we assume that the agent’s model
Since the model is incomplete in ways unrelated to the primary objective, executing the primary policy produces negative side effects (NSEs) that are difficult to identify at design time. Following (Saisubramanian et al., 2021a), we define NSEs as immediate, undesired, unmodeled effects of a robot’s actions on the environment. We focus on settings where the robot has no prior knowledge about the NSEs of its actions or the underlying true NSE penalty function
We target settings where the human can provide feedback in multiple ways and the robot can seek feedback in a specific format such as approval or corrections. This represents a significant shift from traditional active learning methods, which typically gather feedback only in a single format (Ramakrishnan et al., 2020; Saisubramanian et al., 2021a; Saran et al., 2021). Using the learned
Running Example: We illustrate the problem with a simple object delivery task performed by a Kinova Gen3 7DoF arm, shown in Figure 1. The robot optimizes delivery of the blue block to the white bin by taking the shortest path. However, passing through states with a cardboard box or a glass bowl constitutes an NSE. Since the robot has no prior knowledge about the NSEs of its actions, it may inadvertently navigate through these states, causing NSEs.
Human’s Feedback Preference Model: The feedback format selection must account for the cost and human preferences in providing feedback in a certain format. The user’s feedback preference model is denoted by
•
•
•
This work assumes the robot has access to the user’s feedback preference model
Assumption 2. Human feedback is immediate and accurate, when available.
Below, we describe the various feedback formats considered in this paper, and how the data from these formats are mapped to NSE severity labels.
3.1 Feedback formats studied
The agent learns an association between state-action pairs and NSE severity, based on the human feedback provided in response to agent queries. The NSE categories we consider in this work are
Approval (App): The robot randomly selects
Annotated Approval (Ann. App): An extension of Approval, where the human specifies the NSE severity (or category) for each disapproved action in the critical states.
Corrections (Corr): The robot performs a trajectory of its primary policy in the critical states, under human supervision. If the robot’s action is unacceptable, the human intervenes with an acceptable action in these states. If all actions in a state lead to NSEs, the human specifies the action with the least severe NSE. When interrupted, the robot assumes all actions except the correction are unacceptable in that state.
Annotated Corrections (Ann. Corr): An extension of Corrections, where the human specifies the severity of NSEs caused by the robot’s unacceptable action in critical states.
Rank: The robot randomly selects
Demo-Action Mismatch (DAM): The human demonstrates a safe action in each critical state, which the robot compares with its policy. All mismatched robot actions are labeled as unacceptable, and matched actions are labeled as acceptable.
Mapping feedback data to NSE severity labels: We use
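As a concrete illustration of how raw feedback in each format is converted into per state-action severity labels, the following sketch covers the cases described above. The label names (safe, mild, severe, unacceptable) and helper functions are assumptions for illustration; the paper's exact labeling function defines the actual categories.

```python
# Hedged sketch: converting raw feedback of each format into (state, action, label)
# tuples. Label names and helpers are illustrative assumptions.
SAFE, MILD, SEVERE, UNACCEPTABLE = "safe", "mild", "severe", "unacceptable"

def label_from_approval(state, action, approved):
    # Approval: a binary signal; disapproval only reveals that the action is unsafe.
    return (state, action, SAFE if approved else UNACCEPTABLE)

def label_from_annotated_approval(state, action, approved, severity=None):
    # Annotated approval: the human also names the severity of a disapproved action.
    return (state, action, SAFE if approved else severity)

def labels_from_correction(state, all_actions, corrected_action):
    # Corrections: the corrected action is acceptable; all other actions in that
    # state are assumed unacceptable.
    return [(state, a, SAFE if a == corrected_action else UNACCEPTABLE)
            for a in all_actions]

def label_from_dam(state, policy_action, demonstrated_action):
    # Demo-Action Mismatch: a mismatch marks the policy's action as unacceptable.
    match = policy_action == demonstrated_action
    return (state, policy_action, SAFE if match else UNACCEPTABLE)
```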
Figure 2. Visualization of reward learned using different feedback types. (Row 1) Black arrows indicate queries, and feedback is in speech bubbles.
4 Adaptive feedback selection
Given an agent’s decision making model
Formalizing NSE Model Learning: Let
Figure 3 shows an example of
Figure 3. Illustration of
At
Each predicted label is then mapped to a penalty value to form the learned penalty function,
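To show how the learned labels can become a penalty function that augments the task reward, the sketch below fits a generic off-the-shelf classifier on state-action features and maps its predicted severity labels to penalty values. The classifier choice, feature function, and penalty magnitudes are assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch of the NSE-model learning step: predict an NSE severity label from
# state-action features, then map predicted labels to penalties added to the task
# reward. Classifier and penalty values are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier

PENALTY = {"safe": 0.0, "mild": -5.0, "severe": -10.0}   # assumed penalty values

def learn_nse_model(features, labels):
    """features: (n, d) array of state-action features; labels: severity strings."""
    return RandomForestClassifier(n_estimators=100).fit(features, labels)

def augmented_reward(task_reward, clf, feature_fn):
    """Return R'(s, a) = R(s, a) + learned penalty for the predicted NSE severity."""
    def r(state, action):
        label = clf.predict([feature_fn(state, action)])[0]
        return task_reward(state, action) + PENALTY[label]
    return r
```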
In this learning setup, minimizing NSEs using AFS involves four iterative steps (Figure 4). In each learning iteration, AFS identifies (1) which states are most critical for querying (Section 4.1), and (2) which feedback format maximizes the expected information gain at the critical states, while accounting for user feedback preferences and the effort involved (Section 4.2). The information gain associated with a feedback format quantifies its effect on improving the agent’s understanding of the underlying reward function, and is measured using Kullback-Leibler (KL) divergence (Ghosal et al., 2023; Tien et al., 2023). At the end of each iteration, the cluster weights and information gain are updated, and a new set of critical states is sampled to learn about NSEs, until the querying budget expires or the KL-divergence falls below a problem-specific, pre-defined threshold.
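The outer loop implied by these four steps can be sketched as below. All helper callables are assumptions standing in for the components detailed in Sections 4.1–4.3; the sketch only shows how they interleave.

```python
# Hedged sketch of the AFS outer loop: alternate between sampling critical states
# from weighted clusters and querying in the best-utility format, until the budget
# runs out or the KL-divergence gap is small. Helper callables are assumptions.
def afs_loop(clusters, formats, budget, kl_threshold,
             sample_critical_states, select_format, query_human,
             update_model, learning_gap):
    feedback_data = []
    while budget > 0:
        critical = sample_critical_states(clusters)            # Section 4.1
        fmt = select_format(formats, critical, feedback_data)  # Section 4.2
        new_feedback = query_human(fmt, critical)              # one round of queries
        feedback_data.extend(new_feedback)
        update_model(feedback_data)       # refit NSE predictor, update cluster weights
        budget -= len(critical)
        if learning_gap(feedback_data) < kl_threshold:         # Section 4.3 stopping rule
            break
    return feedback_data
```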
Figure 4. Solution approach overview. The critical states
4.1 Critical states selection
When the budget for querying a human is limited, it is useful to query in states with a high learning gap measured as the KL-divergence between the agent’s knowledge of NSE severity and the true NSE severity given the feedback data collected so far. States with a high learning gap are called critical states
Since
In order to select critical states for querying, we compute the KL divergence between
1. Clustering states: Since NSEs are typically correlated with specific state features and do not occur at random, we cluster the states
2. Estimating information gain: We define the information gain of sampling from a cluster
where
3. Sampling critical states: At each learning iteration
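A minimal sketch of these three steps is given below: cluster state features, score each cluster with a KL-divergence-based learning gap, and sample critical states in proportion to those scores. The exact weighting and divergence terms are defined by the equations in this section; the version here is an illustrative stand-in.

```python
# Hedged sketch of critical-state selection (Section 4.1). The weighting scheme is
# an illustrative stand-in for the paper's exact formulation.
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

def cluster_states(state_features, k):
    """Cluster states by their features; returns a cluster index per state."""
    return KMeans(n_clusters=k, n_init=10).fit(state_features).labels_

def cluster_gap(pred_dist, observed_dist):
    """KL divergence between predicted and observed NSE-label distributions."""
    return entropy(observed_dist, pred_dist)

def sample_critical_states(labels, gaps, n_queries, rng=None):
    """labels: per-state cluster index; gaps[c]: KL-based learning gap of cluster c."""
    rng = rng or np.random.default_rng()
    weights = np.asarray(gaps, dtype=float)
    probs = weights / weights.sum()
    chosen = []
    for _ in range(n_queries):
        c = rng.choice(len(probs), p=probs)        # pick a cluster by its weight
        members = np.flatnonzero(labels == c)
        chosen.append(int(rng.choice(members)))    # pick a state within the cluster
    return chosen
```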
4.2 Feedback format selection
To query in the critical states,
where,
where
Algorithm 2 outlines our feedback format selection approach. Since the agent has no prior knowledge of how the human categorizes NSEs for each state-action pair, the labeling function
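The sketch below illustrates one plausible way to score formats by expected information gain, discounted by the probability that the user responds in that format and by its cost. The combination rule (gain times response probability minus cost) is an assumption for illustration; the paper's exact objective is the one defined above.

```python
# Hedged sketch of feedback-format selection (Section 4.2). The utility function
# below is an illustrative assumption, not the paper's exact objective.
def select_feedback_format(formats, expected_gain, response_prob, cost):
    """expected_gain[f]: estimated information gain of querying in format f.
    response_prob[f], cost[f]: from the human feedback preference model."""
    def utility(f):
        return response_prob[f] * expected_gain[f] - cost[f]
    return max(formats, key=utility)

# Example usage with illustrative numbers.
formats = ["approval", "corrections", "rank", "dam"]
gain = {"approval": 0.4, "corrections": 1.2, "rank": 0.6, "dam": 1.0}
prob = {"approval": 0.9, "corrections": 0.5, "rank": 0.8, "dam": 0.6}
cost = {"approval": 0.1, "corrections": 0.6, "rank": 0.2, "dam": 0.7}
print(select_feedback_format(formats, gain, prob, cost))
```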
Figure 5 illustrates the critical states and the most informative feedback formats selected at each iteration in the object delivery task using AFS, demonstrating that feedback utility changes over time, based on the robot’s current knowledge.
Figure 5. Feedback utility of each format across iterations. Numbers mark when a state was identified as critical, and circle colors denote the chosen feedback format.
4.3 Stopping criteria
Besides guiding the selection of critical states and feedback format, the KL-divergence also serves as an indicator of when to stop querying. The querying phase can be terminated when
5 Experiments in simulation
We first evaluate AFS on three simulated domains (Figure 6). Human feedback is simulated by modeling an oracle that selects safer actions with higher probability using a softmax action selection (Ghosal et al., 2023; Jeon et al., 2020): the probability of choosing an action
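The simulated oracle can be sketched as a Boltzmann (softmax) chooser that prefers actions with lower true NSE penalty. The rationality coefficient beta and the penalty values are assumed parameters for illustration; the paper's simulated-feedback model follows the cited softmax formulation.

```python
# Hedged sketch of the simulated human oracle: safer (lower-penalty) actions are
# chosen with higher probability via a softmax. beta is an assumed parameter.
import numpy as np

def simulated_choice(actions, true_penalty, beta=5.0, rng=None):
    """Sample one action, preferring those with smaller true NSE penalty."""
    rng = rng or np.random.default_rng()
    penalties = np.array([true_penalty[a] for a in actions], dtype=float)
    logits = -beta * penalties
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    probs /= probs.sum()
    return actions[rng.choice(len(actions), p=probs)]

# Example: the oracle usually prefers the safer of two actions.
penalty = {"push_through_vase": 10.0, "go_around": 0.0}
print(simulated_choice(["push_through_vase", "go_around"], penalty))
```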
Figure 6. Illustrations of evaluation domains. Red box denotes the agent and the goal location is in green. (a) Navigation: Unavoidable NSE. (b) Vase: Unavoidable NSE. (c) Safety-gym Push.
Baselines: (i) Naive Agent: The agent naively executes its primary policy without learning about NSEs, providing an upper bound on the NSE penalty incurred. (ii) Oracle: The agent has complete knowledge about
Domains, Metrics and Feedback Formats: We evaluate the performance of various techniques on three domains in simulation (Figure 6): outdoor navigation, vase and safety-gym’s push. We optimize costs (negations of rewards) and compare techniques using average NSE penalty and average cost to goal, averaged over 100 trials. For navigation, vase and push, we simulate human feedback. The cost for
Navigation: In this ROS-based city environment, the robot optimizes the shortest path to the goal location. A state is represented as
Vase: In this domain, the robot must quickly reach the goal, while minimizing breaking a vase as a side effect (Krakovna et al., 2020). A state is represented as
Push: In this safety-gymnasium domain, the robot aims to push a box quickly to a goal state (Ji et al., 2023). Pushing a box on a hazard zone (blue circles) produces NSEs. We modify the domain such that in addition to the existing actions, the agent can also wrap the box that costs
5.1 Results and discussion
Effect of learning using AFS: We first examine the benefit of querying using AFS, by comparing the resulting average NSE penalties and the cost for task completion, across domains and query budgets. Figure 7 shows the average NSE penalties when operating based on an NSE model learned using different querying approaches. Clusters for critical state selection were generated using the KMeans clustering algorithm with
Figure 7. Average penalty incurred when querying with different feedback selection techniques. (a) Navigation: Unavoidable NSE. (b) Vase: Unavoidable NSE. (c) Safety-gym Push.
There is a trade-off between optimizing task completion and mitigating NSEs, especially when NSEs are unavoidable. While some techniques are better at mitigating NSEs, they significantly impact task performance. Table 1 shows the average cost for task completion at
Figure 8 shows the average penalty when AFS uses KL-divergence (KLD) as the stopping criteria, compared to querying with budget
Figure 8. Average penalty incurred when learning with AFS using querying budget
6 In-person user study with a physical robot arm
We conducted an in-person study with a Kinova Gen3 7DoF arm (Kinova, 2025) tasked with delivering two objects, an orange toy and a white box, across a workspace containing items of varying fragility (Figure 9). This setup involved users providing both interface-based and kinesthetic feedback to the robot. The study was approved by the Oregon State University IRB. Participants were compensated with a
Figure 9. Task setup for the human subject study. (a) Physical setup of the task for human subjects study; (b) Replication of the physical setup using PyBullet. A dialog box corresponding to the current feedback format is shown for every query.
This user study had three goals: (1) to measure our approach’s effectiveness in reducing NSEs for a real-world task, (2) to understand how users perceive the adaptivity, workload and competence of the robot operating in the AFS framework, and (3) to evaluate the extent to which AFS captures user preferences in practice, while ensuring maximum information gain during the learning process.
6.1 Methods
6.1.1 Participants
We conducted a pilot study in simulation to inform our overall design, the details of which are discussed under Section 2 in the Supplementary Material. We conducted another pilot study with
6.1.2 Robotic system setup
The Kinova Gen3 arm was equipped with a joint space compliant controller which allowed participants to physically move the joints of the arm through space with gravity compensation when needed. Additionally, a task-space planner allowed for navigation to discrete grid positions for both feedback queries and policy execution (Kinova, 2025). Figure 9a shows the physical workspace and the two delivery objects, while Figure 9b shows the corresponding PyBullet simulation used for visualization during GUI-based feedback. A dialog box was displayed to prompt the participant whenever feedback was queried1.
6.1.3 Interaction premise
The interaction simulated an assistive robot delivering objects to their designated bins. Specifically, the task required the Kinova arm to deliver an orange plush toy and a rigid white box to their respective bins while avoiding collisions with surrounding obstacles of different fragility. Collisions with fragile obstacles (e.g., a glass vase) during delivery of the plush toy were considered mild NSEs. Collisions involving the white rigid box were considered severe NSEs when they involved a fragile object and mild NSEs when they involved a non-fragile object. All other scenarios were considered safe. The workspace was discretized into a grid of cells marked with tape on the tabletop and mirrored in the GUI. Each cell represented a state corresponding to a possible end-effector position.
6.1.4 Study design
The robot’s state space was discretized and represented as
Participants interacted with the robot through four feedback formats,
1. Approval: The robot executed a single action in simulation, and participants indicated whether it was safe by selecting “yes” or “no” in the GUI.
2. Correction: The robot first executed the action prescribed by its policy in simulation. If the participant deemed the simulated action unsafe, the robot in the physical setup moved to the queried location. Participants then corrected the robot by physically moving the robot arm to demonstrate a safe alternative action.
3. Demo-Action Mismatch: The robot first physically moved its arm to a specific end-effector position in the workspace. Participants then provided feedback by guiding the arm to a safe position, thereby demonstrating the safe action. The robot compared the action prescribed by its policy to the demonstrated action; if the two did not match, the robot’s action was considered unsafe.
4. Ranking: Simulation clips of two actions selected at random in a given state were presented in the GUI. Participants compared the two candidate actions and selected which was safer. If both actions were judged equally safe or unsafe, either option could be chosen.
Each participant experienced four learning conditions in a within-subjects, counterbalanced design:
1. The baseline RI approach proposed in Ghosal et al. (2023),
2. AFS with random
3. AFS with a fixed feedback format (DAM) for querying, consistent with prior works that rely primarily on demonstrations, and
4. The proposed AFS approach, where both the feedback format and the critical states are selected to maximize information gain.
Each condition was a distinct feedback query selection strategy controlling how the robot queried participants during learning. These conditions constitute the independent variable. The dependent measures include NSE occurrences and their severity, along with perceived workload, trust, competence, and user alignment.
6.1.5 Hypotheses
We test the following hypotheses in the in-person study. These hypotheses were derived from trends observed in the experiments and human subjects study in simulation (Section 5 and Section 2 in the Supplementary Materials).
H1: Robots learning using AFS will have fewer NSEs in comparison to the baselines.
This hypothesis is derived from the results of our experiments on simulated domains (Figure 7) where AFS consistently reduced NSEs while completing the assigned task. We hypothesize that this trend extends to physical human-robot interactions.
H2: AFS will achieve comparable or better performance compared to the baselines, with a lower perceived workload for the users.
The results on simulated domains (Figure 8) show that AFS achieved better or comparable performance to the baselines, using fewer feedback queries. While the in-person user study requires relatively greater physical and cognitive effort, we expect the advantage of the sample efficiency to persist and investigate whether it translates to reduced perceived workload.
H3: Participants will report AFS as more trustworthy, competent, and aligned with user expectations, in comparison to the baselines.
In the human subjects simulation study (Supplementary Table S2), participants reported that AFS selected intelligent queries, targeted critical states, and improved the agent’s performance, reflecting indicators of trust, competence and user alignment. We hypothesize that this trend extends to physical settings as well.
Hypotheses H1 and H2 explore trends identified in simulation and are therefore confirmatory. Hypothesis H3 builds on the perception measures used in the human subjects study in simulation, and is hence treated as an extended confirmatory hypothesis.
6.1.6 Procedure
Each study session lasted approximately 1 hour and followed three phases.
6.1.6.1 Training
Participants were first introduced to the task objective, workspace, and the four feedback formats. For each format, they provided feedback on four sample queries to practice both GUI-based and kinesthetic interactions. After completing each format, the participants rated the following: (i) probability of responding to a query in that format,
6.1.6.2 Main experience
Following training, participants completed the four learning conditions corresponding to the different approaches under evaluation. In each condition, participants provided feedback to train the robot to avoid collisions while performing the object-delivery task. Depending on the feedback format selected by the querying strategy, participants either evaluated short simulation clips on the GUI or physically guided the robotic arm. At the end of each condition, the robot executed the policy it had learned under that condition. Participants then observed its performance and completed a brief post-condition questionnaire assessing workload, trust, perceived competence, and user alignment.
6.1.6.3 Closing
At the end of the study, participants compared the four learning approaches in terms of trade-offs between learning speed and safety. Participants reported their preferences on providing feedback through multiple formats versus relying on a single feedback format. These responses offered qualitative insight into AFS’s practicality and user acceptance.
6.1.7 Measures
We collected both quantitative and qualitative measures. The quantitative measure captured task-level performance through the frequency and the severity of NSEs (mild and severe). Qualitative measures captured participants’ perceptions of the following.
1. Workload: Participants’ perceived workload across the feedback formats and learning conditions was measured using the NASA Task Load Index (NASA-TLX) (Hart and Staveland, 1988). The questionnaire scales were transformed to seven-point subscales ranging from “Very Low” (1) to “Very High” (7). Responses were collected during the training phase and after each condition in the main experience phase.
2. Robot Attributes: Perceived robot attributes, like competence, warmth and discomfort, were measured using the nine-point Robotic Social Attributes Scale (RoSAS) (Carpinella et al., 2017), ranging from “Strongly Disagree” (1) to “Strongly Agree” (9). Participants completed this questionnaire after each learning condition.
3. Trust: A custom 10-point trust scale
4. User Alignment: Participants’ perception of user alignment was assessed using a custom seven-point Likert scale ranging from “Strongly Disagree” (1) to “Strongly Agree” (7). Participants rated (i) how well the critical states queried by the robot aligned with their own assessment of which states were important for learning, and (ii) how well the feedback formats chosen across conditions matched their personal feedback preferences. Higher ratings indicated stronger perceived alignment between the robot’s querying strategy and the participants’ expectations.
6.1.8 Analysis
Survey responses were compiled into cumulative RoSAS (competence, warmth, discomfort) and NASA-TLX workload scores. A repeated-measures ANOVA (rANOVA) tested for significant differences across learning conditions; we report the
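A minimal sketch of this analysis is shown below using a standard repeated-measures ANOVA routine: each participant contributes one score per learning condition, and the test checks for a within-subjects effect of condition. The data frame below is synthetic placeholder data, and the condition labels are illustrative; it is not the study data or the exact analysis script.

```python
# Hedged sketch of the repeated-measures ANOVA across learning conditions.
# Synthetic data only; condition names are illustrative placeholders.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
conditions = ["RI", "AFS-random", "AFS-DAM", "AFS"]
rows = [{"participant": p, "condition": c, "tlx": rng.normal(4.0, 1.0)}
        for p in range(30) for c in conditions]
df = pd.DataFrame(rows)

result = AnovaRM(data=df, depvar="tlx", subject="participant",
                 within=["condition"]).fit()
print(result)   # reports the F statistic and p-value for the condition effect
```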
6.2 Results
We evaluate hypotheses H1-H3 using both objective and subjective measures. Data from all 30 participants were included in the analysis, as all sessions were completed successfully.
6.2.1 Effectiveness of AFS in mitigating NSEs (H1)
Figure 10a shows the average penalty incurred under each condition. The AFS approach incurred the least NSE penalty
Figure 10. Results from the user study on the Kinova 7DoF arm. (a) Average penalty incurred across methods in the human subjects study. (b) NASA-TLX workload across the four conditions.
6.2.2 Learning efficiency and workload (H2)
We first compare the perceived workload across different feedback formats, followed by the results across learning conditions. Demonstration is the most widely used feedback format in existing works but was perceived as the most demanding (Figure 11c). While corrections offered a corrective action in addition to disapproving the agent’s action, they also imposed substantial effort on the users. Approval required the least workload but conveyed limited information. A repeated-measures ANOVA revealed a significant effect of feedback format on perceived workload,
Figure 11. User study results. (a,b) RoSAS competence and NASA Task-Load across the four conditions in the main study; (c) NASA Task-Load across feedback formats.
The rANOVA across the four learning conditions further revealed a significant effect on the NASA-TLX workload ratings
6.2.3 Trust, competence, and preference alignment (H3)
Participants’ ratings of the robot’s ability to act safely increased after learning with AFS, as shown in Figure 11b. A significant effect was also found for perceived robot competence
Descriptive analyses of the user-alignment ratings on state criticality and feedback format alignment indicated consistent trends across participants. While differences between conditions were not statistically significant
7 Discussion
Our experiments followed an increasingly realistic progression in design. In the simulation experiments with both avoidable and unavoidable NSEs, AFS incurred lower penalties and overall costs compared to the baselines, demonstrating its ability to balance task performance with safety. The results of our pilot study, where users interacted with a simulated agent, showed that AFS effectively learns the participant’s feedback preference model and uses it to select formats aligned with user expectations. Finally, the in-person user study with the Kinova arm showed the practicality of using AFS in real-world settings, achieving favorable ratings on trust, workload, and user-preference alignment. These findings support our three hypotheses regarding the performance of AFS: (H1) it reduces unsafe behaviors more effectively than the baselines, (H2) it improves learning efficiency while reducing user workload, and (H3) it is perceived as more trustworthy and competent. The results collectively highlight that adaptively selecting both the query format and the states in which to pose queries enhances learning efficiency and reduces user effort.
Beyond confirming these hypotheses, the findings provide important design implications for human-in-the-loop learning systems. By modeling the trade-off between informativeness and effort, AFS offers a framework to balance user workload with the need for high-quality feedback. The learned feedback preference model allows the agent to adaptively select querying formats while minimizing human effort. Using KL-divergence as a stopping criterion further enables adaptive termination of the querying process. This overcomes the problem of determining the “right” querying budget for a problem, and shows that AFS enables efficient learning while minimizing redundant human feedback. These design principles can inform the development of interactive systems that adapt query format and frequency based on the agent’s current knowledge and user feedback preferences. Overall, the results show that AFS (1) consistently outperforms the baselines across different evaluation settings, and (2) can be effectively deployed in real-world human-robot interaction scenarios.
A key strength of this work lies in its extensive evaluation, from simulation to real robot studies, supporting AFS’s robustness and practicality. One limitation, however, is that the current evaluation focuses on discrete environments. Extending AFS to continuous domains introduces challenges such as identifying critical states and estimating divergence-based information gain in high-dimensional spaces. While gathering feedback at the trajectory-level is relatively easier in continuous settings, gathering state-level feedback, which is the focus of this work, is challenging. These challenges stem from the need for scalable state representations and efficient sampling strategies, which will be a focus for future work.
8 Conclusion and future work
The proposed Adaptive Feedback Selection (AFS) facilitates querying a human in different formats in different regions of the state space, to effectively learn a reward function. Our approach uses information gain to identify critical states for querying, and the most informative feedback format to query in these states, while accounting for the cost and uncertainty of receiving feedback in each format. Our empirical evaluations using four domains in simulation and a human subjects study in simulation demonstrate the effectiveness and sample efficiency of our approach in mitigating avoidable and unavoidable negative side effects (NSEs). The subsequent in-person user study with a Kinova Gen3 7DoF arm further validates these findings, showing that AFS not only improves NSE avoidance but also enhances user trust, perceived competence, and user alignment. While AFS assumes that human feedback reflects a true underlying notion of safety, biased feedback can misguide the robot and lead to unintended NSEs. Understanding when such biases arise and how to correct for them remains an open challenge. Extending AFS with bias-aware inference mechanisms is a promising future direction. Future work will also focus on extending AFS to continuous state and action spaces, strengthening AFS’s applicability to complex, safety-critical domains where user-aware interaction is essential.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by Human Research Protection Program and Institutional Review Board, Oregon State University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
YA: Writing – review and editing, Investigation, Data curation, Methodology, Conceptualization, Writing – original draft, Visualization. NN: Writing – review and editing, Writing – original draft, Data curation. KS: Writing – review and editing, Data curation. NF: Resources, Writing – review and editing, Supervision. SS: Supervision, Funding acquisition, Resources, Writing – review and editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported in part by National Science Foundation grant number 2416459.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2025.1734564/full#supplementary-material
Footnotes
1See Section 3.1 in the Supplementary Materials for details on the dialog box and examples for each feedback format
References
Abbeel, P., and Ng, A. Y. (2004). “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the twenty-first international conference on machine learning (ICML).
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mané, D. (2016). Concrete problems in AI safety. arXiv Preprint arXiv:1606.06565. doi:10.48550/arXiv.1606.06565
Bärmann, L., Kartmann, R., Peller-Konrad, F., Niehues, J., Waibel, A., and Asfour, T. (2024). Incremental learning of humanoid robot behavior from natural interaction and large language models. Front. Robotics AI 11, 1455375. doi:10.3389/frobt.2024.1455375
Beierling, H., Beierling, R., and Vollmer, A. (2025). The power of combined modalities in interactive robot learning. Front. Robotics AI 12, 1598968. doi:10.3389/frobt.2025.1598968
Bıyık, E., Losey, D. P., Palan, M., Landolfi, N. C., Shevchuk, G., and Sadigh, D. (2022). Learning reward functions from diverse sources of human feedback: optimally integrating demonstrations and preferences. Int. J. Robotics Res. (IJRR) 41, 45–67. doi:10.1177/02783649211041652
Brown, D. S., Cui, Y., and Niekum, S. (2018). “Risk-aware active inverse reinforcement learning,” in Conference on robot learning (CoRL), 87.
Brown, D., Coleman, R., Srinivasan, R., and Niekum, S. (2020a). “Safe imitation learning via fast Bayesian reward inference from preferences,” in International conference on machine learning (ICML) (PMLR).
Brown, D., Niekum, S., and Petrik, M. (2020b). Bayesian robust optimization for imitation learning. Adv. Neural Inf. Process. Syst. (NeurIPS). doi:10.5555/3495724.3495933
Candon, K., Chen, J., Kim, Y., Hsu, Z., Tsoi, N., and Vázquez, M. (2023). “Nonverbal human signals can help autonomous agents infer human preferences for their behavior,” in Proceedings of the international conference on autonomous agents and multiagent systems.
Carpinella, C. M., Wyman, A. B., Perez, M. A., and Stroessner, S. J. (2017). “The robotic social attributes scale (rosas): development and validation,” in 12th ACM/IEEE international conference on human robot interaction (HRI).
Cui, Y., and Niekum, S. (2018). “Active reward learning from critiques,” in IEEE international conference on robotics and automation (ICRA).
Cui, Y., Koppol, P., Admoni, H., Niekum, S., Simmons, R., Steinfeld, A., et al. (2021a). “Understanding the relationship between interactions and outcomes in human-in-the-loop machine learning,” in International joint conference on artificial intelligence (IJCAI).
Cui, Y., Zhang, Q., Knox, B., Allievi, A., Stone, P., and Niekum, S. (2021b). “The empathic framework for task learning from implicit human feedback,” in Conference on robot learning (CoRL).
Cui, Y., Karamcheti, S., Palleti, R., Shivakumar, N., Liang, P., and Sadigh, D. (2023). “No, to the right: online language corrections for robotic manipulation via shared autonomy,” in Proceedings of ACM/IEEE conference on human robot interaction (HRI).
Feng, X., Jiang, Z., Kaufmann, T., Xu, P., Hüllermeier, E., Weng, P., et al. (2025). “Duo: diverse, uncertain, on-policy query generation and selection for reinforcement learning from human feedback,” in Proceedings of the AAAI conference on artificial intelligence (AAAI).
Ghosal, G. R., Zurek, M., Brown, D. S., and Dragan, A. D. (2023). “The effect of modeling human rationality level on learning rewards from multiple feedback types,” in Proceedings of the AAAI conference on artificial intelligence (AAAI).
Hadfield-Menell, D., Milli, S., Abbeel, P., Russell, S. J., and Dragan, A. (2017). Inverse reward design. Adv. Neural Inf. Process. Syst. (NeurIPS). doi:10.5555/3295222.3295421
Hart, S. G., and Staveland, L. E. (1988). Development of NASA-TLX (task load index): results of empirical and theoretical research. Adv. Psychol. 52, 139–183.
Hassan, S., Chung, H.-Y., Tan, X. Z., and Alikhani, M. (2025). “Coherence-driven multimodal safety dialogue with active learning for embodied agents,” in Proceedings of the 24th international conference on autonomous agents and multiagent systems (AAMAS).
Huang, J., Aronson, R. M., and Short, E. S. (2024). “Modeling variation in human feedback with user inputs: an exploratory methodology,” in Proceedings of ACM/IEEE international conference on human robot interaction (HRI).
Ibarz, B., Leike, J., Pohlen, T., Irving, G., Legg, S., and Amodei, D. (2018). Reward learning from human preferences and demonstrations in atari. Adv. Neural Inf. Process. Syst. (NeurIPS). doi:10.5555/3327757.3327897
Jeon, H. J., Milli, S., and Dragan, A. (2020). Reward rational (implicit) choice: a unifying formalism for reward learning. Adv. Neural Inf. Process. Syst. (NeurIPS). doi:10.5555/3495724.3496095
Ji, J., Zhang, B., Zhou, J., Pan, X., Huang, W., Sun, R., et al. (2023). “Safety gymnasium: a unified safe reinforcement learning benchmark,” in Thirty-seventh conference on neural information processing systems datasets and benchmarks track (NeurIPS).
Kim, C., Seo, Y., Liu, H., Lee, L., Shin, J., Lee, H., et al. (2023). “Guide your agent with adaptive multimodal rewards,” in Thirty-seventh conference on neural information processing systems.
Krakovna, V., Orseau, L., Martic, M., and Legg, S. (2018). Measuring and avoiding side effects using relative reachability. arXiv Preprint arXiv:1806.01186.
Krakovna, V., Orseau, L., Ngo, R., Martic, M., and Legg, S. (2020). Avoiding side effects by considering future tasks. Adv. Neural Inf. Process. Syst. (NeurIPS). doi:10.5555/3495724.3497324
Lakkaraju, H., Kamar, E., Caruana, R., and Horvitz, E. (2017). “Identifying unknown unknowns in the open world: representations and policies for guided exploration,” in Proceedings of the AAAI conference on artificial intelligence (AAAI).
Lou, X., Zhang, J., Wang, Z., Huang, K., and Du, Y. (2024). “Safe reinforcement learning with free-form natural language constraints and pre-trained language models,” in The 23rd international conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
Najar, A., and Chetouani, M. (2021). Reinforcement learning with human advice: a survey. Front. Robotics AI 8, 8–2021. doi:10.3389/frobt.2021.584075
Ng, A. Y., and Russell, S. (2000). “Algorithms for inverse reinforcement learning,” in Proceedings of the seventeenth international conference on machine learning (ICML).
Ramachandran, D., and Amir, E. (2007). “Bayesian inverse reinforcement learning,” in Proceedings of the 20th international joint conference on artifical intelligence (IJCAI).
Ramakrishnan, R., Kamar, E., Dey, D., Horvitz, E., and Shah, J. (2020). Blind spot detection for safe sim-to-real transfer. J. Artif. Intell. Res. (JAIR) 67, 191–234. doi:10.1613/jair.1.11436
Ross, S., Gordon, G., and Bagnell, D. (2011). “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics, (AISTATS).
Saisubramanian, S., and Zilberstein, S. (2021). “Mitigating negative side effects via environment shaping,” in International conference on Autonomous Agents and Multiagent Systems (AAMAS).
Saisubramanian, S., Kamar, E., and Zilberstein, S. (2021a). “A multiobjective approach to mitigate negative side effects,” in Proceedings of the twenty-ninth international joint conference on artificial intelligence (International Joint Conferences on Artificial Intelligence Organization).
Saisubramanian, S., Roberts, S. C., and Zilberstein, S. (2021b). “Understanding user attitudes towards negative side effects of AI systems,” in Extended abstracts of the 2021 conference on human factors in computing systems (CHI).
Saisubramanian, S., Kamar, E., and Zilberstein, S. (2022). Avoiding negative side effects of autonomous systems in the open world. J. Artif. Intell. Res. (JAIR) 74, 143–177. doi:10.1613/jair.1.13581
Saran, A., Zhang, R., Short, E. S., and Niekum, S. (2021). “Efficiently guiding imitation learning agents with human gaze,” in International conference on Autonomous Agents and Multiagent Systems (AAMAS).
Seo, S., and Unhelkar, V. (2024). “Idil: imitation learning of intent-driven expert behavior,” in Proceedings of the 23rd international conference on Autonomous Agents and Multiagent Systems (AAMAS).
Sontakke, S. A., Zhang, J., Arnold, S., Pertsch, K., Biyik, E., Sadigh, D., et al. (2023). “RoboCLIP: one demonstration is enough to learn robot policies,” in Thirty-seventh conference on neural information processing systems (NeurIPS).
Srivastava, A., Saisubramanian, S., Paruchuri, P., Kumar, A., and Zilberstein, S. (2023). “Planning and learning for Non-markovian negative side effects using finite state controllers,” in Proceedings of the AAAI conference on artificial intelligence (AAAI).
Strokina, N., Yang, W., Pajarinen, J., Serbenyuk, N., Kämäräinen, J., and Ghabcheloo, R. (2022). Visual rewards from observation for sequential tasks: autonomous pile loading. Front. Robotics AI 9, 9–2022. doi:10.3389/frobt.2022.838059
Tarakli, I., Vinanzi, S., and Nuovo, A. D. (2024). “Interactive reinforcement learning from natural language feedback,” in IEEE/RSJ international conference on intelligent robots and systems (IROS).
Tien, J., He, J. Z., Erickson, Z., Dragan, A., and Brown, D. S. (2023). “Causal confusion and reward misidentification in preferencebased reward learning,” in The eleventh international conference on learning representations (ICLR).
Xue, W., An, B., Yan, S., and Xu, Z. (2024). “Reinforcement learning from diverse human preferences,” in Proceedings of the thirty-third international joint conference on artificial intelligence, IJCAI (International Joint Conferences on Artificial Intelligence Organization).
Yang, Y., Neary, C., and Topcu, U. (2024a). “Multimodal pretrained models for verifiable sequential decision-making: planning, grounding, and perception,” in Proceedings of the 23rd international conference on autonomous agents and multiagent systems (AAMAS).
Yang, Z., Jun, M., Tien, J., Russell, S., Dragan, A., and Biyik, E. (2024b). “Trajectory improvement and reward learning from comparative language feedback,” in Conference on robot learning (CoRL).
Zha, Y., Guan, L., and Kambhampati, S. (2024). “Learning from ambiguous demonstrations with self-explanation guided reinforcement learning,” in Proceedings of the AAAI conference on artificial intelligence.
Keywords: information gain, interactive imitation learning, learning from human feedback, learning from multiple formats, robot learning
Citation: Anand Y, Nwagwu N, Sabbe K, Fitter NT and Saisubramanian S (2026) Adaptive querying for reward learning from human feedback. Front. Robot. AI 12:1734564. doi: 10.3389/frobt.2025.1734564
Received: 28 October 2025; Accepted: 15 December 2025;
Published: 12 February 2026.
Edited by:
Chao Zeng, University of Liverpool, United Kingdom
Reviewed by:
Pasqualino Sirignano, Sapienza University of Rome, Italy
Chuanfei Hu, Southeast University, China
Copyright © 2026 Anand, Nwagwu, Sabbe, Fitter and Saisubramanian. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Sandhya Saisubramanian, sandhya.sai@oregonstate.edu