TECHNOLOGY AND CODE article

Front. Comput. Sci., 19 February 2026

Sec. Human-Media Interaction

Volume 8 - 2026 | https://doi.org/10.3389/fcomp.2026.1558674

Generation and evaluation of adaptive explanations based on dynamic partner-modeling and non-stationary decision making

  • 1. Social Cognitive Systems, Technical Faculty, Bielefeld University, Bielefeld, Germany

  • 2. TRR 318 Constructing Explainability, Bielefeld, Germany


Abstract

Adapting to the addressee is crucial for successful explanations, yet poses significant challenges for dialog systems. We adopted the approach of treating explanation generation as a non-stationary decision process, in which the optimal strategy varies with changing beliefs about the explainee and the interaction context. In this study, we addressed the questions of (1) how to track the interaction context and the relevant adaptation parameters in a formally defined computational partner model (PM), and (2) how to utilize this model in the dynamically adjusted, rational decision process that determines the currently best explanation strategy. We proposed a Bayesian inference-based approach to continuously update the PM based on user feedback, and a non-stationary Markov Decision Process to adjust decision-making based on the PM values. We evaluated an implementation of this framework in an online user study, showing the positive effects of a broader PM. The results showed that an adapted explanation leads to a higher level of user understanding, highlighting the potential of our approach to improve current dialog systems.

1 Introduction

People convey ideas, beliefs, objects, or processes to one another when explaining a topic. In the literature, an explanation is either seen as a process (an interaction involving at least two agents) or as a product (an answer to a why question) (Lombrozo, 2006). Explanations thereby can address three types of questions: What?, How?, and Why? (Miller, 2019). Most approaches to making AI systems explainable have focused on explanations as single-turn answers to why-questions (Chandra et al., 2024; Lewis, 1986; Anjomshoae et al., 2019). Most research in the field of Explainable Artificial Intelligence (XAI) provides static explanations (Ali et al., 2023) or accepts major limitations in terms of dialog planning and informativeness (Piriyakulkij et al., 2024; Bertrand et al., 2023). More human-inspired approaches argue that explanations must also be considered as constructed through a collaborative interaction process involving at least two agents (Rohlfing et al., 2021; Robrecht and Kopp, 2023) with the potential to address each type of explanatory question (cf. Axelsson and Skantze, 2023b; El-Assady et al., 2019). In this view, an agent with advanced knowledge (the explainer, EX) tries to tailor the explanation to an agent with less knowledge (the explainee, EE), who in turn provides feedback and thus contributes to the joint co-construction of the explanation. The emerging interaction is influenced by the EX's actions and the EE's feedback, their respective roles, and the domain being discussed. While XAI is mainly focused on explaining a system's behavior, our approach asks how to tailor the explanation to make it understandable to the individual user. We focus on co-constructive explanations in everyday scenarios, looking at explanations as social interactions dynamically shaped by equivalent partners (Rohlfing et al., 2021). The two approaches are not exclusive but rather complementary. Nevertheless, the central goals differ slightly.

Crucially, an adept explainer (EX) forms assumptions (beliefs) about the explainee (EE) to guide how to proceed with the explanation. Furthermore, the EX updates these beliefs during the interaction based on feedback and a growing understanding of the listener's abilities or personality. In other words, the EX builds a partner model (PM) and uses it to generate adaptive explanations. We now ask how intelligent autonomous agents can be equipped with abilities that enable them to successfully explain a certain domain to a human user. Thereby, we focus on everyday explanation scenarios and address the following questions: Which information about the human EE and the ongoing interaction needs to be captured in a suitable PM? What is a suitable formalization of the PM to capture dynamic updates during an explanatory interaction? And how can we use the PM to steer the EX's decision-making?

Previous research has largely focused on mapping user feedback to a level of understanding (Axelsson and Skantze, 2020) or determining the mental state of a listener (Buschmeier and Kopp, 2018). These approaches do not encompass the full spectrum of adaptive processes observable in human interaction. In addition to tailoring explanations to the interlocutor's comprehension, we also need to consider adaptation parameters that go beyond mere comprehension and can be gleaned from input received during the interaction, especially the user's feedback. Moreover, previous approaches to infer such features (Robrecht and Kopp, 2023) use discrete and categorical values, ignoring the uncertainty in this process and how it can vary (decrease or increase) due to, e.g., unexpected user feedback.

In this study, we present a conceptual framework for an adaptive EX and a formal approach to building and updating a PM, and apply it to adapt the explanation. We base this model on theoretical and empirical findings showing that the user's domain expertise, cognitive load, attentiveness, and cooperativeness are relevant factors in the underlying decisions. We introduce SNAPE-PM, an implemented version of such an agent specialized in generating adaptive explanations for board games. Regardless of whether such features are persistent (static) or floating (dynamic), our model employs Bayesian probabilistic inference to dynamically form and update potentially uncertain beliefs about them, based on the listener's feedback. We also address the challenge of designing a decision process that is sufficiently flexible, precise, and fast to select the next best explanatory actions given the current state of the PM. To meet all these requirements, the explanation generation is modeled as a non-stationary decision process shaped by the current PM. This agent is evaluated against a non-adaptive and a knowledge-adaptive agent in an online interaction study, demonstrating the benefits of a broader PM.

In the remainder of the paper, we first discuss related work, then introduce the conceptual framework and its implementation. We describe how explanation generation can be modeled as a combination of dynamic Bayesian reasoning and non-stationary Markovian decision-making. Finally, we report results from an evaluation of SNAPE-PM and discuss their relevance to the field of human-agent interaction.

2 Related work

While Miller (2019) suggests examining human-human explanations to uncover what characterizes a good explanation, Sokol and Flach (2020) show that most current approaches still use a one-size-fits-all approach when designing explanations in human-agent interaction (HAI).

2.1 Explanations in human–human interaction

It quickly becomes evident that each explanation is a distinct process, varying with the interlocutors' stance or goals and the domain they address (Keil, 2006). Involving the user as an active decision-maker in the modeling process is often called for but has rarely been implemented to date (Çelikok et al., 2023). In particular, everyday explanations show a high degree of variability (Fisher et al., 2023), which is not surprising if we view such an explanation as a process that is dialogically co-constructed by both interlocutors (Rohlfing et al., 2021).

Following Dillenbourg et al. (2016) and Clark and Wilkes-Gibbs (1986), we conjecture that such a dialogue requires the EX to have a suitable PM that captures the partner's understanding and other relevant parameters. The term PM refers to a dynamic abstract model of the interlocutor needed to maintain shared comprehension or grounding in a collaborative task (Dillenbourg et al., 2016). A PM consists of dispositional aspects (A's representation of B's long-term knowledge, skills, or traits) as well as situational features (A's representation of B's current understanding, behavior, or intentions in the collaboration setting). Most approaches focus on modeling user knowledge, although it is known that characteristics, experiences, expectations, and stereotypes significantly influence the global user model and, therefore, the interaction (Brennan et al., 2010).

2.2 Explanations in human-agent interaction

Dispositional aspects, especially, are often ignored in human-agent interaction approaches. Although they are rare (Anjomshoae et al., 2019), some approaches center their explanation around the user (Axelsson and Skantze, 2023a; Buschmeier and Kopp, 2018; Idrizi, 2024). Much work in the field of XAI has focused on the automatic generation of single-turn explanations to make the individual decisions of the system comprehensible. Most XAI approaches do not adapt to the user but follow a one-size-fits-all manner (Noorani, 2025). The small number of approaches that modify their explanation toward the user's mental model only consider the estimated knowledge to detect differences to their own model (Sreedharan et al., 2021; Vasileiou and Yeoh, 2023). Large Language Models (LLMs) represent a significant advancement in the field of dialog systems. When discussing adaptive explanations generated by LLMs, it is essential to note that the explanation itself is typically not adapted to the user; rather, the recommendation is optimized based on the user's profile, and this recommendation is subsequently explained using an LLM (Lubos et al., 2024). Furthermore, Kunz and Kuhlmann (2024) discussed GPT-4's ability to adapt to specific user groups, but did not clarify how the system identifies or categorizes users into these groups. In their research, MacNeil et al. (2023) presented LLM-generated code snippets with different explanations, which students rated as useful. This shows the LLM's capability to generate different explanation types, but there remains little research on how these models determine which explanation type is most suitable for a particular user. More research is needed on the initialization of the PM.

2.3 Methods for explanation generation

Sequential decision-making in artificial agents is commonly modeled using Markov Decision Processes (MDPs) (Bakker et al., 2005; Alagoz et al., 2010; Nardi and Stachniss, 2019). In a classical MDP, the environment is fully observable and static, whereas in a partially observable MDP (POMDP), the agent does not have full access to the environment's state but only some observations about it (Puterman, 1994). Newman (2024) used a decentralized POMDP solved by inverse Reinforcement Learning to model a human-agent pick-and-place collaboration task, while Zhao et al. (2025) observed the interlocutor's behavior to model her state as a POMDP. The main focus of these approaches is inferring the human's uncertain goals from observed behavior while treating the environment as static. In our setting, conversely, the goal (understanding) is known, while the environment is changing. To model such an environment, non-stationary MDPs (NSMDPs) can be used. Most research on NSMDPs has focused on solving them (Luo et al., 2024; Pettet et al., 2024), whereas formulating the decision-making process is task-specific and often treated as a secondary aspect. Other forms of dynamic decision-making have focused on non-stationary policies (Scherrer and Lesner, 2012; Yang, 2024), in which the environment is consistent, and the agent switches between policies for different subproblems.

Many recent approaches explore the use of LLMs for decision-making tasks (Hu et al., 2025; Lu et al., 2025); however, these models present two problems in adaptive interactions. On the one hand, LLMs still struggle with reasoning (Nezhurina et al., 2025; Shojaee et al., 2025) and tend to hallucinate (Reddy et al., 2024; Perkovic et al., 2024); on the other hand, they do not reveal structures and patterns due to their black-box nature, and they are not faithful when asked to report their reasoning process (Chen et al., 2025).

Other extensive research has focused on the development and application of user models to enable adaptive systems. Although it is generally accepted that adaptation requires a model of the interlocutor (Srinivasan and Chander, 2021), the specific content of this model often remains an open question (Nofshin, 2024) and depends on the interaction type and goal (Dillenbourg et al., 2016). Most approaches focus on modeling the interlocutor's knowledge (Ray et al., 2013; Westra and Nagel, 2021). However, it is recognized that adaptive explanations in HHI extend beyond knowledge adaptation (Lombrozo, 2016; Buhl et al., 2024). Groß et al. (2025) recently introduced the SHIFT framework for adaptive scaffolding in multimodal Human-Robot Interaction (HRI). This framework facilitates mapping the current user state to one of six possible cognitive states represented in the PM. Hostetter and Bahl (2023) presented an intelligent pedagogical agent that enhances user understanding by providing personalized explanations and suggestions, rather than offering generic ones or no assistance at all. In that case, the PM reflected the student's attitude toward learning, but the study did not differentiate between personalized and non-personalized explanations. Singh and Rohlfing (2024) explored the relationship between task requirements and PMs in HRI, while Becková et al. (2025) proposed a multimodal PM that assesses user attention based on facial and bodily cues. Likewise, Zhao et al. (2025) identify dimensions along which agents can adapt to humans in collaborative scenarios, including goals and intentions, other internal cognitive features, physical factors, and learned human models.

Another group of approaches uses deep (representation) learning for partner-modeling, enhancing both speed and generalizability while avoiding the necessity for a comprehensive understanding of underlying processes. For instance, personalized advertisements were generated based on user preference models (Gao et al., 2023; Hermann, 2022), agents provided emotional support by assessing user moods (Liu et al., 2021), and systems analyzed health data to offer treatment recommendations (van Baalen et al., 2021). Moreover, Lim et al. (2023) adapted agent utterances to align with user knowledge and persona using LLMs. However, the pre-training phase requires extensive data and presents challenges, especially for users who fall outside the trained personas. Notably, the PM presented by Bulathwela et al. (2025) distinguishes itself by going beyond mere user knowledge modeling, incorporating factors such as interests, goals, skills, and engagement into its framework. Still, one problem with using such black-box models for explanation generation is that users often over-trust their explanations (Masters, 2025), which is problematic when black-box systems are explained by black-box systems (Bravo-Rocca, 2025).

3 Conceptual model

An adaptive EX, whether human or artificial, needs to repeatedly ask the following questions during the course of interaction: (1) What does the EE know? and (2) How do I respond to this state? These questions are addressed alternately until the goal of the explanation is achieved and the explanandum (EM) is assumed to be grounded. These dynamics correspond to an adaptation cycle consisting of two intertwined adaptation processes (see Figure 1): In the cognitive adaptation process, the agent creates and updates a PM to estimate the current state of the EE by monitoring the received feedback (FB). In the interactive adaptation process, the agent uses this PM to decide which action brings it closer to its goal. An adaptation cycle in this two-step process corresponds to a turn within an explanation, while a series of such cycles forms the full explanation. Modeling an interaction as a two-step process consisting of a monitoring and a decision-making component is a common approach in HHI and HAI alike (Chandra et al., 2024; Zhao et al., 2025). We adopt this view here as a basic underlying concept and model an adaptive AI EX by implementing and integrating these two key sub-processes as follows.
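To make this cycle concrete, it can be sketched as a minimal control loop. This is a toy illustration only: the function names, the single scalar understanding value, and the update step are invented placeholders, not SNAPE-PM's actual interface.

```python
# Toy sketch of the adaptation cycle: cognitive adaptation (updating the
# partner model from feedback) followed by interactive adaptation
# (choosing the next explanation move). All names and numbers are
# illustrative placeholders, not the actual SNAPE-PM implementation.

def update_partner_model(pm: dict, feedback: str) -> dict:
    """Cognitive adaptation: revise the belief about the EE's understanding."""
    pm = dict(pm)
    if feedback == "positive":
        pm["understanding"] = min(1.0, pm["understanding"] + 0.25)
    elif feedback == "negative":
        pm["understanding"] = max(0.0, pm["understanding"] - 0.25)
    return pm

def select_move(pm: dict) -> str:
    """Interactive adaptation: pick the move expected to help most."""
    return "provide_next_block" if pm["understanding"] > 0.5 else "rephrase"

def adaptation_cycle(pm: dict, feedback_sequence: list):
    moves = []
    for fb in feedback_sequence:           # one cycle per dialogue turn
        pm = update_partner_model(pm, fb)  # cognitive adaptation
        moves.append(select_move(pm))      # interactive adaptation
    return pm, moves

pm, moves = adaptation_cycle({"understanding": 0.5},
                             ["negative", "positive", "positive"])
# After negative feedback the agent rephrases; once understanding
# recovers, it moves on to the next explanation block.
```

The essential point is the strict alternation: every observed piece of feedback first changes the model, and only then is the next move chosen under the updated model.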

Figure 1

3.1 Cognitive adaptation

Cognitive adaptation refers to the implicit or hidden process of partner-modeling within a cognitive EX agent. The different aspects monitored in the PM depend on the goal of the interaction. Dillenbourg et al. (2016) described a PM as a mosaic of fragments; different subsets of fragments and their compositions match different interaction types. As the main interaction goal in an explanation is to increase the user's understanding of the EM, a knowledge model is the most detailed component of the PM. When looking at human-human explanations, a logical structure driven by the semantic structure of the EM can be observed (Fisher et al., 2023). This higher-level structuring behavior is assumed to hierarchically structure the EM. In addition to the estimated knowledge, we introduce four additional abstract parameters to optimize the adaptation process. Those adaptation parameters are the expertise, cognitive load, attentiveness, and cooperativeness of the EE. Although these four aspects are not necessary for generating an explanation per se, based on the insights of Brennan et al. (2010), we hypothesize that explanation quality improves when they support the adaptive decision-making process (as described in detail below).

The expertise of the EE refers to her familiarity with the domain of the EM. For instance, a person who enjoys reading fantasy novels can be considered a fantasy expert, even if she is not familiar with the specifics of Harry Potter. While knowledge develops during the interaction, the EE's expertise remains constant. Expertise is an important parameter as it influences the preferred depth of information and the choice of vocabulary used by the EX. Research by Zhao et al. (2025) indicates that differing levels of expertise lead to varying preferences for instructions, and that a mismatch between expertise and the given instructions leads to confusion and frustration in interactions.

Closely related to the EE's expertise is the cognitive load. This parameter refers to the limited resources in a person's working memory for a specific task (Chandler and Sweller, 1991). Considering the cognitive load is relevant for the success of an adaptive interaction (Çelikok et al., 2023). Research indicates that linguistic complexity can significantly impact cognitive load, suggesting that explanations should be tailored to an individual's current load to avoid overwhelming or boring them (Engonopoulos et al., 2013). Cognitive load is affected by the level of expertise: a person who is unfamiliar with the general patterns and structures in the EM's domain struggles more with understanding. However, cognitive load is not simply the opposite of expertise; it can also be influenced by various other factors. Even gaming experts might experience high cognitive load when facing distractions, being tired, or struggling with verbal explanations while learning about a board game. Thus, striking the right balance of linguistic complexity for the user is essential for effective communication in an explanation.

The last two adaptation parameters address the communicative functions of FB. According to Allwood et al. (1992), the communicative function of FB can be illustrated as a ladder with four rungs: contact, perception, understanding, and attitudinal reactions. Contact pertains to whether the listener is present and engaged in the interaction. For example, if someone is turned away from the speaker in a noisy environment, they may show negative contact by providing little to no feedback or becoming distracted. Perception involves whether the listener is actually absorbing the spoken words, regardless of comprehension. Positive perception can be indicated by the listener's eye contact or backchannel responses. Substantive statements such as "I (don't) understand xy." address the listener's (lack of) understanding. Furthermore, attitude usually consists of statements that express an approving or disapproving mood or (non-)acceptance. All of these functions are considered important when adapting explanations and consequently the underlying PM. The relevance of distinguishing between the different types of failures in HAI is emphasized in a survey by Honig and Oron-Gilad (2018).

In explanations, however, understanding is the main goal of the interaction, which also affects the communicative functions of FB. When the EE is assumed to be an informative interlocutor who is aware of the interaction goal, FB is, in many cases, more informative at the level of understanding than at the level of attitude. While in other types of interactions the expression of positive understanding can implicitly convey a negative attitude (Benotti and Blackburn, 2021), this is not the case in explanations. Attitude is rarely expressed in explanations and is especially unexpected in rather objective board game explanations. An EE who expresses positive understanding ("Ok, so you choose the figure for me.") does not necessarily express a negative attitude toward this.

The adaptation parameter of attentiveness captures the lower levels of feedback, namely contact and perception. Monitoring the attentiveness of the interlocutor during an explanation is crucial, as it provides the agent with insights into the potential success of the information conveyed. The type of potential repair moves is strongly related to the current level of attentiveness. A user who currently shows low attentiveness benefits from repetition, while an attentive listener needs a higher-level repair and is therefore more likely to benefit from rephrasing or comparison.

Conversely, cooperativeness refers to the EE's willingness to actively engage in the explanation. Enhanced cooperativeness implies a more co-constructed explanation. For feedback to reflect cooperativeness, it must be substantive, meaning the user actively participates by taking the conversational turn (Chi et al., 2008). The cooperativeness of the user is significant for a rational EX, as it allows for fewer activating moves since the EE is already taking an active role in the co-construction of knowledge, and fewer deepening moves, as the EE tends to express their uncertainties autonomously. The adaptation parameters of attentiveness and cooperativeness are strongly intertwined, which can be described by the concepts of downward evidence and upward completion.

The concept of downward evidence, applied to the adaptation parameters of attentiveness and cooperativeness, means that a high level of cooperativeness implies a high level of attentiveness. The concept of upward completion also directly influences the two adaptation parameters. If the EX sees that the EE is not reacting to what she says and is constantly looking elsewhere, this behavior can be understood as negative feedback on the perception level. This automatically implies that the introduced information has not been taken up by the EE; consequently, it cannot be understood, and no attitude toward it can be expressed. If the attentiveness of an EE is estimated to be low, this also lowers the expected level of cooperativeness.
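The coupling can be illustrated by two simple reconciliation rules, one per direction of inference. The functions and values below are invented for illustration; in SNAPE-PM this coupling is realized probabilistically within the partner model.

```python
# Toy reconciliation rules linking the attentiveness and cooperativeness
# estimates (values in [0, 1]); invented for illustration only.

def observe_cooperation(attentiveness: float, coop_evidence: float) -> float:
    """Downward evidence: substantive engagement implies at least
    an equally high level of attentiveness."""
    return max(attentiveness, coop_evidence)

def observe_inattention(cooperativeness: float, att_estimate: float) -> float:
    """Upward completion: low attentiveness caps the level of
    cooperativeness the EX can expect."""
    return min(cooperativeness, att_estimate)

att = observe_cooperation(0.3, 0.8)   # cooperative feedback pulls attentiveness up
coop = observe_inattention(0.9, 0.2)  # visible inattention lowers expected cooperation
```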

3.2 Interactive adaptation

Interactive adaptation, on the other hand, describes the overtly perceivable component of the adaptation cycle. An explanation is interactively adaptive if the EX and the EE both scaffold each other's understanding and the construction of a joint understanding of the explanandum by adapting their communicative behavior. We hypothesize that this interactive adaptivity is based on the use of PM in selecting appropriate next communicative moves. For example, the same question can yield different answers if the EX has different PMs of the EE. We thereby assume that the decision-making process consists of a higher-level discourse plan that is pre-planned, but can be adapted if required. In this plan, the EM is grouped into semantically connected explanation blocks, which follow a partial logical order. These explanation blocks can be gleaned from observations of human-human explanations (Fisher et al., 2023). At a lower level, a decision-making component selects the next information and explanation move by deciding what to say and how to say it. What to say depends mainly on the user's estimated knowledge, while how information is verbalized depends on the adaptation parameters. For example, an EE with a high cognitive load and a low expertise might benefit from a simple reformulation that lowers the information density, whereas this would bore an expert who would prefer a comparison to a familiar game.

This conceptual model of a rational EX that chooses its next utterance in such a way that, given the current PM, the EE's understanding of the EM is maximized,1 rests on two main assumptions. On the one hand, it assumes that an explanation is better if it is dynamically adapted; on the other hand, it implies that a PM that monitors knowledge and adaptation parameters leads to a better explanation. The first assumption has been confirmed in previous work using an online study that compared a dynamically adaptive agent to a statically adaptive condition and a non-adaptive baseline. The study results showed that both adaptive conditions positively affect general understanding, whereas only the dynamically adaptive condition leads to significantly higher deep understanding (Robrecht et al., 2023). This model, however, is based on a simplified PM with a less detailed representation of the knowledge state and two unrelated adaptation parameters, thus using a rather simple MDP. The present study significantly extends this work by inferring latent partner state variables and by grounding the non-stationary decision-making in augmented, dynamically inferred PM structures (see next section). This allows us to examine whether the adaptive explanations actually benefit from such dynamic partner-modeling, including the additional adaptation parameters.

4 The architecture

In this section, we introduce SNAPE-PM (Figure 2), an implementation of the previously described adaptive EX. We first outline the overall structure of the model and then describe in detail its two main processes: (1) the creation and update of the PM, and (2) the decision process that utilizes the PM for adaptive explanation generation.

Figure 2

SNAPE-PM is designed in a modular manner, where each process is carried out by different components. A Bayesian Network is used to analyze the users' feedback and map out the PM, while an MDP is solved via Monte Carlo Tree Search (MCTS) to determine the best next explanation move. The sensitivity to changes over time, which is fundamental to an adaptive explanation, requires the Bayesian Network to be dynamic and the MDP to be non-stationary, allowing the components to dynamically update one another. More specifically, as soon as the PM is updated based on feedback from the EE, a new MDP is constructed to formalize the decision problem arising in the newly estimated explanation situation (with, e.g., new information needs or likely new effects of explanation moves).
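This per-turn interplay, updating the PM and then constructing and solving a fresh decision problem, can be sketched as follows. A one-step lookahead stands in for the MCTS solver, and the reward terms are invented for illustration; they are not SNAPE-PM's actual reward function.

```python
# Sketch of the non-stationary decision loop: after each partner-model
# update, a new MDP is built for the changed situation and solved (here
# by exhaustive one-step lookahead instead of MCTS). The reward terms
# are invented placeholders, not SNAPE-PM's reward function.

from dataclasses import dataclass

@dataclass
class MDP:
    actions: tuple
    reward: dict  # action -> expected reward under the current PM

def build_mdp(pm: dict) -> MDP:
    """Reconstruct the decision problem from the current partner model."""
    rewards = {
        "simple_rephrase": 1.0 - pm["expertise"] + pm["cognitive_load"],
        "comparison": pm["expertise"] - pm["cognitive_load"],
        "next_information": pm["attentiveness"],
    }
    return MDP(tuple(rewards), rewards)

def solve(mdp: MDP) -> str:
    """Stand-in for MCTS: pick the action with the highest expected reward."""
    return max(mdp.actions, key=mdp.reward.get)

novice = {"expertise": 0.2, "cognitive_load": 0.7, "attentiveness": 0.5}
move = solve(build_mdp(novice))  # overloaded novice -> simple rephrasing
```

The non-stationarity lies in `build_mdp` being called anew after every PM update, so the same solver faces a different decision problem each turn.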

This approach builds on the Sequential Non-stationary decision process model for AdaPtive Explanation (SNAPE) proposed by Robrecht and Kopp (2023). We extend the SNAPE agent by adding a comprehensive PM tailored for explanation generation (hence, SNAPE-PM), an inference mechanism that accounts for feature-wise interaction within the PM, a reformulation of the non-stationary MDP that rests on the PM, additional explanation moves, and a more complex reward function. The general architecture of SNAPE-PM is shown in Figure 2.

In an interaction with the agent, each turn starts with a potential user utterance, which may be absent, a backchannel, or substantive feedback in the form of an open question. While FB is not required, its absence directly influences the PM. Positive or negative backchannels [1a] are directly passed to the core component of the agent, while substantive FB undergoes preprocessing through the NLU component using a fine-tuned Gemma3 model (Gemma Team, Google DeepMind, 2025).

The NLU output serves as input for the model update [1b], which acts as a distribution component. The FB polarity, along with the corresponding information, is forwarded to the graph database [2], which updates the estimated level of understanding (LOU) for the information and passes a list of possible upcoming information and their LOUs back [3]. The observables obtained from user FB are forwarded to the Bayesian belief update component [4], where they contribute to the revision of adaptation parameters. The estimated values of the four adaptation parameters are passed back to the model update [5]. The type of question, estimated adaptation parameters, and possible information are passed to the decision-making component [6], which selects which action and which move to perform next. Finally, action and move are relayed to the NLG [7] component, which verbalizes them into a coherent utterance and displays it to the user. One of the main advantages of SNAPE-PM's modular structure is the generalizability of the system. Except for the graph database, none of the modules is domain-specific; however, they are specific to the interaction type of explanations. The complete model has been implemented in an adaptive EX agent for the Quarto board game. The clear separation of functionality, combined with an abstract class formulation of the ontology management, PM, and model update, enables easy integration of a different ontology management system, PM, or decision process if desired. The full code is publicly available via GitHub.2
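One turn through this pipeline can be sketched as follows; the functions are illustrative stand-ins for the numbered components (the NLU stub replaces the fine-tuned Gemma3 model), and the polarity cues and LOU deltas are invented.

```python
# Toy walk through one turn: backchannels [1a] bypass the NLU, while
# substantive feedback [1b] is parsed first; the resulting polarity then
# drives the level-of-understanding (LOU) update [2]-[3]. All functions
# are illustrative stand-ins for the numbered components.

def nlu(utterance: str) -> dict:
    """Stub for the NLU component: classify the feedback polarity."""
    negative_cues = ("not", "don't", "confus")
    polarity = ("negative"
                if any(cue in utterance.lower() for cue in negative_cues)
                else "positive")
    return {"polarity": polarity, "type": "substantive"}

def process_turn(user_input: str) -> dict:
    if user_input in ("positive_bc", "negative_bc"):   # [1a] backchannel
        parsed = {"polarity": user_input.split("_")[0], "type": "backchannel"}
    elif user_input == "":                             # absence of feedback
        parsed = {"polarity": "none", "type": "silence"}
    else:                                              # [1b] substantive FB
        parsed = nlu(user_input)
    deltas = {"positive": 0.1, "negative": -0.1, "none": 0.0}
    return {"parsed": parsed, "lou_delta": deltas[parsed["polarity"]]}
```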

4.1 Dynamic partner-modeling

This section introduces the technical realization of the cognitive adaptation by describing the update dynamics of both PM components: the graph database and the Bayesian belief update.

As the knowledge state is directly connected to the goal of explaining, it is represented in detail. To this end, all entities and their relations forming the EM are represented in a Neo4J3 knowledge graph (KG). This KG is labeled with static and dynamic parameters. The static parameters, such as belonging to an explanation block, information complexity, and preconditions required to understand the information, are used to structure the explanation. Furthermore, the EE's level of understanding (LOU) of each knowledge structure is represented in the KG and dynamically estimated to track the grounding state of the respective information. This representation of the knowledge state is used by the non-stationary decision-making component to decide WHAT to say.
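To illustrate how the static labels and the dynamic LOU interact, a plain dictionary can stand in for the Neo4J graph. The node names, step size, and grounding threshold below are invented for illustration.

```python
# Minimal stand-in for the knowledge-graph component: each node carries
# static structuring labels and a dynamic level of understanding (LOU).
# Node names, step size, and threshold are invented for illustration.

knowledge_graph = {
    "board_layout": {"block": "setup", "complexity": 1,
                     "preconditions": [], "lou": 0.0},
    "piece_traits": {"block": "setup", "complexity": 2,
                     "preconditions": ["board_layout"], "lou": 0.0},
    "winning_rule": {"block": "goal", "complexity": 3,
                     "preconditions": ["piece_traits"], "lou": 0.0},
}

def update_lou(kg: dict, info: str, polarity: str, step: float = 0.25) -> None:
    """Shift the LOU of the addressed node according to the FB polarity."""
    delta = step if polarity == "positive" else -step
    kg[info]["lou"] = min(1.0, max(0.0, kg[info]["lou"] + delta))

def candidates(kg: dict, threshold: float = 0.5) -> list:
    """Ungrounded information whose preconditions are sufficiently grounded."""
    return [name for name, node in kg.items()
            if node["lou"] < threshold
            and all(kg[p]["lou"] >= threshold for p in node["preconditions"])]

update_lou(knowledge_graph, "board_layout", "positive")
update_lou(knowledge_graph, "board_layout", "positive")
# board_layout is now sufficiently grounded, unlocking piece_traits.
```

This mirrors the division of labor described above: the static labels structure the explanation, while the LOU values track grounding and constrain WHAT can be said next.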

To model the adaptation parameters introduced in Section 3, a factored representation describing the PM's current state is needed. A Bayesian Network (BN) is a graphical model representing a joint probability distribution in a factored form. Dynamic Bayesian Networks (DBNs) are a specific kind of BN, designed to model changes over time, assuming a stationary underlying process with the previous state as a prior (Murphy and Russell, 2002). SNAPE-PM's PM is realized as a DBN. When building the DBN for representing a PM in explanation, we take four factors into account: expertise, cognitive load, attentiveness, and cooperativeness. Each feature depends on its prior state, may be influenced by related features, and can be tracked by observing user feedback during the interaction. We will now explain how the agent determines the individual adaptation parameters and what influences them.
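A single feature update can be illustrated by one step of Bayesian filtering, in which the previous belief serves as the temporal prior and the observed feedback supplies the likelihood. The probabilities below are invented for illustration and are not the model's calibrated values.

```python
# One Bayesian filtering step for a binary partner-model feature: the
# posterior combines the previous belief (temporal prior) with the
# likelihood of the observed feedback. All probabilities are invented.

def bayes_step(prior_high: float, p_obs_given_high: float,
               p_obs_given_low: float) -> float:
    """P(feature = high | observation), using the previous belief as prior."""
    numerator = p_obs_given_high * prior_high
    denominator = numerator + p_obs_given_low * (1.0 - prior_high)
    return numerator / denominator

# Tracking 'expertise = high' across two turns of positive feedback,
# assuming (for illustration) that experts give positive feedback more
# often than novices (0.7 vs. 0.4).
belief = 0.5
for _ in range(2):
    belief = bayes_step(belief, p_obs_given_high=0.7, p_obs_given_low=0.4)
# The belief in high expertise rises with each consistent observation.
```

Unlike a hard categorical assignment, the belief remains a probability, so unexpected feedback can increase the agent's uncertainty again rather than being discarded.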

The four adaptation parameters are modeled using a DBN capturing their dependencies and interrelations (see Figure 3). Regarding expertise, we build on data from interaction studies conducted with the adaptive SNAPE model (Robrecht et al., 2023), which finds a correlation between self-estimated level of expertise and the amount of positive feedback provided (Figure 4A). No correlation between negative feedback and expertise is found (Figure 4B). Observable feedback is currently represented as binary variables with value (yes) or (no). The formula for estimating the level of expertise from the given feedback is given in Equation 1.
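Since Equation 1 is not reproduced here, the following sketch only illustrates the general mechanism: a discrete Bayesian update of a three-valued expertise belief from the binary positive-feedback observable. The likelihood values are illustrative, chosen to encode the observed correlation that experts give more positive feedback.

```python
def update_expertise(belief, positive_fb):
    """Bayesian update of P(expertise) from the binary positive-FB observable.

    belief: dict over {'low', 'med', 'high'}.
    The likelihoods below are illustrative, not the paper's: they only
    encode that higher expertise makes positive feedback more likely.
    """
    p_pos = {"low": 0.2, "med": 0.5, "high": 0.8}  # assumed P(positive FB | expertise)
    posterior = {}
    for level, prior in belief.items():
        likelihood = p_pos[level] if positive_fb else 1.0 - p_pos[level]
        posterior[level] = likelihood * prior
    z = sum(posterior.values())
    return {level: p / z for level, p in posterior.items()}

belief = {"low": 1 / 3, "med": 1 / 3, "high": 1 / 3}
belief = update_expertise(belief, positive_fb=True)  # mass shifts toward 'high'
```

Repeated positive feedback concentrates the belief on higher expertise levels, while absent positive feedback shifts it back, mirroring the correlation in Figure 4A.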

Figure 3

Figure 4

Cognitive load (Equation 2) can be inferred from specific linguistic features in verbal interaction, such as word count (Khawaja et al., 2014) (Equation 21), Type-Token Ratio (Arvan et al., 2023) (Equation 22), or Gunning Fog Index (Gunning, 1968; Khawaja et al., 2014) (Equation 23). An example of how those measures are calculated for an utterance taken from the ADEX corpus (Fisher et al., 2024) can be found in the Appendix. However, those measures are established for spoken interaction, whereas we restrict our current implementation of SNAPE-PM to typed user input, for which they are less indicative: when typing, we delete rather than repeat, and instead of stumbling, we pause. Research finds a relationship between a person's cognitive load and their mouse movements or keystrokes (Grimes and Valacich, 2015; Brizan et al., 2015). We thus measure typing speed and erasure behavior relative to the person's mean to infer the EE's cognitive load. Specifically, we consider the time needed to type a character (t) and the number of deleted characters (d) in the current feedback input and compare them to the overall mean of the current user (xuser) (Equation 3). If the current value is above the mean, the typing and erasing observable tae increases; if it is below the mean, it decreases (Equation 2). As shown in Khawaja et al. (2014) and discussed earlier, the load L cannot be deduced from the typing behavior tae alone but is also related to the expertise.
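The derivation of the typing-and-erasing observable can be sketched as follows (a simplification of Equations 2-3; the exact decision rule and variable names are our assumptions, and the observable's three values match those introduced for tae below):

```python
def typing_observable(t_current, d_current, t_mean, d_mean):
    """Derive the three-valued observable tae from typing behavior.

    t_current: seconds per typed character in the current feedback input
    d_current: number of deleted characters in the current input
    t_mean, d_mean: running means for this user
    Returns 'higher' (both signals above the user's mean, suggesting
    higher load), 'lower' (both below), or None when the signals
    disagree or match the mean.
    """
    if t_current > t_mean and d_current > d_mean:
        return "higher"
    if t_current < t_mean and d_current < d_mean:
        return "lower"
    return None

obs = typing_observable(t_current=0.42, d_current=6, t_mean=0.30, d_mean=2)
```

Here a user typing slower and deleting more than usual yields the 'higher' evidence value that is then fed into the DBN.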

Attentiveness can be assessed in manifold ways; one established way is to consider the FB behavior. The same listener has been shown to produce less feedback in identical tasks if distracted (Buschmeier et al., 2011). Therefore, the frequency of feedback can be taken as a marker for the user's current level of attentiveness (Equation 4). Like cognitive load, the user's attentiveness changes dynamically as the explanation evolves. A low level of attentiveness increases the probability of fully missing an utterance when using this feature in the decision process (Section 4.2).
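Equation 4 is not reproduced here; a minimal sketch of the feedback-frequency observable over a recent window (the window size is our assumption):

```python
def feedback_frequency(turn_history, window=10):
    """Fraction of recent turns that received any user feedback.

    turn_history: list of booleans (True = feedback was given in that turn).
    A persistently low value is evidence for low attentiveness.
    """
    recent = turn_history[-window:]
    return sum(recent) / len(recent) if recent else 0.0

freq = feedback_frequency([True, False, False, True, False])
```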

The dynamic parameter of cooperativeness can be measured implicitly through the amount of substantive feedback s provided (Equation 5). In turn, a higher level of cooperativeness leads to a higher expected understanding in cases where no feedback is provided (Section 4.2).

Each of the four features is modeled as a latent variable node in the DBN and can have three values: low, medium, and high. The observables are binary variables (values yes and no), except for tae, which has three values (lower, higher, None). All observables are directly gleaned from user feedback, which can be either positive or negative backchannels or substantive feedback. Backchannels are provided by clicking the matching smiley, while substantive feedback is entered via the open text option. As soon as the user clicks into the text box, the explanation pauses—a simplified, fully user-centered version of turn-taking. The presence or absence of feedback is taken as evidence, and the DBN updates the agent's beliefs accordingly. The implementation is based on Ducamp et al. (2020). The agent initializes a DBN over 20 time steps. When reaching the end, a new DBN is initialized using the last 10 time steps of the previous net as its carryover. Restricting the length minimizes computational effort, and this size of the DBN is sufficient because only the latest values are considered priors in a DBN. The DBN values are used by the non-stationary decision-making component to decide HOW to say it.
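The 20-step window with its 10-step carryover can be sketched independently of the underlying DBN library (only the re-initialization bookkeeping is shown; the evidence format is simplified):

```python
def advance_window(evidence, step, window=20, carryover=10):
    """Append new evidence for one time step; once the window is full,
    re-initialize, keeping only the last `carryover` steps as the
    new net's history (these seed the priors of the next DBN)."""
    evidence.append(step)
    if len(evidence) >= window:
        evidence[:] = evidence[-carryover:]
    return evidence

ev = []
for t in range(25):          # 25 interaction turns
    advance_window(ev, {"t": t})
# after turn 19 the window is truncated to turns 10-19,
# then turns 20-24 are appended on top
```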

4.2 Non-stationary decision-making

As the PM is updated with each iteration, the decision-making component that uses it must also be dynamic. We thus model the decision process as a non-stationary MDP. To achieve real-time capability—a key requirement for a language-based interactive system—we break down the decision problem into semantically related explanation blocks that can be solved in real-time using MCTS. This approach reduces the search space and is based on insights into how humans structure their explanations (Fisher et al., 2023). The decision model decides which action (what to say) and move (how to say it) to perform next. Three different actions can be selected, each of which is realized by two to four different moves (Figure 5), which differ at the rhetorical level, but address the same information and are based on typical explanation moves found in human-human explanations (Fisher et al., 2024). Modeling the decision process of an adaptive explanation as a non-stationary MDP has been proposed before (Robrecht and Kopp, 2023). In contrast to the previous model, SNAPE-PM introduces some fundamental updates: The MDP (1) utilizes estimated values for the adaptive parameters extracted from the DBN's probability distribution, (2) considers a bigger action space, and (3) uses preconditions and graph distances in the reward functions.

Figure 5

A state in the MDP is defined as:

  • grounded information in current block b: Gb

  • total information in current block b: Tb

  • active question: q∈{None, polar, open}

  • adaptation parameters: E, L, A, C

  • set of information that is currently under discussion: CUD = {i1, ..., ix}, their LOUi, their preconditions pi, and their complexity values cxi
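These state components map directly onto a small record (a sketch; the field names are ours):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MDPState:
    grounded: int            # G_b: grounded information in current block b
    total: int               # T_b: total information in current block b
    question: Optional[str]  # active question: None, 'polar', or 'open'
    expertise: float         # E
    load: float              # L
    attentiveness: float     # A
    cooperativeness: float   # C
    cud: list = field(default_factory=list)  # information under discussion:
                                             # (id, LOU, preconditions, complexity)

s = MDPState(grounded=3, total=8, question=None,
             expertise=0.75, load=0.5, attentiveness=0.5, cooperativeness=0.75,
             cud=[("quarto-has-strategy", 0.2, ["quarto-is-game"], 1)])
```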

Reward r, transition probabilities t, and level of understanding LOU are calculated depending on the current state of the DBN. The reward of each action depends on the predicted state of comprehension. It increases with the user's cognitive load and expertise, as we assume an expert with low cognitive load understands faster. The transition probability of staying in the previous state of the MDP, given no feedback, is higher when attentiveness or cooperativeness is estimated to be low, as the user is considered more likely to be distracted. That is, generally, the user's knowledge and the explanation history affect which action is selected. The subsequent selection of explanation moves also depends on the PM features introduced before, as different moves have different success rates for different partners. In what follows, we provide a detailed description of each action and explain how reward, level of understanding, and transition probability are defined for each action.

When providing information, a new aspect of the game is introduced. The agent can provide information in two different ways: Information can be introduced using a declarative statement (P-D) (Your opponent picks the next figure for you.) or by generating a comparison (P-C) (In contrast to chess, you do not decide which figure to use next, your opponent does.). Both options have benefits and drawbacks for different partners depending on their expertise. An expert is likely to benefit from comparisons to other board games, whereas a layperson is more likely to be confused by comparisons to unfamiliar games, so simple declarative statements are preferred. For this reason, the initial LOU for a declarative increases linearly with growing expertise (Equation 6),4 while the increase is exponential when using a comparison (Equation 7). As a result, an expert gains a higher initial LOU, but at a certain level of expertise, the preferred method of providing information shifts from declarative to comparative.
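Equations 6 and 7 are not reproduced here; the following sketch uses purely illustrative functional forms, chosen only to exhibit the described crossover between the two moves (α = 0.5 as in footnote 4; the constants k and c are ours, not the paper's):

```python
import math

ALPHA = 0.5  # scaling factor alpha in (0, 1), currently set to 0.5 (footnote 4)

def lou_declarative(e):
    """Initial LOU of a declarative move: grows linearly with expertise e."""
    return ALPHA * e

def lou_comparison(e, k=2.0, c=0.1):
    """Initial LOU of a comparison move: grows exponentially with e.
    k and c are illustrative constants that place the crossover
    inside the expertise range [0, 1]."""
    return c * (math.exp(k * e) - 1)

# For low expertise the declarative yields the higher initial LOU,
# for high expertise the comparison overtakes it.
low, high = 0.3, 0.95
```

With these forms, the agent would prefer declaratives for a layperson and comparisons once the estimated expertise passes the crossover point.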

In the context of this action, the transition probability (Equation 8)5 is equivalent for both moves, as a piece of information that has been said cannot be introduced again.

While LOU and t need to be calculated at the move level, the reward considers the current state of knowledge and is calculated at the action level. The reward for providing information (Equation 9) is defined as the sum of the average LOU of each precondition p of the current information i minus one, as long as a non-introduced information exists; otherwise, the reward equals a highly negative value β. Preconditions are EM-internal information that help the EE to understand the current information; the concept was introduced in Robrecht et al. (2023).

If an information is introduced but remains ungrounded due to no or negative user feedback, the agent has multiple options to deepen the information: paraphrasing what has been said before (D-P; It is not you who picks the piece, it is your opponent.), giving additional helpful but not mandatory information (D-A; The figures are made of wood.), generating an example (D-E; For example, you could pick the big dark figure and pass it to your opponent.), or drawing a comparison (D-C; In contrast to TicTacToe, your opponent selects the figure for you.). Only triples with a comparable triple in one of the comparable knowledge graphs (chess, Bestof4, TicTacToe) are potential candidates for the comparison move. Triples are comparable if they are equal (chess: Game-is-Board game; Quarto: Game-is-Board game) or differ in one of the entities (chess: game-lasts-30mins; Quarto: game-lasts-10mins). The availability of an example move is annotated in the KG. When choosing which information to elaborate on, the information semantically closest to the last utterance it−1 is preferred; including the graph distance d here prevents the agent from jumping between information. Preconditions are not considered when deepening information, as the information has already been introduced, and graph distance takes precedence in the corresponding reward function (Equation 10).

The increase in the estimated LOU depends on the estimated value of the user's expertise E(E) and the complexity of the information cxi. The move complexity is based on the failure frequency in human dialogs. The moves repeat and example are well suited for lay EEs, as they do not require prior domain knowledge (Equation 11). The opposite holds for the moves additional information and comparison, which are suitable for expert EEs (Equation 12).

The moves example and comparison require the EE to be highly attentive (Equation 14), while repeat and additional info are easier to perceive (Equation 13). Hence, the MDP transition probability is directly influenced by the user's attentiveness.

In the current implementation of SNAPE-PM, the user can generate substantive feedback by typing a question into a text box. If such feedback occurs, it is handled by the answering question action. The previous state's CUD is replaced by the requested information, and q is set to the corresponding question type (polar or open). To reset the question value to None, the question needs to be answered using the correct type of answer. The move selection, when answering a question, is influenced by the estimated cognitive load 𝔼(L), while the reward is influenced by the question value q of the current state (Equation 15). If the user asked a question before, the reward is zero; otherwise, it is equal to β as it is a strong distortion of the dialog flow and causes confusion when answering a question that was not stated before.

The content and type of a user question define the content and type of the answer. A wh-question such as Where do I place the figure? or a tag question such as What is the name again? requires a declarative answer (A-D; You place it on the board./ The game is named Quarto.), while a polar question (So it is a game for two?), can either be answered with a polar answer (A-A; Yes.) or by paraphrasing the interlocutor (A-P; Indeed, a game for two.). We include two moves polar answer and paraphrase partner that can be used to answer the same type of question whilst being sensitive to the user's cognitive load E(L), which is expressed in the LOU (Equations 16, 17). The move declarative addresses a different type of question (Equation 18). Note that the transition probability is always 1 as long as the requested information is addressed in the answer, as we consider an EE who just asked a question to be attentive (Equation 19). This holds for all three moves.

SNAPE-PM can introduce multiple information units at once, depending on the triple complexity cx∈[1, 3] and the current user capacity v (Equation 20), which depends on the cognitive load and the value κ.6 Based on the current capacity, SNAPE-PM needs to balance the difficulty of the next utterance: v should equal the combined complexity of the set of selected information cx(i). For example, if the estimated cognitive load 𝔼(L) is 0.6, the capacity is two. The agent compares all information combinations that (1) are mandatory but not provided so far, (2) have a graph distance of one to ensure semantic connection, and (3) have a combined complexity equal to or closest to v. That is, either a combination of two semantically connected information units with a complexity of one each (e.g., Quarto-has-strategy, strategy-is-complex) or a single information unit with complexity two (passive-is-strategy) is suitable.
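Equation 20 is not reproduced here; the sketch below assumes the simple form v = round(κ·(1 − 𝔼(L))), which reproduces the worked example (load 0.6 with κ = 5 yields a capacity of 2), and then selects the combination of candidate information whose summed complexity best matches v:

```python
from itertools import combinations

def capacity(load, kappa=5):
    """User capacity v; an assumed functional form, chosen only to be
    consistent with the worked example: load 0.6, kappa 5 -> v = 2."""
    return round(kappa * (1 - load))

def best_combination(candidates, v):
    """Pick the set of information whose summed complexity is equal to
    (or closest below) the capacity v.
    candidates: (triple, complexity) pairs, assumed to be pre-filtered
    for 'mandatory, not yet provided' status and graph distance one."""
    options = []
    for r in (1, 2):
        for combo in combinations(candidates, r):
            total = sum(cx for _, cx in combo)
            if total <= v:
                options.append((abs(v - total), combo))
    return min(options)[1] if options else ()

v = capacity(load=0.6)
combo = best_combination(
    [("Quarto-has-strategy", 1), ("strategy-is-complex", 1),
     ("passive-is-strategy", 2)], v)
```

Both the pair of complexity-one triples and the single complexity-two triple exactly fill the capacity of two, matching the example in the text.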

As real-time processing is key for the system, we reduce the size of the MDP to include only a preselection of conversationally valid moves. That is, the MDP does not consider answer moves if no question was asked and does not consider provide moves if the corresponding triple was already provided. This restriction enables real-time processing while still retaining choice over valid moves. Likewise, the MDP does not first choose the action and then the move, but chooses from among all action-move combinations that are conversationally valid. To solve the decision process, the mcts library7 is adapted to output the three best results. If all or two of those are of the same action type and have a distance of one or less, they are presented to the user in a unified utterance. Note that the reward of a terminal state is equal to the cumulated reward of all actions, which in turn depends on the current PM as well as on external factors such as the CUD or LOU. A video of an exemplary interaction with the agent and an optional visualization component can be found online8.
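The preselection of conversationally valid action-move combinations can be sketched as a simple filter (move labels follow the naming scheme used above; we write the deepening comparison as D-C to distinguish it from the paraphrase D-P):

```python
ALL_MOVES = [
    ("provide", "P-D"), ("provide", "P-C"),
    ("deepen", "D-P"), ("deepen", "D-A"), ("deepen", "D-E"), ("deepen", "D-C"),
    ("answer", "A-D"), ("answer", "A-A"), ("answer", "A-P"),
]

def valid_moves(question_active, triple_provided):
    """Preselect conversationally valid action-move combinations:
    no answer moves without an open question, no provide moves for an
    already-provided triple, no deepen moves before introduction."""
    valid = []
    for action, move in ALL_MOVES:
        if action == "answer" and not question_active:
            continue
        if action == "provide" and triple_provided:
            continue
        if action == "deepen" and not triple_provided:
            continue
        valid.append((action, move))
    return valid

# already-introduced triple, no open question: only deepening moves remain
moves = valid_moves(question_active=False, triple_provided=True)
```

Only this reduced set is then handed to the MCTS search, which keeps the tree small enough for real-time operation.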

5 Evaluation

We conducted an evaluation study to analyze whether the presented adaptive EX achieves understanding with human EEs, and how its dynamic partner modeling and non-stationary decision-making contribute to this. The study was preregistered9 and received ethical approval from the University Ethics Review Board.

5.1 Study design

The EX agent explains the board game Quarto! to 199 participants in an online study.10 Participants interact via the graphical interface shown in Figure 6 and are equally distributed across three conditions in a between-subjects design, resulting in an overall statistical power of 0.888. The study consists of three phases: In the pre-test phase, sociodemographic information is collected; next, the participants interact with one version of the EX agent; and finally, they complete a questionnaire about their perception of the explanation and the agent as well as their objective understanding of Quarto!. The agent's explanation behavior depends on the assigned condition: Participants in the baseline condition can only passively perceive the explanation and cannot give FB. In this condition, each piece of information is introduced once using a declarative statement. In the knowledge-adaptive condition, participants can use backchannels and substantive FB, based on which only the agent's graph database is updated while keeping the other adaptation parameters fixed to preset values (E = 0.75, L = 0.5, A = 0.5, C = 0.75). The fully-adaptive condition utilizes the complete SNAPE-PM architecture, allowing real-time adjustments to the full PM based on user FB. Both adaptive conditions use all action-move combinations.

Figure 6

In all conditions, participants interact via the same text interface, in which system utterances are displayed in a speech bubble. The display time ranges from five to eight seconds, depending on the word count. In both adaptive conditions, users can provide positive or negative backchannel FB by clicking the corresponding smiley icon, which changes the speech bubble color to red or green. Substantive FB is typed into a textbox, during which the explanation is paused. Reminders are provided if no FB has been perceived within the last five turns.

Dependent variables include the user's perception of the explanation (EXQ) and the understanding metrics (UQ). The EXQ questionnaire combines 13 adapted items from various existing questionnaires to assess perceived satisfaction, consistency, structure, human-likeness, intentionality, relevance, comprehensiveness, subjective understanding, truth, adaptivity, control, and the quality of NLU and NLG. The UQ is a pre-evaluated questionnaire consisting of 32 statements11. The questions are either binary knowledge questions or in-game scenarios that require the user to select an action. These in-game questions are used to evaluate general and deep understanding of the game. All items, the adapted statements, and their origin for both questionnaires can be found in the Supplementary material.

A pre-study pilot indicated that some participants exhibited unrealistically good language skills and likely used LLMs to complete the online study. To detect the use of LLMs and exclude the corresponding participants from the dataset, two additional attention checks are included in the questionnaires. The first targets the unimodality of most current LLMs by asking Which color is the second word written in?. The second tests reasoning and language skills by asking the participant to explain the answer given to question Q9 in the UQ. These two checks are used in addition to nine Instructional Manipulation Checks (IMC) (Oppenheimer et al., 2009). Furthermore, participants had to actively agree not to use AI before the start of the study. The pool of participants was limited to the DACH region.

5.2 Results

All results are analyzed using parametric tests, which are considered robust to small violations of normal distribution (Schmider et al., 2010), or nonparametric tests in cases of heteroscedasticity or a strong violation of normal distribution.

An explanation in the adaptive conditions requires nearly twice as many turns as in the baseline (fully: m = 146.182, std = 49.234; knowledge: m = 147.394, std = 37.309; baseline: m = 80, std = 0), which is expected since the baseline condition does not include any deepening moves or FB. The two adaptive conditions do not differ significantly, but the fully-adaptive condition shows a bigger variance in the number of turns needed. When comparing the aggregation of multiple information units (triples) into one utterance (Figure 7), the agent in the fully-adaptive condition shows a significantly higher tendency to do so (6.47% of turns) than in the knowledge-adaptive condition (5.68% of turns; t = 2.189, p = 0.0304).

Figure 7

Regarding the EEs' understanding, the results are consistent across both general and deep understanding (Figure 8). The understanding scores are normally distributed for the fully-adaptive (general: s = 3.624, p = 0.163; deep: s = 3.526, p = 0.189) and the knowledge-adaptive (general: s = 5.94, p = 0.051; deep: s = 3.332, p = 0.189) conditions, with a small violation for the baseline condition (general: s = 6.955, p = 0.031; deep: s = 10.19, p = 0.006); variances are homoscedastic (general: s = 3.492, p = 0.175; deep: s = 3.139, p = 0.208). An ANOVA comparing general (F = 3.679, p = 0.027) and deep understanding (F = 3.457, p = 0.033) across all three conditions shows significant differences. Post-hoc t-tests with Bonferroni correction indicate that the general and deep understanding scores are significantly higher for the fully-adaptive condition than for the baseline (general: t = 2.636, p = 0.028; deep: t = 2.548, p = 0.036), while no significant differences between the knowledge-adaptive condition and the baseline can be reported (general: t = 1.862, p = 0.194; deep: t = 1.768, p = 0.238).

Figure 8

Regarding the EXQ results, the data are (nearly) normally distributed (fully: s = 8.915, p = 0.012; knowledge: s = 3.518, p = 0.172; baseline: s = 2.348, p = 0.309) and variances are homoscedastic (s = 1.17, p = 0.557). While the overall comparison is not significant (F = 0.417, p = 0.66), some of the items differ significantly (Figure 9). In both adaptive conditions, the NLU is rated better than in the baseline condition (fully vs. baseline: t = 8.192, p = 3.0e−12; knowledge vs. baseline: t = 9.445, p = 6.3e−15). Participants likewise report a higher adaptivity (A) in the fully-adaptive and knowledge-adaptive condition compared to the baseline (fully vs. baseline: t = 3.05, p = 0.003; knowledge vs. baseline: t = 3.920, p = 0.1e−3). Also, the perceived control (C) over the explanation process receives significantly higher scores in both adaptive conditions (fully vs. baseline: t = 5.650, p = 1.1e−07; knowledge vs. baseline: t = 6.197, p = 7.9e−09). In contrast, the baseline is perceived as significantly more consistent (CY) than both adaptive conditions (fully vs. baseline: t = 3.572, p = 0.5e−03; knowledge vs. baseline: t = 3.495, p = 0.6e−03). Furthermore, the fully-adaptive condition is perceived as less relevant (R) (t = −2.192, p = 0.03) and less structured (S) than the baseline (t = −2.014, p = 0.046). A manual review of the interaction data for the adaptive conditions revealed several examples of surprising behavior by the agent: the NLU component correctly recognizes both the question type and the queried triple, but the agent nevertheless decides to explain different information first. This only happens if few or none of the preconditions for the requested information are grounded and the information is highly complex. No significant differences are observed in perceived truth (T), subjective understanding (SU), satisfaction (SF), intentionality (I), human-likeness (HL), comprehensiveness (CP), or quality of NLG.

Figure 9

5.3 Discussion

The results indicate that the proposed artificial EX agent can produce adaptive explanations in HAI and that these explanations lead to a higher understanding of the explanandum among the human EEs. Explanations generated by the adaptive system are, on average, longer and exhibit greater variance, especially in the fully-adaptive condition. This is inherent in adaptive interaction and can be interpreted as a sign of successful adaptation to the user. A successful adaptive explanation, then, finds the optimal explanation length for the current user. Indeed, both adaptive agents tend to be conservative when estimating the user's LOU, leading to over- or under-specification. Besides, the fully-adaptive agent tends to aggregate information into a single utterance more often, showing a better ability to balance information load, utterance length, and the number of utterances in explanations.

These features of interactive adaptivity, however, must always be evaluated in the context of the user's understanding. The instrument used to measure this understanding made good use of the entire range; none of the participants achieved the minimum or maximum score in general understanding, and only a few in deep understanding. The evaluation of the achieved understanding shows that only the explanation of the fully-adaptive agent leads to a significantly higher understanding than the explanation of the baseline condition. The knowledge-adaptive agent lies between the other two conditions in terms of both general and deep understanding and does not differ significantly from either the baseline or the fully-adaptive explanation. These results show that using a PM that dynamically captures both the user's knowledge state and more abstract latent states, combining this information at the right moments, and employing these features reasonably in adaptive decision-making enables the agent to shorten the explanation while still positively affecting the user's understanding. That is, the agent can explain more effectively and efficiently than, e.g., an agent that only adapts to the user's monitored knowledge.

The results are less clear regarding human users' subjective perception of the explanation. Overall, adaptive explanations are perceived as more controllable and adaptive. However, the non-adaptive baseline explanation is perceived to be more consistent and structured. The two adaptive explanations do not show significant differences. These results are consistent with the nature of an adaptive explanation: the more an explanation is co-constructed by both participants, the greater the EE's active participation, which gives them more control. At the same time, the interaction is more flexible and departs from pre-planned structures, which can make it appear less structured and consistent. An alternative reason for the explanation being perceived as less structured may lie in the weighting of individual adaptation parameters or in the relevance of preconditions in the decision-making component. Future ablation studies will reveal which of these factors drives the results, or whether both contribute.

Another aspect that might have influenced the perceived consistency of the explanation is the previously described unaddressed questions. From a rational perspective, given the current state, those decisions might have been correct, as the expected increase in understanding might have been too small. Nevertheless, this decision is not comprehensible to the user.

In summary, although users do not perceive the explanations from the two agents as different, only the fully adaptive condition achieves a significantly higher level of user understanding than the baseline.

6 Conclusion

This study presented a conceptual model of an explaining AI agent that combines cognitive and interactive adaptation processes to co-construct explanations with a human user. This artificial EX is implemented in the modular SNAPE-PM architecture, which combines rational yet non-stationary decision-making with dynamic partner-modeling. As the results show, the EX agent can engage in a closed-loop, dynamic adaptation cycle in which it adapts its explanations to changes in the PM, which, in turn, are affected by how the EE reacts to these explanations. Results from the user study show that only SNAPE-PM achieves significantly higher user understanding than the baseline, while the knowledge-adaptive condition does not. This shows that the utilization of latent user states as adaptation parameters can foster better adaptation, with a positive effect on the user's understanding.

A prospective use case for SNAPE-PM is its use as a tool for testing the relevance of different adaptation parameters in ablation studies. In comparison to human-human studies, where it is difficult to ignore a specific adaptation parameter (e.g., deliberately ignoring the partner's attentiveness), this can easily be done using SNAPE-PM. While differences between HHI and HAI must always be considered, the introduced system allows testing a specific hypothesis quickly and easily, as the full code is available online.

The current agent has some limitations. It has not yet been compared to a fully LLM-driven agent, which will be one of the future work milestones: although the purposes of the two agents differ, comparing the effects of explanation and perception between the adaptive EX and an LLM-based agent is relevant to current discourse in the field.

Based on the interaction analysis and the mixed results in the agent perception, one of the next steps will be to enable the agent to use meta-communication to solve potentially confusing situations, such as not immediately answering a question. To integrate this action, the agent needs to be able to (1) detect decisions that come with a potentially high surprisal and (2) be able to give understandable and compact explanations for the surprising decision it just made.

Finally, the model proposed here only scratches the surface of how EXs and EEs collaborate to co-construct an explanation. For example, it is common for EEs to take a more active role in the dialog by giving explicit feedback such as I often play board games., which should increase the estimated level of expertise, or Slow down, I think I lost you., a strong indicator of high cognitive load. Conversely, the EX should also be able to directly query the EE about understanding, expertise, attentiveness, cooperativeness, or cognitive load. Those complex yet natural forms of dialogical interaction are important means of propelling an explanation forward, but they also pose important challenges for the fields of XAI and HAI alike, which cannot be fully solved by LLMs at the current state (Fichtel et al., 2025). The approach presented here to intertwine formal representational, inference, and decision-making models specifically for adaptive explanation generation may provide a principled basis for tackling those larger challenges and thus pave the way for making autonomous agents better EXs in a more explainable way.

Statements

Data availability statement

Datasets are available on request: The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethikkommission der Universität Bielefeld/Ethics Committee of Bielefeld University. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

AR: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. CK: Conceptualization, Software, Writing – review & editing. SK: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): TRR 318/1 2021 – 438445824.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcomp.2026.1558674/full#supplementary-material

Footnotes

1.^This definition builds on the notion of a rational agent: an agent that chooses, from a set of possible options, the action that best aligns with its goals, given its current understanding of the world's state (Russell, 1999; Scontras et al., 2016).

2.^Repository: severus-study, branch: FiC, https://github.com/arobrecht/severus-study/tree/FiC.

3.^https://neo4j.com/

4.^α is a value between 0 and 1, which is currently set to 0.5.

5.^The transition probability to not reach s′, but stay in the previous state s is always 1−t(s′|s, move).

6.^κ is currently set to 5 and can be increased for a more compact explanation.

7.^Using the Python mcts library https://pypi.org/project/mcts/.

8.^OSF Project TRR 318-A01 Adaptive Explanation Generation Files section https://osf.io/daqv9/files.

9.^The study An Evaluation Online Study on the Performance of SNAPE-PM is preregistered in the OSF project TRR 318-A01 Adaptive Explanation Generation: https://osf.io/htm5k.

10.^Conducted via Prolific www.prolific.com.

11.^OSF: Quarto Understanding https://doi.org/10.17605/OSF.IO/W39D.
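Footnotes 4–6 refer to parameters of the underlying decision process. As a purely illustrative sketch (the function names and the string-based state representation are hypothetical and not taken from the paper's implementation), the complementary transition semantics described in footnote 5 can be written as:

```python
import random

def transition_distribution(s, s_next, t_move):
    """Distribution over successors for a 'move' action, as in footnote 5:
    s' is reached with probability t(s'|s, move); otherwise the process
    stays in the previous state s with the complementary probability 1 - t."""
    return {s_next: t_move, s: 1.0 - t_move}

def sample_successor(s, s_next, t_move, rng=random):
    """Sample an actual successor state from that two-point distribution."""
    return s_next if rng.random() < t_move else s
```

By construction the two probabilities always sum to one, which is exactly the guarantee stated in footnote 5.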

References

  • Alagoz, O., Hsu, H., Schaefer, A. J., and Roberts, M. S. (2010). Markov decision processes: a tool for sequential decision making under uncertainty. Med. Decis. Making 30, 474–483. doi: 10.1177/0272989X09353194

  • Ali, S., Abuhmed, T., El-Sappagh, S., Muhammad, K., Alonso-Moral, J. M., Confalonieri, R., et al. (2023). Explainable artificial intelligence (XAI): what we know and what is left to attain trustworthy artificial intelligence. Inf. Fusion 99:101805. doi: 10.1016/j.inffus.2023.101805

  • Allwood, J., Nivre, J., and Ahlsén, E. (1992). On the semantics and pragmatics of linguistic feedback. J. Semant. 9, 1–26. doi: 10.1093/jos/9.1.1

  • Anjomshoae, S., Najjar, A., Calvaresi, D., and Främling, K. (2019). "Explainable agents and robots: results from a systematic literature review," in Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS '19 (Montreal, QC: International Foundation for Autonomous Agents and Multiagent Systems), 1078–1088. doi: 10.65109/KCZB5817

  • Arvan, M., Valizadeh, M., Haghighat, P., Nguyen, T., Jeong, H., Parde, N., et al. (2023). "Linguistic cognitive load analysis on dialogues with an intelligent virtual assistant," in Proceedings of the 45th Annual Conference of the Cognitive Science Society (Sydney, NSW).

  • Axelsson, A., and Skantze, G. (2023a). "Do you follow?: a fully automated system for adaptive robot presenters," in Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction (New York, NY: ACM), 102–111. doi: 10.1145/3568162.3576958

  • Axelsson, A., and Skantze, G. (2023b). "Using large language models for zero-shot natural language generation from knowledge graphs," in Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (Prague: ACL), 39–54.

  • Axelsson, N., and Skantze, G. (2020). "Using knowledge graphs and behaviour trees for feedback-aware presentation agents," in Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (New York, NY: ACM), 1–8. doi: 10.1145/3383652.3423884

  • Bakker, B., Zivkovic, Z., and Krose, B. (2005). "Hierarchical dynamic programming for robot path planning," in 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (Edmonton, AB: IEEE), 2756–2761. doi: 10.1109/IROS.2005.1545548

  • Bečková, I., Pócoš, Š., Belgiovine, G., Matarese, M., Eldardeer, O., Sciutti, A., et al. (2025). A multi-modal explainability approach for human-aware robots in multi-party conversation. Comput. Vis. Image Underst. 253:104304. doi: 10.1016/j.cviu.2025.104304

  • Benotti, L., and Blackburn, P. (2021). "A recipe for annotating grounded clarifications," in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, eds. K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, et al. (ACL), 4065–4077. doi: 10.18653/v1/2021.naacl-main.320

  • Bertrand, A., Viard, T., Belloum, R., Eagan, J. R., and Maxwell, W. (2023). "On selective, mutable and dialogic XAI: a review of what users say about different types of interactive explanations," in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI '23 (New York, NY: ACM). doi: 10.1145/3544548.3581314

  • Bravo-Rocca, G. (2025). "Feature engineering for agents: an adaptive cognitive architecture for interpretable ML monitoring," in Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (Detroit, MI: IFAAMAS), 381–389. doi: 10.65109/SRBA1669

  • Brennan, S. E., Galati, A., and Kuhlen, A. K. (2010). "Two minds, one dialog," in Psychology of Learning and Motivation, Vol. 53 (Amsterdam: Elsevier), 301–344. doi: 10.1016/S0079-7421(10)53008-1

  • Brizan, D. G., Goodkind, A., Koch, P., Balagani, K., Phoha, V. V., Rosenberg, A., et al. (2015). Utilizing linguistically enhanced keystroke dynamics to predict typist cognition and demographics. Int. J. Hum. Comput. Stud. 82, 57–68. doi: 10.1016/j.ijhcs.2015.04.005

  • Buhl, H. M., Fisher, J. B., and Rohlfing, K. J. (2024). "Changes in partner models – effects of adaptivity in the course of explanations," in Proceedings of the Annual Meeting of the Cognitive Science Society (Rotterdam).

  • Bulathwela, S., Niekerk, D. V., Shipton, J., Perez-Ortiz, M., Rosman, B., Shawe-Taylor, J., et al. (2025). TrueReason: an exemplar personalised learning system integrating reasoning with foundational models. arXiv [Preprint]. Available online at: https://arxiv.org/abs/2502.10411 (Accessed January 20, 2026).

  • Buschmeier, H., and Kopp, S. (2018). "Communicative listener feedback in human–agent interaction: artificial speakers need to be attentive and adaptive," in Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018) (Stockholm). doi: 10.65109/YPVT5468

  • Buschmeier, H., Malisz, Z., Wlodarczak, M., Kopp, S., and Wagner, P. (2011). "'Are you sure you're paying attention?' – 'Uh-huh' communicating understanding as a marker of attentiveness," in Interspeech 2011 (Florence: ISCA), 2057–2060. doi: 10.21437/Interspeech.2011-540

  • Çelikok, M. M., Murena, P.-A., and Kaski, S. (2023). Modeling needs user modeling. Front. Artif. Intell. 6:1097891. doi: 10.3389/frai.2023.1097891

  • Chandler, P., and Sweller, J. (1991). Cognitive load theory and the format of instruction. Cogn. Instr. 8, 293–332. doi: 10.1207/s1532690xci0804_2

  • Chandra, K., Chen, T., Li, T.-M., Ragan-Kelley, J., and Tenenbaum, J. (2024). "Cooperative explanation as rational communication," in Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 46 (Rotterdam). doi: 10.31234/osf.io/bmknu

  • Chen, Y., Benton, J., Radhakrishnan, A., Uesato, J., Denison, C., Schulman, J., et al. (2025). Reasoning models don't always say what they think. arXiv [Preprint]. Available online at: https://arxiv.org/abs/2505.05410 (Accessed January 20, 2026).

  • Chi, M. T. H., Roy, M., and Hausmann, R. G. M. (2008). Observing tutorial dialogues collaboratively: insights about human tutoring effectiveness from vicarious learning. Cogn. Sci. 32, 301–341. doi: 10.1080/03640210701863396

  • Clark, H. H., and Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition 22, 1–39. doi: 10.1016/0010-0277(86)90010-7

  • Dillenbourg, P., Lemaignan, S., Sangin, M., Nova, N., and Molinari, G. (2016). The symmetry of partner modelling. Int. J. Comput. Support. Collab. Learn. 11, 227–253. doi: 10.1007/s11412-016-9235-5

  • Ducamp, G., Gonzales, C., and Wuillemin, P.-H. (2020). "aGrUM/pyAgrum: a toolbox to build models and algorithms for probabilistic graphical models in Python," in 10th International Conference on Probabilistic Graphical Models, Volume 138 of Proceedings of Machine Learning Research (Skorping), 609–612.

  • El-Assady, M., Jentner, W., Kehlbeck, R., Schlegel, U., Sevastjanova, R., Sperrle, F., et al. (2019). "Towards XAI: structuring the processes of explanations," in Proceedings of the ACM Workshop on Human-Centered Machine Learning, Glasgow, UK, Vol. 4 (New York, NY: ACM), 1–3.

  • Engonopoulos, N., Sayeed, A., and Demberg, V. (2013). "Language and cognitive load in a dual task environment," in Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 35 (Berlin), 2148–2153.

  • Fichtel, L., Spliethöver, M., Hüllermeier, E., Jimenez, P., Klowait, N., Kopp, S., et al. (2025). "Investigating co-constructive behavior of large language models in explanation dialogues," in Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, eds. F. Béchet, F. Lefèvre, N. Asher, S. Kim, and T. Merlin (Avignon: ACL), 1–20. doi: 10.64136/safy2489

  • Fisher, J. B., Robrecht, A. S., Kopp, S., and Rohlfing, K. J. (2023). "Exploring the semantic dialogue patterns of explanations – a case study of game explanations," in Proceedings of the 27th Workshop on the Semantics and Pragmatics of Dialogue (SemDial) (Maribor, Slovenia).

  • Fisher, J. B., Weise, M., Nitschke, C., and Rohlfing, K. J. (2024). ADEX Corpus A01. Data from: Open Science Framework: OSF. Available online at: https://osf.io/k5fwy (Accessed January 20, 2026).

  • Gao, B., Wang, Y., Xie, H., Hu, Y., and Hu, Y. (2023). Artificial intelligence in advertising: advancements, challenges, and ethical considerations in targeting, personalization, content creation, and ad optimization. Sage Open 13:21582440231210759. doi: 10.1177/21582440231210759

  • Gemma Team, Google DeepMind (2025). Gemma 3 technical report. arXiv [Preprint]. arXiv:2503.19786. doi: 10.48550/arXiv.2503.19786

  • Grimes, G. M., and Valacich, J. S. (2015). "Mind over mouse: the effect of cognitive load on mouse movement behavior," in Proceedings of the Thirty Sixth International Conference on Information Systems (Fort Worth, TX), 1–13.

  • Groß, A., Richter, B., and Wrede, B. (2025). SHIFT: an interdisciplinary framework for scaffolding human attention and understanding in explanatory tasks. arXiv [Preprint]. arXiv:2503.16447. doi: 10.48550/arXiv.2503.16447

  • Gunning, R. (1968). The Technique of Clear Writing. New York, NY: McGraw-Hill.

  • Hermann, E. (2022). Artificial intelligence and mass personalization of communication content—an ethical and literacy perspective. New Media Soc. 24, 1258–1277. doi: 10.1177/14614448211022702

  • Honig, S., and Oron-Gilad, T. (2018). Understanding and resolving failures in human-robot interaction: literature review and model development. Front. Psychol. 9:861. doi: 10.3389/fpsyg.2018.00861

  • Hostetter, A. B., and Bahl, S. (2023). Comparing the cognitive load of gesture and action production: a dual-task study. Lang. Cogn. 15, 601–621. doi: 10.1017/langcog.2023.23

  • Hu, Y., Wang, X., Yao, W., Lu, Y., Zhang, D., Foroosh, H., et al. (2025). "DeFine: decision-making with analogical reasoning over factor profiles," in Findings of the Association for Computational Linguistics: ACL 2025 (Albuquerque, NM: ACL), 4587–4603. doi: 10.18653/v1/2025.findings-acl.238

  • Idrizi, E. (2024). "Exploring the role of explainable artificial intelligence (XAI) in adaptive learning systems," in Proceedings of the Cognitive Models and Artificial Intelligence Conference, AICCONF '24 (New York, NY: ACM), 100–105. doi: 10.1145/3660853.3660877

  • Keil, F. C. (2006). Explanation and understanding. Annu. Rev. Psychol. 57, 227–254. doi: 10.1146/annurev.psych.57.102904.190100

  • Khawaja, M. A., Chen, F., and Marcus, N. (2014). Measuring cognitive load using linguistic features: implications for usability evaluation and adaptive interaction design. Int. J. Hum. Comput. Interact. 30, 343–368. doi: 10.1080/10447318.2013.860579

  • Kunz, J., and Kuhlmann, M. (2024). "Properties and challenges of LLM-generated explanations," in Proceedings of the Third Workshop on Bridging Human-Computer Interaction and Natural Language Processing (Mexico City: ACL), 13–27. doi: 10.18653/v1/2024.hcinlp-1.2

  • Lewis, D. (1986). Causal explanation. Philosophical Pap. 2, 214–240. doi: 10.1093/0195036468.003.0007

  • Lim, J., Kang, M., Kim, J., Kim, J., Hur, Y., Lim, H., et al. (2023). "Beyond candidates: adaptive dialogue agent utilizing persona and knowledge," in Findings of the Association for Computational Linguistics: EMNLP 2023 (Singapore: ACL), 7950–7963. doi: 10.18653/v1/2023.findings-emnlp.534

  • Liu, S., Zheng, C., Demasi, O., Sabour, S., Li, Y., Yu, Z., et al. (2021). "Towards emotional support dialog systems," in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), eds. C. Zong, F. Xia, W. Li, and R. Navigli (ACL), 3469–3483. doi: 10.18653/v1/2021.acl-long.269

  • Lombrozo, T. (2006). The structure and function of explanations. Trends Cogn. Sci. 10, 464–470. doi: 10.1016/j.tics.2006.08.004

  • Lombrozo, T. (2016). Explanatory preferences shape learning and inference. Trends Cogn. Sci. 20, 748–759. doi: 10.1016/j.tics.2016.08.001

  • Lu, Y., Hu, Y., Foroosh, H., Jin, W., and Liu, F. (2025). "STRUX: an LLM for decision-making with structured explanations," in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 2 (Albuquerque, NM: ACL), 131–141. doi: 10.18653/v1/2025.naacl-short.11

  • Lubos, S., Tran, T. N. T., Felfernig, A., Polat Erdeniz, S., and Le, V.-M. (2024). "LLM-generated explanations for recommender systems," in Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, UMAP Adjunct '24 (New York, NY: ACM), 276–285. doi: 10.1145/3631700.3665185

  • Luo, B., Zhang, Y., Dubey, A., and Mukhopadhyay, A. (2024). "Act as you learn: adaptive decision-making in non-stationary Markov decision processes," in Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS '24 (Auckland: International Foundation for Autonomous Agents and Multiagent Systems), 1301–1309. doi: 10.65109/VBOB2771

  • MacNeil, S., Tran, A., Hellas, A., Kim, J., Sarsa, S., Denny, P., et al. (2023). "Experiences from using code explanations generated by large language models in a web software development e-book," in Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1, SIGCSE 2023 (New York, NY: ACM), 931–937. doi: 10.1145/3545945.3569785

  • Masters, P. (2025). "Rethinking explainable AI: explanations can be deceiving," in Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (Detroit, MI: IFAAMAS), 2663–2664. doi: 10.65109/QYNF7337

  • Miller, T. (2019). Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38. doi: 10.1016/j.artint.2018.07.007

  • Murphy, K. P., and Russell, S. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning [PhD thesis]. University of California, Berkeley, CA.

  • Nardi, L., and Stachniss, C. (2019). "Uncertainty-aware path planning for navigation on road networks using augmented MDPs," in Proceedings of the International Conference on Robotics and Automation 2019 (Montreal, QC: IEEE), 5780–5786. doi: 10.1109/ICRA.2019.8794121

  • Newman, B. A. (2024). "Bootstrapping linear models for fast online adaptation in human-agent collaboration," in Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (Auckland: IFAAMAS), 1463–1472. doi: 10.65109/WMMN2492

  • Nezhurina, M., Cipolina-Kun, L., Cherti, M., and Jitsev, J. (2025). Alice in wonderland: simple tasks showing complete reasoning breakdown in state-of-the-art large language models. arXiv [Preprint]. arXiv:2406.02061. doi: 10.48550/arXiv.2406.02061

  • Nofshin, E. (2024). "Leveraging interpretable human models to personalize AI interventions for behavior change," in Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (Auckland: IFAAMAS), 2761–2763. doi: 10.65109/YTYW1307

  • Noorani, E. (2025). "Counterfactual explanations for model ensembles using entropic risk measures," in Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (Detroit, MI: IFAAMAS), 1566–1575. doi: 10.65109/LAGC1207

  • Oppenheimer, D. M., Meyvis, T., and Davidenko, N. (2009). Instructional manipulation checks: detecting satisficing to increase statistical power. J. Exp. Soc. Psychol. 45, 867–872. doi: 10.1016/j.jesp.2009.03.009

  • Perkovic, G., Drobnjak, A., and Boticki, I. (2024). "Hallucinations in LLMs: understanding and addressing challenges," in 2024 47th MIPRO ICT and Electronics Convention (MIPRO) (Opatija: IEEE), 2084–2088. doi: 10.1109/MIPRO60963.2024.10569238

  • Pettet, A., Zhang, Y., Luo, B., Wray, K., Baier, H., Laszka, A., et al. (2024). "Decision making in non-stationary environments with policy-augmented search," in Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS '24 (Auckland: International Foundation for Autonomous Agents and Multiagent Systems), 2417–2419. doi: 10.65109/AFZR1425

  • Piriyakulkij, W. T., Kuleshov, V., and Ellis, K. (2024). Active preference inference using language models and probabilistic reasoning. arXiv [Preprint]. doi: 10.48550/arXiv.2312.12009

  • Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics, 1st Edn. Hoboken, NJ: Wiley. doi: 10.1002/9780470316887

  • Ray, D. G., Neugebauer, J., Sassenberg, K., Buder, J., and Hesse, F. W. (2013). Motivated shortcomings in explanation: the role of comparative self-evaluation and awareness of explanation recipient's knowledge. J. Exp. Psychol. Gen. 142, 445–457. doi: 10.1037/a0029339

  • Reddy, G. P., Pavan Kumar, Y. V., and Prakash, K. P. (2024). "Hallucinations in large language models (LLMs)," in 2024 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream) (Vilnius: IEEE), 1–6. doi: 10.1109/eStream61684.2024.10542617

  • Robrecht, A., and Kopp, S. (2023). "SNAPE: a sequential non-stationary decision process model for adaptive explanation generation," in Proceedings of the 15th International Conference on Agents and Artificial Intelligence (Lisbon: SCITEPRESS - Science and Technology Publications), 48–58. doi: 10.5220/0011671300003393

  • Robrecht, A. S., Rothgänger, M., and Kopp, S. (2023). "A study on the benefits and drawbacks of adaptivity in AI-generated explanations," in Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents (New York, NY: ACM). doi: 10.1145/3570945.3607339

  • Rohlfing, K. J., Cimiano, P., Scharlau, I., Matzner, T., Buhl, H. M., Buschmeier, H., et al. (2021). Explanation as a social practice: toward a conceptual framework for the social design of AI systems. IEEE Trans. Cogn. Dev. Syst. 13, 717–728. doi: 10.1109/TCDS.2020.3044366

  • Russell, S. (1999). "Rationality and intelligence," in Foundations of Rational Agency, Vol. 14, Applied Logic Series, eds. D. M. Gabbay, J. Barwise, M. Wooldridge, and A. Rao (Cham: Springer Netherlands), 11–33. doi: 10.1007/978-94-015-9204-8_2

  • Scherrer, B., and Lesner, B. (2012). "On the use of non-stationary policies for stationary infinite-horizon Markov decision processes," in Advances in Neural Information Processing Systems 25 (Lake Tahoe, CA).

  • Schmider, E., Ziegler, M., Danay, E., Beyer, L., and Bühner, M. (2010). Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology 6, 147–151. doi: 10.1027/1614-2241/a000016

  • Scontras, G., Tessler, M. H., and Franke, M. (2016). Probabilistic Language Understanding: An Introduction to the Rational Speech Act Framework. Available online at: https://www.problang.org/ (Accessed January 20, 2026).

  • Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., Farajtabar, M., et al. (2025). The illusion of thinking: understanding the strengths and limitations of reasoning models via the lens of problem complexity. arXiv [Preprint]. doi: 10.70777/si.v2i6.15919

  • Singh, A., and Rohlfing, K. J. (2024). "Coupling of task and partner model: investigating the intra-individual variability in gaze during human–robot explanatory dialogue," in Companion Proceedings of the 26th International Conference on Multimodal Interaction, ICMI '24 Companion (New York, NY: ACM), 218–224. doi: 10.1145/3686215.3689202

  • Sokol, K., and Flach, P. (2020). One explanation does not fit all: the promise of interactive explanations for machine learning transparency. KI - Künstliche Intelligenz 34, 235–250. doi: 10.1007/s13218-020-00637-y

  • Sreedharan, S., Chakraborti, T., and Kambhampati, S. (2021). Foundations of explanations as model reconciliation. Artif. Intell. 301:103558. doi: 10.1016/j.artint.2021.103558

  • Srinivasan, R., and Chander, A. (2021). "Explanation perspectives from the cognitive sciences—a survey," in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (Yokohama: IJCAI'20). doi: 10.24963/ijcai.2020/670

  • van Baalen, S., Boon, M., and Verhoef, P. (2021). From clinical decision support to clinical reasoning support systems. J. Eval. Clin. Pract. 27, 520–528. doi: 10.1111/jep.13541

  • Vasileiou, S. L., and Yeoh, W. (2023). "PLEASE: generating personalized explanations in human-aware planning," in European Conference on Artificial Intelligence (Krakow). doi: 10.3233/FAIA230543

  • Westra, E., and Nagel, J. (2021). Mindreading in conversation. Cognition 210:104618. doi: 10.1016/j.cognition.2021.104618

  • Yang, Z. (2024). "Risk-aware constrained reinforcement learning with non-stationary policies," in Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (Auckland: IFAAMAS), 2029–2037. doi: 10.65109/ZRHO4945

  • Zhao, M., Simmons, R., and Admoni, H. (2025). The role of adaptation in collective human–AI teaming. Top. Cogn. Sci. 17, 291–323. doi: 10.1111/tops.12633

Appendix

  • (1) German original and English translation taken from VP08_ts837:

    • a. ja ich dachte vorher okay ich dachte vorher vorher hab ich's nicht verstanden. Weil ich dachte es gäb nur lange Steine und dann kurze und dann.

    • b. yes I thought before okay I thought before, before I did not understand it. Because I thought there were only long stones and then short ones and then.

  • Word count: the total number of words in a text.

  • Type-token ratio: the ratio between the number of distinct word forms (types) and the total number of word tokens in a text.

  • Gunning fog index: a measure of the readability of a text (17: college graduate, 12: high school senior, 6: sixth grade). A word is considered complex if it consists of three or more syllables (suffixes are not counted).
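The three measures above can be approximated in a few lines of code. The sketch below is illustrative only: it uses a naive vowel-group heuristic to count syllables and, unlike the definition above, does not exclude suffixes when deciding whether a word is complex; all function names are ours, not from the paper's implementation.

```python
import re

def tokens(text):
    """Word tokens: runs of letters, with apostrophes kept inside words."""
    return re.findall(r"[A-Za-z']+", text)

def word_count(text):
    """Word count: the total number of word tokens in the text."""
    return len(tokens(text))

def type_token_ratio(text):
    """Type-token ratio: distinct word forms divided by total tokens."""
    toks = [t.lower() for t in tokens(text)]
    return len(set(toks)) / len(toks) if toks else 0.0

def syllables(word):
    """Naive syllable estimate: the number of vowel groups (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Gunning fog index: 0.4 * (average sentence length in words +
    percentage of complex words), where a word counts as complex
    if it has three or more syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    toks = tokens(text)
    if not sentences or not toks:
        return 0.0
    complex_words = sum(1 for w in toks if syllables(w) >= 3)
    return 0.4 * (len(toks) / len(sentences)
                  + 100 * complex_words / len(toks))
```

For example, `gunning_fog("The cat sat. The cat ran.")` yields 0.4 * (3 + 0) = 1.2, since both sentences have three one-syllable words.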

Keywords

adaptivity, dynamic Bayesian network, explainability, human evaluation, non-stationary decision process, partner models, user study

Citation

Robrecht-Hilbig AS, Kowalski C and Kopp S (2026) Generation and evaluation of adaptive explanations based on dynamic partner-modeling and non-stationary decision making. Front. Comput. Sci. 8:1558674. doi: 10.3389/fcomp.2026.1558674

Received

17 December 2025

Revised

28 November 2025

Accepted

12 January 2026

Published

19 February 2026

Volume

8 - 2026

Edited by

Fabrizio Lombardi, ETH Zürich, Switzerland

Reviewed by

Francesco Monaco, Azienda Sanitaria Locale Salerno, Italy

Akihiro Maehigashi, Shizuoka University, Japan

Copyright

*Correspondence: Amelie S. Robrecht-Hilbig,

