Weaponizing cognitive bias in autonomous systems: a framework for black-box inference attacks

Chu, Shiyong; Chen, Yuwei

doi:10.3389/frai.2025.1623573

ORIGINAL RESEARCH article

Front. Artif. Intell., 20 August 2025

Sec. Language and Computation

Volume 8 - 2025 | https://doi.org/10.3389/frai.2025.1623573

Weaponizing cognitive bias in autonomous systems: a framework for black-box inference attacks

Shiyong Chu^†

Yuwei Chen^*^†

Aviation Industry Development Research Center of China, Beijing, China

Autonomous systems operating in high-dimensional environments increasingly rely on prioritization heuristics to allocate attention and assess risk, yet these mechanisms can introduce cognitive biases such as salience, spatial framing, and temporal familiarity that influence decision-making without altering the input or accessing internal states. This study presents Priority Inversion via Operational Reasoning (PRIOR), a black-box, non-perturbative diagnostic framework that employs structurally biased but semantically neutral scenario cues to probe inference-level vulnerabilities without modifying pixel-level, statistical, or surface semantic properties. Given the limited accessibility of embodied vision-based systems, we evaluate PRIOR using large language models (LLMs) as abstract reasoning proxies to simulate cognitive prioritization in constrained textual surveillance scenarios inspired by Unmanned Aerial Vehicle (UAV) operations. Controlled experiments demonstrate that minimal structural cues can consistently induce priority inversions across multiple models, and joint analysis of model justifications and confidence estimates reveals systematic distortions in inferred threat relevance even when inputs are symmetrical. These findings expose the fragility of inference-level reasoning in black-box systems and motivate the development of evaluation strategies that extend beyond output correctness to interrogate internal prioritization logic, with implications for dynamic, embodied, and visually grounded agents in real-world deployments.

1 Introduction

Autonomous systems are increasingly entrusted with high-stakes decision-making in dynamic, visually complex environments. From unmanned aerial vehicles conducting real-time urban patrols to robotic agents deployed in disaster response and AI systems managing critical infrastructure, these technologies are expected to reason, prioritize, and act, often faster and at greater scale than human operators (Rezwan and Choi, 2022; Cheng et al., 2023; Papyan et al., 2024; Khan et al., 2022). Underpinning this capability is a widely held assumption: algorithmic reasoning is inherently more stable, consistent, and impartial than its human counterpart (Bruza and Hoenkamp, 2018; Frischknecht, 2021; Deng et al., 2022).

Yet this assumption is increasingly being challenged. As these systems become more sophisticated, their outputs begin to reflect reasoning shortcuts shaped by training distributions, interaction history, and environmental constraints (Raja et al., 2022; Liu et al., 2020a; Wang and Gursoy, 2023). Following the foundational work by Tversky and Kahneman (Tversky and Kahneman, 1974), we distinguish between heuristics, which are fast, intuitive rules used for navigating uncertainty, and biases, defined as systematic deviations from optimal reasoning that may result from heuristic overapplication. The term “structural tendencies” is used to describe emergent behavior patterns driven by architecture, training, or environmental regularities. The PRIOR framework focuses on cases where heuristics, such as “novelty signals risk” or “spatial edge implies intent”, produce systematic bias in attention allocation. These are not perceptual glitches but predictable inference patterns. Just as human judgment is susceptible to perceptual salience (Mullen et al., 1992; Borgida and Howard-Pitney, 1983), spatial framing (Entman, 2007; Knierim and Hamilton, 2011; Morstatter et al., 2018), and temporal familiarity (Zvielli et al., 2015; Vrac and Friederichs, 2015), AI agents may develop analogous sensitivities, prioritizing inputs that are visually conspicuous, structurally distinct, or newly introduced, even when they are not actually relevant to the task at hand (Cummings, 2017; De Oliveira and Levkowitz, 2003; Bayoudh et al., 2022; Farrer et al., 2008). These tendencies rarely present as perception failures; rather, they manifest as misaligned prioritization, in which systems attend to elements that appear urgent, unfamiliar, or peripheral regardless of their true relevance (Danks and London, 2017; Blumenthal-Barby, 2016; Frank et al., 2019). Although such heuristics may be computationally efficient in complex environments, they also introduce exploitable weaknesses when adversaries embed misleading but structurally salient cues. As illustrated in Figure 1, structurally embedded features such as salience, positioning, or temporal recurrence may induce cognitively plausible yet operationally misaligned reasoning patterns in autonomous systems.

Figure 1

Box plot showing model confidence scores for Salience, Spatial, Temporal, and Compound bias conditions. Each category is represented by a colored box with whiskers indicating data variability. The Compound condition shows the highest median and the smallest spread.

Figure 1. Conceptual overview of visual bias induction.

Unlike conventional adversarial attacks that manipulate input pixels or prompts, PRIOR is a non-perturbative strategy that avoids modifying input content at the statistical, semantic, or perceptual level. Instead, it targets inference by embedding high-level structural cues, such as spatial positioning, salience triggers, or recurrence patterns, into otherwise plausible scenes. For example, a surveillance system that consistently prioritizes individuals standing near structural features such as lampposts over those in open spaces, or that monitors newly encountered subjects while overlooking familiar threats, may not misclassify but instead misallocate strategic weight. Such behaviors reflect inference vulnerabilities rather than sensory or classification errors. These vulnerabilities often go unrecognized in standard evaluation pipelines, which focus on perception fidelity or classification robustness (Peng, 2018; Wang et al., 2020, 2022; Rastogi et al., 2022).

In this paper, autonomous systems are used as a motivating context, but no evaluation is conducted on real-time, vision-embedded agents. Instead, decision logic is simulated using large language models (LLMs) as abstract proxies. While these models do not replicate embodied cognition, persistent memory, or environmental dynamics, they provide a controllable platform for observing how structured heuristic cues influence attention and prioritization in text-based decision scenarios. Importantly, the PRIOR framework does not require access to internal reasoning steps; instead, it infers behavioral tendencies solely from model outputs.

This insight motivates our proposed framework, PRIOR, which is a black-box, non-perturbative adversarial strategy that induces inference drift by exploiting cognitively loaded but semantically neutral scene structures. We do not interpret the observed model outputs as proof of internal reasoning bias, but as evidence of output-level alignment with human-like heuristic traps, which may mirror vulnerabilities in future autonomous agents.

The primary contributions of this study are as follows:

• A conceptual distinction is established between heuristics, biases, and structural tendencies, and a taxonomy of inference-level vulnerabilities is proposed based on cognitive shortcut activation.

• The PRIOR framework is introduced as a novel adversarial method that bypasses perceptual perturbations and model internals, focusing instead on inference-level manipulation.

• Consistent priority inversions in response to structured heuristic triggers are empirically demonstrated using LLMs as decision-making proxies across controlled visual-scenario prompts.

• It is argued that trustworthiness assessments must move beyond accuracy or fairness audits and toward robustness under cognitively realistic reasoning conditions.

The remainder of this paper is organized as follows: Section 2 reviews related work on cognitive bias and adversarial reasoning. Section 3 presents a taxonomy of cognitive vulnerabilities. Section 4 outlines the PRIOR framework. Section 5 provides empirical evaluation results. Section 6 discusses broader implications.

2 Related work

2.1 Cognitive bias in autonomous systems

Cognitive biases have long been studied in behavioral science as efficient heuristics for navigating uncertainty, often at the cost of systematic misjudgment (Haselton et al., 2015; Barnes Jr, 1984). Salience, familiarity, and spatial framing are not failures of cognition, but optimized trade-offs under constraints of limited time and attention. Heuristics refer to cognitively economical rules of thumb that enable rapid decision-making; biases arise when these heuristics systematically deviate from rational inference or task-aligned objectives (Suresh and Guttag, 2019). As autonomous systems increasingly emulate human reasoning, particularly in unstructured or dynamic environments, they too exhibit analogous biases. These biases arise not from explicit design, but from emergent properties shaped by data distributions, reward structures, and operational feedback loops (Danks and London, 2017; Blumenthal-Barby, 2016; Frank et al., 2019; Wang et al., 2020). These tendencies rarely lead to outright perceptual failure; instead, they reveal deeper structural patterns in how agents construct situational relevance and allocate cognitive resources.

While research on bias in AI has expanded rapidly, most efforts have focused on fairness metrics, data imbalance, or output disparities. These issues are typically examined in the context of language models, recommendation engines, or classification-based predictors (He et al., 2021; Wang et al., 2022; Mehrabi et al., 2021). Far less attention has been given to how cognitive biases manifest in autonomous agents that must act rather than merely infer. In domains such as UAV surveillance or real-time urban patrolling, agents must continuously decide what to observe, when to respond, and where to allocate limited attention. It is within these operational hierarchies that cognitive shortcuts emerge. These shortcuts do not lead to misclassifications, but rather to systematic misallocations of attention and action (Rastogi et al., 2022; Banerjee et al., 2023). This shift from passive inference to situated action introduces new bias failure modes with ethical and tactical consequences, as highlighted in critical studies of autonomous weapon systems and AI risk (Asaro, 2019; Limata, 2023; Hellström et al., 2020). Despite their relevance to mission-critical performance, such biases remain under-characterized in prevailing AI safety frameworks. PRIOR addresses this gap not by mitigating bias, but by probing how heuristic-driven reasoning can be predictably destabilized under semantically neutral yet structurally adversarial conditions.

2.2 Visual reasoning shortcuts and non-perturbative vulnerabilities

Adversarial research in computer vision has traditionally focused on input-level perturbations, such as pixel-level noise, adversarial patches, or geometric distortions designed to compromise classifier accuracy (Guo et al., 2023; Liu et al., 2023). These approaches highlight the fragility of perception modules under minimal, targeted manipulation. However, real-world agents deployed in surveillance or navigation scenarios must reason over time, operate under uncertainty, and often lack interpretable internal states. In these contexts, failures frequently arise not solely from misperception, but from heuristic inference shortcuts, which are context-sensitive strategies that compress complexity into plausible priors (Buçinca et al., 2021; Mukherjee and Chang, 2024; Bertrand et al., 2022). Some of these heuristics are beneficial: for instance, prioritizing unfamiliar inputs (“novelty signals risk”) or edge-positioned actors (“spatial edge implies intent”) may improve responsiveness in constrained environments (Hellström et al., 2020; Limata, 2023). However, when triggered by semantically neutral cues, they can induce confident yet misaligned prioritization decisions. Moreover, what appears as bias may sometimes arise from architectural limitations, such as short context windows, missing relational memory, or statistical artifacts from training corpora (Mehrabi et al., 2021; Suresh and Guttag, 2019).

Such vulnerabilities are difficult to detect using conventional adversarial protocols, as these agents often function as black boxes, with no gradient access or interpretable reasoning traces (Marinucci et al., 2023; Lin et al., 2021; Johnson, 2021). Unlike perceptual attacks that disrupt classification accuracy, heuristic failures preserve surface plausibility while subtly distorting the agent's internal prioritization logic (De Houwer, 2019). Rather than injecting noise or breaking semantics, an adversary may manipulate visual structure to guide inference under cognitively familiar but operationally misleading patterns. This calls for an expanded understanding of vulnerability: not just as perceptual error, but as inference distortion under semantically coherent input.

2.3 Trustworthiness evaluation beyond accuracy and fairness

Trustworthiness in AI has traditionally been assessed through quantifiable metrics, such as accuracy under distribution shift, robustness to adversarial inputs, and fairness across demographic groups. These dimensions have significantly advanced our understanding of model reliability. Yet for autonomous agents operating in real-world, uncertain environments, such metrics remain insufficient (Albahri et al., 2023; Kaur et al., 2022). Autonomous systems differ from automated ones not only in complexity but in the degree of interpretive freedom they exercise during task execution. Accuracy may indicate perceptual correctness, but it fails to capture whether the system has allocated its attention to semantically relevant cues. Similarly, fairness metrics may detect demographic parity but not operational soundness under dynamic prioritization demands. In mission-critical tasks like UAV surveillance or disaster response, models must not only perceive correctly but also prioritize effectively. A system that assigns higher priority to irrelevant stimuli over mission-critical elements, even if it achieves high classification accuracy, can still fail operationally (He et al., 2021). Misallocated attention, delayed intervention, or misplaced focus may compromise the very objectives these systems are designed to fulfill.

Despite this, existing evaluation frameworks offer limited insight into reasoning instability, particularly when it arises from structurally coherent but semantically misleading inputs (Mukherjee and Chang, 2024; Liu et al., 2020b; Bertrand et al., 2022). Agents operating under bandwidth or attention constraints often rely on heuristics such as novelty, proximity, or repetition to allocate focus. While computationally efficient, these shortcuts are inherently brittle. To evaluate trustworthiness in such agents, we require new diagnostic dimensions: (1) Reasoning Robustness: the stability of prioritization logic under cognitively adversarial conditions; (2) Inference Alignment: the degree to which internal justifications match task intent; and (3) Confidence Plausibility: whether expressed certainty correlates with semantic integrity rather than superficial cues. These dimensions emphasize process-level reliability over outcome-level metrics and can reveal covert vulnerabilities masked by traditional evaluations. This calls for a new class of diagnostic tools, not only to audit fairness or output consistency, but also to assess whether an agent's inferential processes can remain robust under cognitive stress. In this view, bias is not merely a social concern, but a structural probe into the limits of operational trust.

3 A taxonomy of cognitive biases in autonomous systems

3.1 Cognitive bias as an attack surface

Autonomous systems are increasingly deployed in high-stakes decision-making contexts, ranging from urban surveillance to disaster response and operational planning. As their architectures evolve from rule-based execution toward cognitively inspired reasoning pipelines, these systems gain flexibility, but also inherit new classes of vulnerabilities. Among them, cognitive biases represent a subtle yet structurally exploitable attack surface that remains underexplored in adversarial AI.

Bias in autonomous systems, which was traditionally regarded as a form of human error emerging from heuristic shortcuts or processing constraints, is often framed in terms of fairness, alignment, or demographic parity. Yet under adversarial conditions, particularly in black-box settings, these biases may become deliberately inducible. Rather than inefficiencies to be eliminated, cognitive biases can serve as adversarial levers that are capable of redirecting agent behavior without perturbing inputs or accessing internal model representations.

While much of the literature focuses on pixel-level or embedding-space perturbations, our work targets reasoning-level vulnerabilities that emerge when visual context guides inference. Here, “non-perturbative” specifically indicates that the input scenes are not modified at pixel-level, statistical, or semantic levels. Due to the limited availability of publicly accessible vision-language autonomous systems capable of full-scene prioritization, we adopt LLMs as abstract reasoning proxies. It is important to clarify that these LLM-based surrogates process controlled textual descriptions of visual scenarios embedded with cognitive bias cues, thereby allowing us to isolate reasoning failures purely at the inference stage, without the confounding influence of perceptual noise or real-time dynamics present in actual autonomous systems.

To investigate this class of vulnerability, we adopt a task-oriented perspective: which cognitive biases can be externally triggered, behaviorally observed, and operationally exploited? Rather than importing psychological taxonomies wholesale, we define a functional classification grounded in adversarial manipulation potential.

We formalize this framing along three operational dimensions:

• Controllability: the degree to which a bias can be reliably activated via scenario construction or input configuration;

• Observability: the extent to which bias activation yields detectable shifts in agent behavior;

• Exploitability: the potential for a bias to disrupt mission-critical decision logic or attention allocation.

These dimensions form the analytical foundation of our bias taxonomy, which is specifically tailored to abstract reasoning agents responding to textual simulations of visual scenarios under black-box constraints. Although this study relies on LLM-based reasoning proxies, the taxonomy's conceptual foundation is designed to inform future experiments involving vision-language models and realistic closed-loop autonomous agents.

3.2 Dimensions of cognitive exploitability

In black-box scenarios where internal model states are inaccessible, adversarial manipulation must operate through externally observable channels. When reframed as functional failure modes, cognitive biases offer precisely such an interface. To operationalize this, we analyze exploitability along three core dimensions: controllability, observability, and exploitability, as introduced in Section 3.1.

In practice, Controllability determines whether biases can be systematically and reliably activated by structured yet semantically neutral scenario constructions. In the context of our experiments using LLM-based proxies, controllability involves crafting textual simulations that contain consistent heuristic triggers, such as salient textual cues, spatial descriptions, or repeated temporal patterns, while preserving overall task plausibility.

Observability addresses the detectability of induced cognitive biases through measurable shifts in agent behavior. Since internal reasoning states are inaccessible in black-box settings, observability relies entirely on externally measurable behaviors, such as agents' decisions or priority assignments. In our current text-based experimental setup, observability is assessed by systematically tracking agents' priority decisions across controlled pairs of scenario descriptions differing only in bias-relevant details.

Exploitability captures the practical impact of biases on operational effectiveness. It evaluates whether activated biases lead to critical misjudgments or operational failures. Although our current experiments rely solely on abstract textual simulations mimicking UAV-style surveillance tasks, exploitability assessments consider hypothetical mission-critical implications, such as the misallocation of surveillance resources or attention in high-stakes situations.

Collectively, these three dimensions allow for a structured evaluation of cognitive biases as actionable adversarial vulnerabilities, providing the analytical foundation for scenario design and bias selection. While the current study's assessments are inherently limited by the use of LLM-based abstract reasoning proxies, this dimensional framework is intended to be directly applicable and extendable to future empirical evaluations involving realistic, vision-driven autonomous systems.

3.3 A typology of exploitable biases in vision-driven autonomous systems

To operationalize our framework, we introduce a typology of visual cognitive biases that are structurally grounded and adversarially exploitable. We specifically consider vision-only systems, such as UAV-based agents, that rely exclusively on image inputs to make prioritization and response decisions. It is important to clarify that while heuristics guiding visual attention allocation are generally adaptive and efficient, under adversarial conditions, they can systematically lead to exploitable cognitive biases. Our typology reflects three specific categories: salience-oriented biases, spatial framing biases, and temporal persistence biases, selected based on their prominence and relevance in preliminary textual simulations conducted with abstract reasoning proxies.

Each bias category corresponds to a distinct mechanism through which visual structure can reshape task-level reasoning. All three satisfy the criteria of controllability, observability, and exploitability, forming a minimally sufficient basis for targeted behavioral manipulation in black-box environments. The following subsections outline each bias type in detail, explicitly clarifying the heuristic's general functionality, conditions under which it can become a vulnerability, and illustrative examples grounded in operational scenarios.

3.3.1 Salience-oriented perceptual biases

Salience biases occur when agents disproportionately attend to features that are visually prominent but semantically irrelevant. Typically, vision-only systems detect and prioritize salient elements, such as individuals wearing high-contrast clothing or isolated postures, as preliminary indicators of potential threat. This heuristic is functionally adaptive, as it quickly directs attention toward potentially important elements in visually complex environments. However, it becomes a vulnerability when adversaries intentionally introduce semantically neutral yet visually prominent distractors.

These biases frequently emerge from statistical associations learned during training, where visual prominence weakly but consistently correlates with risk. Over time, these associations override deeper contextual reasoning, shifting the decision basis from behavior (“acting anomalously”) to appearance (“looking anomalous”). In our abstract text-based simulations, we illustrate this vulnerability by explicitly describing scenarios where individuals wearing distinctive but benign attire (e.g., a high-contrast T-shirt) consistently receive higher priority despite displaying no threatening behaviors.

Salience-based biases exhibit high controllability, as adversaries can easily embed visually prominent features into real-world environments. Observability is moderate, typically manifesting as shifts in attention allocation. Exploitability is substantial in operational contexts where attention diversion leads to critical oversight, such as resource-constrained surveillance. For instance, a UAV patrol scenario may prioritize a visually distinct yet harmless individual, inadvertently neglecting a more subtle but genuine threat.

3.3.2 Spatial context misalignment biases

Spatial biases arise when agents over-rely on environmental geometry, inferring threat or risk primarily based on positional cues rather than observed behaviors. For instance, vision-based agents often interpret proximity to architectural features (e.g., alleyways, lamp posts, or structural edges) as implicit indicators of suspicious intent. This spatial heuristic is pragmatically useful because such positions often correlate with concealment or ambush strategies, but it becomes problematic when agents interpret geometry alone without corroborating behavioral evidence.

Spatial biases frequently develop due to uneven data distribution, where training scenarios disproportionately represent edge locations as risky or anomalous. Consequently, agents form strong priors linking positional marginality or occlusion to suspiciousness. In our textual experiments, we simulate this bias by consistently describing neutral actors positioned near environmental boundaries or partial occluders (e.g., adjacent to lampposts), thereby demonstrating systematic misallocation of surveillance priority despite symmetrical behavioral conditions.

Controllability is moderate, as adversaries cannot change physical geometry but can readily manipulate the positioning of individuals. Observability is high, seen clearly in altered patrol decisions or shifts in monitoring behavior. Exploitability is considerable, especially in crowded or complex urban scenarios where geometric heuristics supplant detailed semantic processing. For example, in crowded plaza surveillance, an individual innocently standing near a peripheral structure may attract disproportionate surveillance resources, thereby leaving central areas inadequately monitored.

3.3.3 Temporal persistence biases in visual sequences

Temporal biases emerge when agents overly trust locations or actors based solely on historical, benign interactions or exposure. If a particular location repeatedly shows no anomaly over time, agents may implicitly lower their vigilance, even in the face of subsequent subtle threats. Temporal persistence heuristics serve to optimize limited monitoring resources by reducing redundant attention to historically benign areas. However, adversaries can exploit this learned complacency by first establishing normality, subsequently enabling threats to emerge unnoticed.

Such biases frequently result from reinforcement-based learning patterns in autonomous agents, where recurrent non-threatening experiences gradually reduce attention to repeated stimuli. This implicit trust-building process forms blind spots. In our controlled textual scenarios, we illustrate temporal bias by describing agents repeatedly encountering benign, identical actors across multiple patrol cycles, subsequently failing to promptly identify or prioritize novel yet subtle threats appearing in the same region.

Temporal biases have moderate controllability, requiring consistent preconditioning to establish baseline trust. Observability is high, as evident in explicit changes in surveillance frequency. Exploitability is significant in persistent security contexts, enabling stealthy infiltration or delayed-action threats. For instance, repeated benign activity in a specific sector could lead a UAV surveillance agent to deprioritize monitoring there, providing adversaries an opportunity to conduct undetected activities subsequently.

3.4 Evaluating visual biases for adversarial suitability

Each of the bias types described above offers a distinct pathway for adversarial influence. Using the controllability-observability-exploitability framework, we assess their respective threat profiles within vision-only, black-box systems. It is critical to note, however, that the following threat assessments are derived from abstract textual simulations employing LLM-based reasoning proxies, rather than direct empirical tests with actual vision-driven autonomous systems.

Salience biases are the most controllable, requiring minimal effort to embed into real-world visual scenarios. Their activation is moderately observable and highly exploitable in time-critical environments. As such, they represent the most accessible entry point for black-box adversarial manipulation. In practical terms, adversaries could easily introduce visually prominent yet benign distractors in operational contexts, though the current conclusion remains preliminary and must be validated with empirical visual inputs.

Spatial biases demand more precise alignment with environmental geometry. While less directly controllable, their behavioral manifestations are clearly observable and often operationally significant, particularly in dense or structurally complex scenes. However, since the present evaluations rely solely on scenario descriptions rather than actual spatial visual data, future empirical studies using genuine vision-language models are necessary to confirm the exploitability of these biases.

Temporal biases necessitate long-term exposure and strategic scene conditioning. Although initially less controllable, they enable potent delayed-action exploitation by modifying the agent's internal vigilance landscape. Yet, as our current conclusions about temporal biases are drawn exclusively from simulated textual experiments, further validation in dynamic, temporally evolving visual environments is essential to strengthen the findings' applicability to real-world autonomous deployments.

Collectively, these bias types suggest that, even without direct access to input pixels or internal model states, adversaries could potentially reshape agent behavior through structurally coherent heuristic cues. Nonetheless, given the abstract and preliminary nature of our current study, these findings primarily highlight theoretical vulnerabilities. Comprehensive empirical evaluations involving real visual input or embodied autonomous systems remain a crucial next step to verify and extend these insights into actionable security assessments.

4 The PRIOR framework: priority inversion via operational reasoning

4.1 Overview of the PRIOR framework

Autonomous systems do not merely perceive; they prioritize. Each decision to engage, delay, or disregard is ultimately governed by internal judgments about what matters under uncertainty. The PRIOR framework explicitly defines this internal prioritization process as an adversarial surface distinct from traditional pixel-level or semantic perturbation attacks. Here, “non-perturbative” explicitly indicates that PRIOR does not modify any pixel-level, statistical, or semantic characteristics of the inputs themselves. Unlike conventional perturbation-based attacks that disrupt inputs or alter internal model structures, PRIOR operates within the abstract reasoning structure, subtly guiding agents toward systematically incorrect yet internally coherent conclusions.

Rather than inducing random or erratic outputs, PRIOR constructs plausible yet adversarially biased reasoning paths. It strategically leverages heuristic shortcuts inherent in reasoning processes to systematically manipulate decision outcomes without explicit perceptual or semantic distortion. The result is a behavioral realignment: the agent continues to operate fluently, consistently, and confidently, yet persistently prioritizes incorrect or non-optimal targets. This misprioritization arises not from perceptual errors but from a structured distortion of the reasoning pipeline, redirecting the operational significance attributed by the agent's internal logic.

This mechanism has particular relevance for vision-based autonomous agents, such as UAVs, that rely heavily on heuristic judgments driven by visual context. Prioritization in these systems typically emerges not from explicitly coded directives but from implicit heuristic associations related to salience, spatial positioning, and temporal experience. PRIOR exploits these dependencies by embedding structurally plausible but heuristically misleading cues, such as descriptions of actors placed near structural boundaries, wearing distinctive attire, or appearing repeatedly over time, in abstract textual scenario simulations. An overview of this mechanism is illustrated in Figure 2, which outlines how structurally neutral cues can manipulate internal prioritization logic without altering surface semantics. Although these cues themselves are not inherently adversarial, their structured co-occurrence leverages implicit heuristics to produce systematic reasoning distortions.

Figure 2

Flowchart showing the impact of bias on AI decision-making. The left panel depicts a “Normal” image (A) and a “Bias Added” image (B) influenced by salience, spatial, and temporal biases. The “Black-Box AI Agent“ processes these inputs, leading to normal judgment and priority from the normal input, but perceptual induction, contextual framing, and temporal reinforcement from the biased input, resulting in a “Wrong Priority with High Confidence” indicated by a red devil icon on the right.

Figure 2. The PRIOR framework for reasoning-level attacks on autonomous systems.

PRIOR specifically targets inference rather than perception. It leverages perceptual salience, spatial contextualization, and temporal familiarity heuristics to trigger systematic reasoning errors, defined here as “inference-level misalignment attacks.” As autonomous agents increasingly adopt more sophisticated semantic reasoning frameworks, they inherently become vulnerable to human-like biases embedded implicitly within their heuristic structures and training corpora. Importantly, while our current validation employs abstract LLM-based proxies due to practical constraints in accessing vision-language systems, this theoretical vulnerability underscores the critical need for empirical confirmation within actual vision-based autonomous environments.

4.2 Architectural components of the PRIOR framework

PRIOR is instantiated through a modular architecture, in which each module influences a distinct layer of the agent's operational reasoning. Specifically, each module is designed to strategically activate one of the previously identified exploitable cognitive biases: salience-oriented bias, spatial-context bias, and temporal-persistence bias. These modules, namely perceptual, contextual, and temporal, do not function in strict sequential order; rather, they operate either independently or interactively, depending on the adversarial objective and scenario specifics. Crucially, PRIOR adheres to strict black-box assumptions: it does not require access to internal model parameters, gradients, or actual visual inputs. Instead, PRIOR relies solely on structured textual descriptions of scenarios, designed based on observed patterns of heuristic responses.

The first module, perceptual induction, targets the agent's heuristic associations related to visual salience. In vision-driven systems, salience heuristics typically direct attention toward visually distinctive features. Although perceptual salience heuristics effectively streamline attention allocation in complex environments, PRIOR manipulates this heuristic through scenario descriptions that include elements such as high-contrast clothing, ambiguous gestures, or culturally charged descriptions, which are semantically neutral but heuristically salient. Features described as isolated or distinctive in scenarios trigger pre-established heuristic pathways, subtly redirecting surveillance attention without requiring any visual input modifications.

The second module, contextual framing, exploits spatial heuristics. Typically, autonomous systems implicitly associate certain spatial positions, such as proximity to edges, occlusions, or bottlenecks, with increased likelihood of anomalous or risky behavior. While generally adaptive, this heuristic can be systematically misled through textual descriptions positioning neutral actors near architectural features like lampposts, structural corners, or peripheral zones. PRIOR thus strategically leverages these heuristic associations to provoke systematic reasoning distortions, leading to misallocated priorities, particularly in descriptions depicting crowded or visually complex environments.

The third module, temporal reinforcement, targets heuristic patterns associated with repeated benign exposure. Autonomous systems typically reduce attention toward locations or entities frequently described without incident, optimizing resource allocation through implicit habituation. Though beneficial under normal operational contexts, this heuristic creates exploitable vulnerabilities. PRIOR manipulates temporal heuristics through textual descriptions that repeatedly present certain locations or entities as benign before subsequently introducing subtle threats, thus exploiting the reduced vigilance that arises from habituation. Unlike instantaneous perceptual or spatial cues, temporal heuristic manipulations specifically exploit reasoning shortcuts associated with frequency-based attention reduction.

These modules operate not necessarily in a fixed order, but as interactive or independently activatable levers based on the adversarial objective. For example, simultaneous activation of multiple modules, such as combining perceptual salience cues with spatial heuristics in previously benign contexts, can strategically induce compound reasoning distortions, systematically realigning the agent's prioritization logic. The effectiveness of PRIOR thus arises from this strategic orchestration of cognitive heuristics, guiding agents toward systematically incorrect yet internally coherent decisions. It is critical to note that while this framework offers a conceptual foundation for exploiting heuristic reasoning vulnerabilities, empirical validation using genuine visual input and embodied autonomous systems remains an essential future research direction.

4.3 Attack procedure: operational priority inversion

Priority Inversion via Operational Reasoning (PRIOR) functions as a staged reconfiguration of the agent's inference pathway, executed entirely through structured scenario descriptions. The attack unfolds across four operational phases: behavioral profiling, bias induction, consolidation, and inversion. Crucially, this procedure does not require any modification of input data at pixel-level, statistical-level, or semantic-level, nor does it require any access to internal model parameters or gradients. Instead, it operates purely through structured textual scenario designs that exploit known heuristic patterns.

The first phase involves behavioral profiling. In this phase, an adversary systematically analyzes the agent's decisions across various described scenarios, constructing a surrogate model of its prioritization heuristics. Specifically, this involves identifying scenario description elements, such as explicit mentions of clothing contrast, actor proximity to environmental structures, or frequency of actor appearances, that correlate strongly with shifts in agent prioritization decisions. Iterative analysis allows adversaries to approximate heuristic associations, facilitating targeted scenario construction.

The second phase is bias induction. Here, the adversary strategically embeds specific heuristic-triggering cues into scenario descriptions. To exploit salience heuristics, scenarios explicitly describe actors with distinctive but benign features, such as high-contrast clothing or ambiguous postures. For spatial heuristic exploitation, scenarios place neutral actors near architectural boundaries or occluded spaces. For temporal heuristic manipulation, scenarios repeatedly describe benign activity in certain locations or by certain actors, conditioning the agent to implicitly reduce surveillance attention to these elements over time.

The third phase is consolidation. Once heuristic-triggered prioritization shifts are observed, adversaries consistently reinforce these scenario-based cues. Repeated exposure to similarly biased textual descriptions strengthens the inferred heuristic associations, causing stable and systematic alterations in the agent's prioritization behavior, even though each scenario remains semantically plausible and behaviorally neutral. Thus, agents adapt their prioritization logic without explicit errors, aligning internal reasoning to the adversarially designed heuristics.

The final phase is inversion. At this stage, the agent systematically prioritizes non-critical or distractor targets over genuinely critical ones, based solely on reinforced heuristic associations. No further scenario modifications are necessary once these heuristic-driven reasoning distortions become consistently observed. The agent does not experience perceptual malfunction or overt system errors; rather, it continues functioning smoothly within its heuristic-driven reasoning parameters. While the PRIOR thus conceptually demonstrates a systematic reasoning vulnerability, it must be noted that these outcomes currently reflect only abstract, text-based scenario simulations. Further empirical validation using real visual input and embodied autonomous agents is essential to fully assess the practical applicability and scope of these findings.

4.4 Typological alignment and visual scenarios

Each PRIOR module directly corresponds to a specific cognitive bias category previously outlined in Section 3.3. These modules operationalize biases not through internal model manipulations but by structurally embedding heuristic triggers within scenario descriptions.

The perceptual module operationalizes salience-oriented biases by embedding descriptions that implicitly trigger visual-attention heuristics.Specifically, scenarios depict neutral actors with highly distinctive yet benign characteristics, such as culturally significant or visually distinctive attire, which implicitly direct attention and prioritization toward them, diverting resources from genuinely critical targets. An example scenario might describe an actor wearing distinctive clothing without exhibiting any threatening behavior, yet receiving disproportionate surveillance attention.

The contextual framing module exploits spatial-context biases. Scenario descriptions strategically place benign actors in locations heuristically associated with increased risk or suspiciousness, such as described proximity to environmental occlusions or peripheral architectural features. Agents implicitly assign elevated priority to these positions based solely on positional heuristic associations, thereby misdirecting surveillance attention from truly significant areas or events.

The temporal reinforcement module operationalizes temporal biases. It exploits heuristics tied to frequency-based scenario repetitions. Repeated benign descriptions of specific areas or actors implicitly cause agents to heuristically reduce surveillance priority in these regions over time, inadvertently creating surveillance blind spots. For instance, scenarios might repeatedly describe a particular location as incident-free before introducing a subtle threat, effectively exploiting reduced vigilance arising from prior scenario-based habituation.

These bias modules can be activated independently or in combination, depending on scenario construction. For example, a scenario combining perceptual salience (e.g., describing a visually distinctive actor), spatial heuristics (positioning this actor near structural edges), and temporal heuristics (having previously described this location as repeatedly benign) systematically intensifies heuristic-driven prioritization distortions. Each heuristic individually nudges agent prioritization; combined, they systematically reinforce one another, creating robustly misaligned yet internally coherent reasoning outcomes. Again, it is critical to emphasize that these typological alignments and scenario outcomes are theoretical findings derived from abstract textual simulations. Empirical validation within realistic visual input scenarios and actual autonomous systems remains a crucial next step.

5 Empirical evaluation of cognitive priority inversion

5.1 Experimental design

This section outlines the protocol for evaluating whether cognitively primed yet semantically neutral text-based simulations of visual scenarios can systematically induce priority misjudgments in LLMs. Rather than assessing classification accuracy or prediction confidence on conventional benchmarks, our objective is to determine whether behaviorally neutral yet heuristically structured scenario descriptions can shift the model's prioritization logic, causing it to select one target over another in the absence of explicit behavioral justification. The central hypothesis is that minimal textual cues, aligned with known cognitive biases, can induce high-confidence misprioritization in decision-level reasoning, without modifying model internals or introducing adversarial noise.

To simulate an abstract UAV-style surveillance task, we present each model with paired scenario descriptions in which two individuals (Location A and Location B) are depicted with matched behavior and context. The only difference is a single manipulated feature designed to activate a specific cognitive bias, for example, a printed phrase on clothing (salience), peripheral positioning near a structure (spatial), or variation in surveillance history (temporal). Descriptions are crafted using neutral and balanced language, avoiding evaluative phrasing, and are matched in length, tone, and structure to eliminate framing effects.

Each model is prompted to answer standardized questions such as: “Which of the two individuals should the UAV prioritize for surveillance?” and “Who may pose a greater risk or require further attention?” In addition to selecting a location (A or B), models are instructed to output: (1) a brief justification, and (2) a numerical confidence score ranging from 0 to 100. This confidence value is explicitly requested within the prompt (e.g., “On a scale from 0 to 100, how confident are you in this decision?”) to ensure consistency across architectures.

These confidence scores allow us to assess not only which entity is prioritized but also the strength of the model's internal conviction, critical for identifying high-certainty misalignments induced by heuristic triggers. Importantly, no probabilistic calibration or sampling is performed; the reported confidence reflects the model's self-assessed certainty in a single response.

To isolate the effect of each bias trigger, we strictly control all non-target variables. Behavioral descriptions, spatial layout, environmental tone, and narrative framing are kept symmetrical across A and B. Only a single element, corresponding to salience, spatial framing, or temporal familiarity, is manipulated per condition. This high-control design enables us to attribute prioritization shifts directly to specific bias activations, rather than to uncontrolled confounds.

To evaluate the generalizability of these effects, we apply this evaluation suite to a diverse set of high-performing LLMs with varying architectures and training paradigms. These include ERNIE Bot 4.0 Turbo, DeepSeek, ChatGLM, Claude 3, ChatGPT o4-mini, and ChatGPT-4o. Although some of these models are multimodal in nature (e.g., Claude 3, GPT-4o), all tests in this study use their text-only interfaces with controlled scenario descriptions to ensure strict input consistency.

We emphasize that these experiments are conducted in an abstract setting: the models do not process actual images, but instead interpret high-fidelity descriptions simulating visual surveillance contexts. As such, the results represent inference-level vulnerabilities in language-based proxies, not end-to-end perception failures in full-stack vision agents. This abstraction enables controlled probing of reasoning biases, but limits immediate applicability to real-world deployments.

5.2 Models under evaluation

To examine whether cognitive biases consistently affect decision-level reasoning across diverse architectures, we selected six high-performing language or vision-language models as abstract inference proxies. These include models with both native multimodal capabilities and text-only variants. Table 1 summarizes their architectural properties and test configurations. Importantly, all models were evaluated using controlled textual descriptions of scenes, regardless of their multimodal support or native vision capabilities. This ensures strict input consistency and allows us to attribute output variation solely to differences in reasoning behavior.

Table 1

Table 1. Summary of evaluated models.

Our model selection was guided by three principles: (1) coverage of both open-source and closed-source systems; (2) architectural diversity, ranging from encoder-decoder designs to transformer-only backbones; and (3) inclusion of models trained with varying degrees of vision-language alignment, from image-conditioned LLMs (e.g., DeepSeek) to native image-language agents (e.g., GPT-4o). This variety allows us to assess the robustness and generality of bias-induced misprioritization across mainstream reasoning engines.

To control for parsing variability, all models received the same scenario prompts, formatted in plain language with a neutral tone and balanced structure. Despite their differences in training regimes, all models processed the inputs as textual summaries of visually inspired scenarios. As such, the observed differences in prioritization decisions reflect variations in reasoning patterns, not perceptual discrepancies, reinforcing our focus on inference-level bias.

5.3 Experiment on salience bias

This experiment evaluates whether cognitively salient yet behaviorally neutral features, introduced via structured textual descriptions, can influence surveillance prioritization decisions in LLMs. Specifically, we test whether symbolic elements described in scene prompts (e.g., printed text on clothing), despite lacking behavioral or environmental relevance, can shift the model's prioritization judgment.

The test scenario presents two individuals situated in identical urban environments, each standing alone with matching posture and behavior. The only manipulated attribute is clothing: the subject in Location B wears a black T-shirt with large white text, while the subject in Location A wears a plain black T-shirt. All other contextual features, including lighting, posture, spatial placement, and activity, are matched to ensure that visual salience is the only differing factor described in the text.

Scene prompts are delivered in a standardized UAV-style surveillance format. Each prompt asks the model to determine which location should be prioritized for closer monitoring based on operational factors such as risk relevance, public safety, and urgency. The model must then choose a target (A or B), provide a confidence score (0–100), and offer a brief explanation. This confidence value is directly prompted (e.g., “On a scale from 0 to 100, how confident are you in this decision?”) and reflects the model's self-reported certainty, not a statistically derived metric.

As shown in Table 2, four of the six evaluated models (ERNIE Bot 4.0, DeepSeek, ChatGLM, and Claude 3) selected Location B, explicitly citing the shirt's printed text as a source of uncertainty, potential affiliation, or semantic ambiguity. Despite an identical behavioral context, the symbolic feature was interpreted as a risk-relevant factor. The average confidence score across these four bias-aligned decisions was 77.5, indicating not only consistent preference but also strong internal certainty.

Table 2

Table 2. Salience bias test: surveillance prioritization across language models.

In contrast, GPT o4-mini and GPT-4o did not prioritize the salient target. GPT o4-mini returned a high-confidence neutral judgment, indicating reluctance to prioritize under minimal difference. GPT-4o explicitly dismissed the symbolic cue as non-informative and selected Location A, providing a justification that downplayed the relevance of superficial contrast.

These results suggest that prioritization shifts caused by salient visual descriptions are model-sensitive and reflect different internal inference strategies. Some models appear more prone to over-weighting symbolic detail, while others adopt more conservative heuristics that favor context balance.

This experiment reflects reasoning-level behavior in a controlled textual setting and does not involve direct visual input or perception components.

5.4 Experiment on spatial bias

This experiment evaluates whether spatial positioning, described through neutral yet structurally distinct scene cues, can systematically influence prioritization in LLM-based surveillance reasoning tasks. Two individuals are described standing alone in a well-lit, open urban plaza. Both are behaviorally and visually identical, exhibiting matched posture, attire, and environmental context. The sole manipulated variable is spatial placement: the subject in Location A stands at the center of the plaza, while the subject in Location B is positioned near a lamppost at the plaza's periphery. Both positions are described as fully visible and unobstructed, with no crowd interference or occlusion.

The prompts explicitly rule out behavioral anomalies, thereby isolating spatial framing as the only differentiator. Each model is asked to determine which location should be prioritized for further surveillance, and to provide: (1) a decision (A or B), (2) a confidence score (0–100), and (3) a justification. Confidence is self-reported via a prompt and reflects the model's expressed certainty, not a calibrated or repeated measure.

As shown in Table 3, all six models selected Location B. Their justifications consistently referenced peripheral proximity, adjacency to vertical structure, or inferred concealment opportunities, despite no behavioral signals or occlusion being present in the prompt. This demonstrates a robust prioritization pattern rooted in spatial framing rather than direct task-relevant evidence.

Table 3

Table 3. Spatial bias test: surveillance prioritization across language models.

These prioritization decisions appear cognitively plausible, grounded in environmental heuristics such as “edge implies risk” or “structure enables concealment,” even though such heuristics were not explicitly triggered by the described scenario. This reflects a vulnerability where spatial geometry alone, when subtly manipulated, can override behavioral symmetry in prioritization logic.

This finding aligns with the PRIOR framework's definition of non-perturbative inference distortion: behavior remains unchanged, but spatial cues, which are structurally embedded and semantically neutral, reshape the reasoning path. The fact that all models exhibited this bias, including those with vision-language training, suggests that spatial heuristics are deeply embedded and generalize even under text-only input.

As with other experiments, we emphasize that these results do not reflect perceptual malfunction, but inference-level vulnerability in black-box reasoning proxies. Further validation on embodied or vision-grounded agents is needed to assess real-world applicability.

5.5 Experiment on temporal bias

To evaluate whether temporal familiarity influences risk perception in autonomous agents, we designed an experiment that isolates long-term exposure as the sole manipulated variable. It corresponds to the PRIOR framework's category of temporal reinforcement bias, where repeated benign appearance may induce inferred safety.

Two individuals are described standing near the west entrance of Civic Plaza. They are behaviorally and visually identical, matching in attire, posture, and environmental context. Lighting, visibility, spatial positioning, and surrounding conditions are carefully matched to eliminate potential confounds. The only manipulated variable is temporal familiarity: Observation A features an individual who has appeared in the same location for five consecutive days without incident; Observation B features a person with no previous appearance on record.

As shown in Table 4, all six models prioritized Observation B. Their justifications consistently referenced the absence of prior exposure as a factor warranting closer surveillance attention. This convergence suggests that temporal novelty, rather than behavioral signal, acts as a risk amplifier in language-based prioritization tasks.

Table 4

Table 4. Temporal familiarity bias test: surveillance prioritization across language models.

Conversely, the repeated presence of Observation A was treated across models as implicitly less urgent, despite the absence of any additional behavioral or contextual data. This reveals a reasoning pattern in which recurrence is treated as negative evidence for risk, potentially masking high-consequence threats.

None of the models flagged the temporal framing as a source of uncertainty or potential bias. All exhibited moderate to high confidence in selecting the novel observation, suggesting that their inference was based on exposure-based heuristics rather than interpretive doubt.

In real-world contexts, such heuristics could be manipulated through repetition, allowing adversarial actors to normalize low-risk patterns prior to a deviation. The PRIOR framework leverages this mechanism by embedding familiarity cues into otherwise balanced scenarios, producing shifts in attention allocation.

This experiment focuses on inference dynamics derived from textual scene descriptions, rather than perceptual systems operating on raw visual input.

While this experiment supports the presence of compound heuristic reinforcement, future work should employ factorial designs to statistically isolate interaction effects across bias dimensions under both textual and visual input regimes.

5.6 Experiment on compound bias

This experiment investigates whether the simultaneous activation of multiple cognitive biases can produce a distinct form of reasoning distortion beyond the additive effect of each bias in isolation. Unlike prior tests that isolate salience, spatial, or temporal cues, this scenario evaluates interaction effects across bias dimensions, specifically whether co-occurring non-perturbative cues can jointly reinforce prioritization errors in a black-box reasoning system.

The test scenario follows the same design principles as previous experiments. Two individuals are described standing alone in a well-lit urban plaza, matched in behavior, appearance, and scene context. Observation B includes three controlled differences: a T-shirt with bold white text (salience), a peripheral position near a lamppost (spatial), and no prior surveillance history (temporal). Observation A holds all other factors constant but features a plain shirt, central positioning, and a five-day history of uneventful appearance.

As shown in Table 5, all six models prioritized Observation B. Their justifications referenced multiple heuristic cues, including semantic ambiguity from the shirt text, peripheral spatial positioning, and novelty of appearance, treating them as independent yet reinforcing signals of potential risk.

Table 5

Table 5. Compound bias test: surveillance prioritization across language models.

Notably, this condition produced the highest average confidence across all experiments (83.0), and the confidence values were more tightly clustered across models. This indicates not just alignment in decision outcome, but convergence in decision strength, suggesting an amplification effect beyond mere additive influence.

These findings imply that multiple cognitive biases, when presented concurrently, interact to form a composite risk profile that models treat as internally consistent and urgent. Novelty is reinterpreted as a statistical anomaly, salience as possible intent signaling, and peripheral positioning as a concealment strategy. The final prioritization thus appears operationally sound, but emerges from structurally engineered misalignment.

Importantly, none of the models demonstrated awareness of the compound cue structure. No outputs questioned the coincidence of these cues or suggested the possibility of adversarial scene construction. This reveals a gap in current reasoning architectures: a lack of meta-inference capacity to evaluate whether co-occurring risk signals might be artificially aligned.

To illustrate these tendencies more concretely, we include an expanded scenario involving culturally marked attire: in a neutral surveillance scene where two individuals stand idle, one described as wearing a black thobe and the other in casual Western clothing, three models (ERNIE Bot 4.0, DeepSeek, ChatGLM) prioritized the thobe-wearing subject, citing visibility, symbolic association, or potential for group affiliation. No behavioral distinction was present. This reinforces our claim that culturally salient yet behaviorally neutral cues can subtly influence model prioritization logic under heuristic activation. Figure 3 summarizes the model confidence patterns observed in this compound bias scenario.

Figure 3

Two images labeled A and B depict a man standing on a tiled surface. In image A, he wears a plain black T-shirt. In image B, he wears a black T-shirt with large white text. An infographic titled “Autonomous Agent” includes sections labeled “Perceptual Input” and “Cognitive Heuristic.” The perceptual input suggests that image B has a higher scrutiny due to the visible text. Cognitive heuristics include “Salience Heuristic,” “Spatial Heuristic,” and “Temporal Novelty.” The result assigns low priority to A and high priority to B due to the text on the shirt.

Figure 3. Boxplot of model confidence in compound bias condition.

This visualization confirms the convergence pattern: compared to single-bias cases, confidence scores in the compound scenario are significantly higher and less dispersed, across all six models. This pattern supports the interpretation of heuristic compounding, not as simple score stacking, but as a form of inference reinforcement, where plausibility becomes self-validating.

We refer to this as compound heuristic resonance, in which structurally aligned cues collectively elevate perceived urgency while suppressing ambiguity. The result is not merely a misjudgment, but a confident, internally justified misprioritization, despite the absence of behavioral divergence.

This compound condition represents the most expressive instantiation of the PRIOR framework. It demonstrates that visually plausible scenes, containing no perturbation or semantic anomaly, can elicit robust and self-consistent prioritization failures when multiple bias pathways are simultaneously engaged.

As with prior tests, this experiment is conducted using text-based scene descriptions under controlled comparison conditions, and limitations related to sensory fidelity are discussed in Section 6.

5.7 Peripheral observations and typological implications

To assess the broader applicability of the PRIOR framework beyond the core salience, spatial, and temporal categories, we conducted a series of exploratory probes targeting peripheral semantic attributes. These tests introduced visually encoded elements, such as architectural context, cultural attire, motion, and gender, while maintaining control over behavior, posture, and prompt structure. Although these variables are not formally defined within the three core bias types, they were selected for their potential to activate the same underlying heuristic pathways through symbolic or visual association.

As summarized in Table 6, several models exhibited prioritization preferences aligned with culturally or symbolically marked features. For instance, when individuals wearing traditional Middle Eastern attire (e.g., thobe) were compared to those in neutral Western clothing, or when the background context differed between religious structures such as a mosque or temple, certain models demonstrated consistent selection of the culturally marked scenario. These choices occurred in the absence of behavioral difference, suggesting that semantic salience, not action, was influencing prioritization.

Table 6

Table 6. Peripheral bias summary: model prioritization across controlled scenarios.

Motion-based scenarios revealed a similar pattern: models often prioritized jogging individuals over static ones, despite identical context and appearance. This suggests a heuristic preference for temporal unpredictability, possibly reflecting salience or anomaly detection mechanisms.

In contrast, gender-based variations yielded no consistent effect across models, indicating relative robustness to biological identity cues under the present experimental constraints.

Although exploratory in nature, these results support the extensibility of PRIOR's heuristic framework. The observed behaviors do not necessitate new bias categories; rather, they reflect salience amplification, spatial misattribution, or novelty sensitivity, triggered by culturally or symbolically encoded inputs.

This implies that even semantically resonant yet behaviorally neutral features, such as clothing style or background architecture, can serve as cognitive shortcuts in black-box reasoning. The vulnerability lies not in explicit misclassification, but in the silent over-weighting of visually distinctive cues.

As with prior experiments, these tests use textual scene descriptions rather than image inputs, and limitations in perceptual fidelity are addressed in Section 6. Nonetheless, they reinforce the need to evaluate reasoning robustness not only across structured bias variables but also under semantically suggestive visual configurations encountered in real-world deployments.

6 Discussion and limitations

The PRIOR framework demonstrates how cognitively grounded but semantically neutral scenario descriptions can induce inference-level distortions in black-box systems. Unlike conventional adversarial attacks that manipulate input pixels or prompt structure, PRIOR targets internal prioritization logic, leveraging heuristics such as salience, spatial framing, and temporal familiarity to produce systematic misjudgments. Although validated here through LLMs, these experiments serve as proxies for understanding broader reasoning vulnerabilities in autonomous systems. Our use of LLMs reflects practical constraints on accessing vision-enabled agents and allows isolation of reasoning-level behavior without perceptual interference.

However, several methodological limitations constrain the generalizability of these findings. The test environment was entirely textual and static, lacking real-time feedback, visual grounding, or embodied control loops. Confidence values were collected through prompted self-reporting and reflect perceived certainty, not calibrated probabilities. Moreover, shared training corpora among foundation models may partially explain the observed convergence in misprioritization. These limitations underscore that the current findings should not be interpreted as directly transferable to deployed vision-based autonomous systems. Rather, PRIOR serves as a conceptual tool for probing abstract inference vulnerabilities under controlled conditions.

Our experiment on compound bias revealed that multiple co-occurring heuristic cues can jointly reinforce misprioritization, resulting in higher confidence and convergence. While this suggests an interaction effect among biases, we do not yet characterize its mechanism, whether it is additive, synergistic, or structurally bound. Future work should explore such interactions through controlled factorial designs and extend evaluation to vision-language models or fully embodied agents.

Ultimately, PRIOR reframes cognitive bias not only as a fairness concern but as a diagnostic signal of architectural fragility. We do not argue for the removal of heuristics, which often serve adaptive purposes, but for the development of systems capable of recognizing when their inference pathways are being adversarially co-opted. This calls for internal transparency, attention traceability, and meta-reasoning mechanisms. While speculative, these design goals offer a pathway toward resilient autonomy, where prioritization remains robust even under cognitive stress.

7 Conclusion

This study introduced PRIOR, a framework for inducing inference-level cognitive bias through non-perturbative, heuristically structured scenarios. While motivated by risks in autonomous systems, our experiments used LLMs as reasoning proxies due to practical constraints. Results showed consistent misprioritization driven by salience, spatial, and temporal cues, revealing vulnerabilities in decision logic rather than perception. However, the findings are limited by the static, text-based design and the subjective nature of model-reported confidence.

These results indicate that even when models are presented with behaviorally equivalent scenarios, minor structural cues can shift their prioritization decisions in ways that are systematic and cognitively plausible. Such patterns emerged across architectures and were often accompanied by high self-reported confidence, suggesting that reasoning bias may occur without the model detecting internal inconsistency.

Although the present study focuses on controlled textual simulations, the insights gained provide a foundation for future testing in vision-language settings and interactive, embodied environments. Extending PRIOR to such modalities will help determine whether the same inference-level fragilities manifest under more realistic and perceptually grounded conditions.

Rather than treating cognitive bias solely as an ethical or fairness concern, this work positions it as a structural lever that adversaries might exploit. Accordingly, PRIOR offers a conceptual tool to anticipate and probe reasoning failure modes before deployment, enabling more proactive assessment of AI system robustness.

Priority Inversion via Operational Reasoning (PRIOR) is thus best viewed as a diagnostic lens: it highlights where reasoning shortcuts can be adversarially leveraged, and underscores the need for future systems to audit their heuristics. Toward trustworthy autonomy, resilience must begin not with flawless input processing but with bias-aware inference.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Author contributions

SC: Validation, Conceptualization, Methodology, Writing – review & editing, Writing – original draft. YC: Writing – original draft, Visualization, Methodology, Validation, Writing – review & editing, Conceptualization.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Albahri, A. S., Duhaim, A. M., Fadhel, M. A., Alnoor, A., Baqer, N. S., Alzubaidi, L., et al. (2023). A systematic review of trustworthy and explainable artificial intelligence in healthcare: assessment of quality, bias risk, and data fusion. Inf. Fusion 96, 156–191. doi: 10.1016/j.inffus.2023.03.008

Crossref Full Text | Google Scholar

Asaro, P. (2019). Algorithms of violence: critical social perspectives on autonomous weapons. Soc. Res. 86, 537–555. doi: 10.1353/sor.2019.0026

PubMed Abstract | Crossref Full Text | Google Scholar

Banerjee, I., Bhattacharjee, K., Burns, J. L., Trivedi, H., Purkayastha, S., Seyyed-Kalantari, L., et al. (2023). “Shortcuts” causing bias in radiology artificial intelligence: causes, evaluation, and mitigation. J. Am. Coll. Radiol. 20, 842–851. doi: 10.1016/j.jacr.2023.06.025

PubMed Abstract | Crossref Full Text | Google Scholar

Barnes Jr, J. H. (1984). Cognitive biases and their impact on strategic planning. Strat. Manag. J. 5, 129–137. doi: 10.1002/smj.4250050204

Crossref Full Text | Google Scholar

Bayoudh, K., Knani, R., Hamdaoui, F., and Mtibaa, A. (2022). A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. Vis. Comput. 38, 2939–2970. doi: 10.1007/s00371-021-02166-7

PubMed Abstract | Crossref Full Text | Google Scholar

Bertrand, A., Belloum, R., Eagan, J. R., and Maxwell, W. (2022). “How cognitive biases affect xai-assisted decision-making: a systematic review,” in Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 78–91. doi: 10.1145/3514094.3534164

Crossref Full Text | Google Scholar

Blumenthal-Barby, J. S. (2016). Biases and heuristics in decision making and their impact on autonomy. Am. J. Bioeth. 16, 5–15. doi: 10.1080/15265161.2016.1159750

PubMed Abstract | Crossref Full Text | Google Scholar

Borgida, E., and Howard-Pitney, B. (1983). Personal involvement and the robustness of perceptual salience effects. J. Pers. Soc. Psychol. 45:560. doi: 10.1037//0022-3514.45.3.560

Crossref Full Text | Google Scholar

Bruza, P. D., and Hoenkamp, E. C. (2018). “Reinforcing trust in autonomous systems: a quantum cognitive approach,” in Foundations of Trusted Autonomy (Cham: Springer International Publishing), 215–224. doi: 10.1007/978-3-319-64816-3_12

Crossref Full Text | Google Scholar

Buçinca, Z., Malaya, M. B., and Gajos, K. Z. (2021). “To trust or to think: cognitive forcing functions can reduce overreliance on ai in ai-assisted decision-making,” in Proceedings of the ACM on Human-computer Interaction, 1–21. doi: 10.1145/3449287

Crossref Full Text | Google Scholar

Cheng, N., Wu, S., Wang, X., Yin, Z., Li, C., Chen, W., et al. (2023). Ai for UAV-assisted IoT applications: a comprehensive review. IEEE Internet Things J. 10, 14438–14461. doi: 10.1109/JIOT.2023.3268316

Crossref Full Text | Google Scholar

Cummings, M. L. (2017). “Automation bias in intelligent time critical decision support systems,” in Decision Making in Aviation (Routledge), 289–294. doi: 10.4324/9781315095080-17

Crossref Full Text | Google Scholar

Danks, D., and London, A. J. (2017). “Algorithmic bias in autonomous systems,” in IJCAI, 4691–4697. doi: 10.24963/ijcai.2017/654

Crossref Full Text | Google Scholar

De Houwer, J. (2019). Implicit bias is behavior: a functional-cognitive perspective on implicit bias. Persp. Psychol. Sci. 14, 835–840. doi: 10.1177/1745691619855638

PubMed Abstract | Crossref Full Text | Google Scholar

De Oliveira, M. F., and Levkowitz, H. (2003). From visual data exploration to visual data mining: a survey. IEEE Trans. Vis. Comput. Graph. 9, 378–394. doi: 10.1109/TVCG.2003.1207445

Crossref Full Text | Google Scholar

Deng, C., Fang, X., and Wang, X. (2022). Uav-enabled mobile-edge computing for ai applications: joint model decision, resource allocation, and trajectory optimization. IEEE Internet Things J. 10, 5662–5675. doi: 10.1109/JIOT.2022.3151619

Crossref Full Text | Google Scholar

Entman, R. M. (2007). Framing bias: Media in the distribution of power. J. Commun. 57, 163–173. doi: 10.1111/j.1460-2466.2006.00336.x

PubMed Abstract | Crossref Full Text | Google Scholar

Farrer, C., Bouchereau, M., Jeannerod, M., and Franck, N. (2008). Effect of distorted visual feedback on the sense of agency. Behav. Neurol. 19, 53–57. doi: 10.1155/2008/425267

PubMed Abstract | Crossref Full Text | Google Scholar

Frank, D.-A., Chrysochou, P., Mitkidis, P., and Ariely, D. (2019). Human decision-making biases in the moral dilemmas of autonomous vehicles. Sci. Rep. 9:13080. doi: 10.1038/s41598-019-49411-7

PubMed Abstract | Crossref Full Text | Google Scholar

Frischknecht, R. (2021). A social cognition perspective on autonomous technology. Comput. Human Behav. 122:106815. doi: 10.1016/j.chb.2021.106815

Crossref Full Text | Google Scholar

Guo, J., Bao, W., Wang, J., Ma, Y., Gao, X., Xiao, G., et al. (2023). A comprehensive evaluation framework for deep model robustness. Pattern Recogn. 137:109308. doi: 10.1016/j.patcog.2023.109308

Crossref Full Text | Google Scholar

Haselton, M. G., Nettle, D., and Andrews, P. W. (2015). “The evolution of cognitive bias,” The Handbook of Evolutionary Psychology, 724–746. doi: 10.1002/9780470939376.ch25

Crossref Full Text | Google Scholar

He, H., Gray, J., Cangelosi, A., Meng, Q., McGinnity, T. M., and Mehnen, J. (2021). The challenges and opportunities of human-centered AI for trustworthy robots and autonomous systems. IEEE Trans. Cogn. Dev. Syst. 14, 1398–1412. doi: 10.1109/TCDS.2021.3132282

Crossref Full Text | Google Scholar

Hellström, T., Dignum, V., and Bensch, S. (2020). Bias in machine learning-what is it good for? arXiv preprint arXiv:2004.00686.

Google Scholar

Johnson, G. M. (2021). Algorithmic bias: on the implicit biases of social technology. Synthese 198, 9941–9961. doi: 10.1007/s11229-020-02696-y

Crossref Full Text | Google Scholar

Kaur, D., Uslu, S., Rittichier, K. J., and Durresi, A. (2022). Trustworthy artificial intelligence: a review. ACM Comput. Surv. 55, 1–38. doi: 10.1145/3491209

Crossref Full Text | Google Scholar

Khan, A., Gupta, S., and Gupta, S. K. (2022). Emerging uav technology for disaster detection, mitigation, response, and preparedness. J. Field Robot. 39, 905–955. doi: 10.1002/rob.22075

Crossref Full Text | Google Scholar

Knierim, J. J., and Hamilton, D. A. (2011). Framing spatial cognition: neural representations of proximal and distal frames of reference and their roles in navigation. Physiol. Rev. 91, 1245–1279. doi: 10.1152/physrev.00021.2010

PubMed Abstract | Crossref Full Text | Google Scholar

Limata, T. (2023). Decision making in killer robots is not bias free. J. Military Ethics 22, 118–128. doi: 10.1080/15027570.2023.2286044

PubMed Abstract | Crossref Full Text | Google Scholar

Lin, Y.-T., Hung, T.-W., and Huang, L. T.-L. (2021). Engineering equity: how AI can help reduce the harm of implicit bias. Philos. Technol. 34, 65–90. doi: 10.1007/s13347-020-00406-7

Crossref Full Text | Google Scholar

Liu, A., Huang, T., Liu, X., Xu, Y., Ma, Y., Chen, X., et al. (2020a). “Spatiotemporal attacks for embodied agents,” in ECCV. doi: 10.1007/978-3-030-58520-4_8

Crossref Full Text | Google Scholar

Liu, A., Tang, S., Chen, X., Huang, L., Qin, H., Liu, X., et al. (2023). Towards defending multiple LP-norm bounded adversarial perturbations via gated batch normalization. Int. J. Comput. Vision. 132, 1881–1898. doi: 10.1007/s11263-023-01884-w

Crossref Full Text | Google Scholar

Liu, A., Wang, J., Liu, X., Cao, B., Zhang, C., and Yu, H. (2020b). “Bias-based universal adversarial patch attack for automatic check-out,” in ECCV. doi: 10.1007/978-3-030-58601-0_24

PubMed Abstract | Crossref Full Text | Google Scholar

Marinucci, L., Mazzuca, C., and Gangemi, A. (2023). Exposing implicit biases and stereotypes in human and artificial intelligence: state of the art and challenges with a focus on gender. AI Soc. 38, 747–761. doi: 10.1007/s00146-022-01474-3

Crossref Full Text | Google Scholar

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Comput. Surv. 54, 1–35. doi: 10.1145/3457607

Crossref Full Text | Google Scholar

Morstatter, F., Wu, L., Yavanoglu, U., Corman, S. R., and Liu, H. (2018). Identifying framing bias in online news. ACM Trans. Soc. Comput. 1, 1–18. doi: 10.1145/3204948

PubMed Abstract | Crossref Full Text | Google Scholar

Mukherjee, A., and Chang, H. H. (2024). Heuristic reasoning in AI: instrumental use and mimetic absorption. arXiv preprint arXiv:2403.09404.

Google Scholar

Mullen, B., Brown, R., and Smith, C. (1992). Ingroup bias as a function of salience, relevance, and status: an integration. Eur. J. Soc. Psychol. 22, 103–122. doi: 10.1002/ejsp.2420220202

Crossref Full Text | Google Scholar

Papyan, N., Kulhandjian, M., Kulhandjian, H., and Aslanyan, L. (2024). AI-based drone assisted human rescue in disaster environments: challenges and opportunities. Patt. Recogn. Image Analy. 34, 169–186. doi: 10.1134/S1054661824010152

PubMed Abstract | Crossref Full Text | Google Scholar

Peng, Y. (2018). Same candidates, different faces: uncovering media bias in visual portrayals of presidential candidates with computer vision. J. Commun. 68, 920–941. doi: 10.1093/joc/jqy041

Crossref Full Text | Google Scholar

Raja, A., Njilla, L., and Yuan, J. (2022). Adversarial attacks and defenses toward AI-assisted uav infrastructure inspection. IEEE Internet Things J. 9, 23379–23389. doi: 10.1109/JIOT.2022.3206276

Crossref Full Text | Google Scholar

Rastogi, C., Zhang, Y., Wei, D., Varshney, K. R., Dhurandhar, A., and Tomsett, R. (2022). Deciding fast and slow: the role of cognitive biases in ai-assisted decision-making. Proc. ACM Hum. Comput. Inter. 6, 1–22. doi: 10.1145/3512930

Crossref Full Text | Google Scholar

Rezwan, S., and Choi, W. (2022). Artificial intelligence approaches for uav navigation: Recent advances and future challenges. IEEE Access 10, 26320–26339. doi: 10.1109/ACCESS.2022.3157626

Crossref Full Text | Google Scholar

Suresh, H., and Guttag, J. V. (2019). A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002.

Google Scholar

Tversky, A., and Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases: biases in judgments reveal some heuristics of thinking under uncertainty. Science 185, 1124–1131. doi: 10.1126/science.185.4157.1124

PubMed Abstract | Crossref Full Text | Google Scholar

Vrac, M., and Friederichs, P. (2015). Multivariate—intervariable, spatial, and temporal—bias correction. J. Clim. 28, 218–237. doi: 10.1175/JCLI-D-14-00059.1

PubMed Abstract | Crossref Full Text | Google Scholar

Wang, A., Liu, A., Zhang, R., Kleiman, A., Kim, L., Zhao, D., et al. (2022). Revise: a tool for measuring and mitigating bias in visual datasets. Int. J. Comput. Vis. 130, 1790–1810. doi: 10.1007/s11263-022-01625-5

Crossref Full Text | Google Scholar

Wang, X., and Gursoy, M. C. (2023). Resilient path planning for uavs in data collection under adversarial attacks. IEEE Trans. Inf. Forens. Secur. 18, 2766–2779. doi: 10.1109/TIFS.2023.3266699

Crossref Full Text | Google Scholar

Wang, Z., Qinami, K., Karakozis, I. C., Genova, K., Nair, P., Hata, K., et al. (2020). “Towards fairness in visual recognition: Effective strategies for bias mitigation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8919–8928. doi: 10.1109/CVPR42600.2020.00894

Crossref Full Text | Google Scholar

Zvielli, A., Bernstein, A., and Koster, E. H. (2015). Temporal dynamics of attentional bias. Clin. Psychol. Sci. 3, 772–788. doi: 10.1177/2167702614551572

Crossref Full Text | Google Scholar

Keywords: cognitive bias, inference-level vulnerability, UAV surveillance, priority inversion, non-perturbative black-box attacks, visual reasoning robustness, trustworthy autonomous systems

Citation: Chu S and Chen Y (2025) Weaponizing cognitive bias in autonomous systems: a framework for black-box inference attacks. Front. Artif. Intell. 8:1623573. doi: 10.3389/frai.2025.1623573

Received: 07 May 2025; Accepted: 30 July 2025;
Published: 20 August 2025.

Edited by:

Andrej Košir, University of Ljubljana, Slovenia

Reviewed by:

Kruna Ratkovic, University of Donja Gorica, Montenegro
Teresa Limata, University of Turin, Italy

Copyright © 2025 Chu and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yuwei Chen, Y2F0Y29ybmljQGdtYWlsLmNvbQ==

^†These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.