
ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Logic and Reasoning in AI

Volume 8 - 2025 | doi: 10.3389/frai.2025.1677528

This article is part of the Research Topic: Convergence of Artificial Intelligence and Cognitive Systems.

Epistemic Limits of Local Interpretability in Self-Modulating Cognitive Architectures

Provisionally accepted
Abdelaali Mahrouk
  • Université Frères Mentouri Constantine 1, Constantine, Algeria

The final, formatted version of the article will be published soon.

Local interpretability techniques—such as LIME (Ribeiro et al., 2016) and SHAP (Lundberg & Lee, 2017)—have become standard tools for probing the decision-making logic of complex machine learning models. These methods rest on an assumption of local continuity in the latent space: that small perturbations around an input yield semantically consistent, explainable model responses. This assumption often breaks down, however, in recursive, self-modulating cognitive architectures, where internal states are dynamically restructured through feedback loops, cross-layer attention, and latent program rewriting. Under such conditions, local explanations can become unstable or misleading. In this paper, we present evidence—through formal analysis, simulation experiments, and epistemological reflection—that local proxy models are insufficient to capture the internal narrative dynamics of self-reflective systems. We propose a shift from post-hoc local approximations to causal-hierarchical traceability, integrating internal self-monitoring signals with meta-generative narratives. Our framework builds on recent developments in modular neuro-symbolic agents, structured world models (Ha & Schmidhuber, 2018; Guez et al., 2021), reflective prompting (Kojima et al., 2023), and interpretability research on large-scale language models (Ji et al., 2023; Chan et al., 2022; Huang et al., 2024), and argues for a holistic interpretability paradigm better suited to future architectures. Rather than claiming a final solution, we advance a new epistemological framing for Artificial General Intelligence (AGI; see the Glossary in Section 2.6 for a definition): one that treats intelligent systems as narratively structured, self-explaining epistemic agents. This perspective is exploratory, but it highlights the need for interpretability frameworks that evolve with system complexity and narrative modulation.
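To make the local-continuity assumption concrete, the sketch below (not taken from the paper) fits a LIME-style weighted linear surrogate around an input and measures how much its coefficients drift under tiny shifts of the anchor point. The black-box function, sampling scale, and drift metric are illustrative assumptions chosen to mimic the kind of local discontinuity the abstract describes; they are not the authors' experimental setup.

```python
# Minimal sketch: LIME-style local surrogate plus a simple stability probe.
# The black-box model and the drift metric are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in for a complex model; the piecewise nonlinearity around x0 = 0.5
    # imitates a local discontinuity in the decision surface.
    return np.where(X[:, 0] > 0.5,
                    np.sin(5 * X[:, 0]) + X[:, 1],
                    X[:, 0] - X[:, 1] ** 2)

def local_surrogate(x0, n_samples=500, sigma=0.1):
    """Fit a proximity-weighted linear surrogate around x0 (LIME-style)."""
    X = x0 + rng.normal(scale=sigma, size=(n_samples, x0.size))
    y = black_box(X)
    weights = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * sigma ** 2))
    model = Ridge(alpha=1.0)
    model.fit(X, y, sample_weight=weights)
    return model.coef_  # local feature attributions

# Stability probe: compare surrogate coefficients at an anchor near the
# discontinuity with those at slightly perturbed anchors.
x0 = np.array([0.48, 0.2])
base = local_surrogate(x0)
drift = [np.linalg.norm(local_surrogate(x0 + rng.normal(scale=0.02, size=2)) - base)
         for _ in range(10)]
print("coefficient drift under tiny input shifts:", np.round(drift, 3))
```

Large, erratic drift values in such a probe would indicate that the local surrogate's explanation is not stable in the neighborhood of the input, which is the failure mode the abstract attributes to self-modulating architectures.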

Keywords: Stratified Decision Landscapes, Salience-Gated Attention, Cognitive Leap Operator, Internal Narrative Generator, Modular Cognitive Attention, Recursive Contextual Memory, Meta-Computational Narratives, Narrative Interpretability

Received: 01 Aug 2025; Accepted: 08 Oct 2025.

Copyright: © 2025 MAHROUK. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Abdelaali MAHROUK, abd.marok25@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.