ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Logic and Reasoning in AI
Volume 8 - 2025 | doi: 10.3389/frai.2025.1677528
This article is part of the Research Topic: Convergence of Artificial Intelligence and Cognitive Systems.
Epistemic Limits of Local Interpretability in Self-Modulating Cognitive Architectures
Provisionally accepted
Abdelaali Mahrouk, Université Frères Mentouri Constantine 1, Constantine, Algeria
Local interpretability techniques such as LIME (Ribeiro et al., 2016) and SHAP (Lundberg & Lee, 2017) have become standard tools for probing the decision-making logic of complex machine learning models. These methods rely on an assumption of local continuity in the latent space: that small perturbations around an input yield semantically consistent and explainable model responses. This assumption often breaks down in recursive, self-modulating cognitive architectures, where internal states are dynamically restructured through feedback loops, cross-layer attention, and latent program rewriting. Under such conditions, local explanations can become unstable or misleading. In this paper, we present evidence, drawn from formal analysis, simulation experiments, and epistemological reflection, that local proxy models are insufficient to capture the internal narrative dynamics of self-reflective systems. We propose a shift from post-hoc local approximations to causal-hierarchical traceability, integrating internal self-monitoring signals with meta-generative narratives. Our framework builds on recent developments in modular neuro-symbolic agents, structured world models (Ha & Schmidhuber, 2018; Guez et al., 2021), reflective prompting (Kojima et al., 2023), and interpretability research in large-scale language models (Ji et al., 2023; Chan et al., 2022; Huang et al., 2024), and it argues for a holistic interpretability paradigm better suited to future architectures. Rather than claiming a final solution, we advance a new epistemological framing for Artificial General Intelligence (AGI; see the Glossary in Section 2.6): one that treats intelligent systems as narratively structured, self-explaining epistemic agents. This perspective is exploratory, but it underscores the need for interpretability frameworks that evolve with system complexity and narrative modulation.
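To make the local-continuity concern concrete, the minimal sketch below fits a LIME-style weighted linear surrogate around a single input of a toy self-modulating scorer and compares the resulting attributions under two internal-state settings. The names toy_model, local_surrogate, and the state parameter are hypothetical illustrations introduced here, not components of the architecture studied in this article; under those assumptions, the example only shows how feedback-driven modulation of internal state can shift local attributions for an unchanged input.

```python
# Illustrative sketch only: a LIME-style local linear surrogate fit around one input,
# compared across two internal-state settings of a hypothetical self-modulating model.
import numpy as np

rng = np.random.default_rng(0)

def toy_model(x, state):
    # Toy "self-modulating" scorer: the scalar internal state rescales and re-mixes
    # the feature weights, mimicking feedback-driven restructuring of the latent space.
    w = np.tanh(state * np.array([1.5, -2.0, 0.7]))
    return float(np.tanh(x @ w + 0.5 * np.sin(state) * x[0] * x[1]))

def local_surrogate(x0, state, n=500, radius=0.1):
    # Fit a distance-weighted linear surrogate on perturbations around x0 (LIME-style).
    X = x0 + radius * rng.normal(size=(n, x0.size))
    y = np.array([toy_model(x, state) for x in X])
    weights = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * radius ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([X, np.ones((n, 1))]) * np.sqrt(weights)[:, None]
    b = y * np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef[:-1]  # per-feature attributions (intercept dropped)

x0 = np.array([0.3, -0.2, 0.8])
attr_a = local_surrogate(x0, state=0.2)  # one internal state
attr_b = local_surrogate(x0, state=1.4)  # same input, modulated internal state
drift = np.linalg.norm(attr_a - attr_b) / (np.linalg.norm(attr_a) + 1e-9)
print("attributions (state A):", np.round(attr_a, 3))
print("attributions (state B):", np.round(attr_b, 3))
print("relative explanation drift:", round(drift, 3))
```

A nonzero drift for the same input is the behavior the abstract flags: when internal state is part of the decision process, a post-hoc local surrogate explains a moving target rather than a fixed decision surface.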
Keywords: Stratified Decision Landscapes, Salience-Gated Attention, Cognitive Leap Operator, Internal Narrative Generator, Modular Cognitive Attention, Recursive Contextual Memory, Meta-Computational Narratives, Narrative Interpretability
Received: 01 Aug 2025; Accepted: 08 Oct 2025.
Copyright: © 2025 MAHROUK. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Abdelaali MAHROUK, abd.marok25@gmail.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.