ORIGINAL RESEARCH article
Front. Neurorobot.
This article is part of the Research Topic: Neuromorphic Engineering and Brain-Inspired Control for Autonomous Robotics: Bridging Neuroscience and AI for Real-World Applications
SpikeAEC: A Neuromodulation-based Spiking Controller for Explore-Exploit Balancing in Mobile Robots
Provisionally accepted
Nanjing University of Information Science and Technology, Nanjing, China
Balancing exploration and exploitation remains a fundamental challenge in reliable mobile robot control, as conventional policies often converge on suboptimal behaviors. Inspired by the brain's division of labor for adaptive control, we propose SpikeAEC, a fully spiking, neuromodulated Actor-Explorer-Critic architecture designed to address this dilemma online within a closed-loop system. SpikeAEC comprises three specialized subnetworks operating in parallel: the Actor, inspired by the basal ganglia, proposes exploitative actions; the Explorer, modeled after the ACC-GPe-STN pathway, generates adaptive exploratory actions gated by a vigilance signal modulated by the accumulated global temporal-difference (TD) error; and the Critic, based on the ventral striatum, computes the TD error. The final action is selected by a separate, TAN-based Arbitrator, which probabilistically chooses between the Actor's and Explorer's action proposals according to recent performance and the TD error. These subnetworks are coupled through a unified three-factor learning framework that uses the TD signal and phasic neuromodulators (acetylcholine and dopamine) from the Arbitrator to drive pathway-specific synaptic plasticity. This online plasticity enhances the quality of action proposals and accelerates policy refinement. In simulation, SpikeAEC outperforms leading brain-inspired methods by converging 24% faster, reducing trajectory length by 18%, and increasing cumulative reward by over 5% against the top-performing baseline, all while maintaining consistency with established neurophysiological principles.
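The control loop described in the abstract can be illustrated with a small rate-level sketch. This is not the authors' spiking implementation: the abstract gives no equations, so the leaky accumulation used for the vigilance signal, the logistic form of the arbitration rule, and all function names and parameter values (`gamma`, `decay`, `gain`) are assumptions chosen for illustration. Only the one-step TD error is the standard textbook definition.

```python
import math
import random


def td_error(reward, value, next_value, gamma=0.9):
    """Standard one-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def update_vigilance(vigilance, delta, decay=0.8):
    """Leaky accumulation of |delta| (assumed form of the vigilance signal)."""
    return decay * vigilance + (1.0 - decay) * abs(delta)


def explore_probability(delta, recent_performance, gain=4.0):
    """Assumed logistic rule: a large |delta| and poor recent performance
    make the Explorer's proposal more likely to be selected."""
    drive = abs(delta) - recent_performance
    return 1.0 / (1.0 + math.exp(-gain * drive))


def arbitrate(actor_action, explorer_action, delta, recent_performance, rng):
    """Probabilistically choose between the two action proposals."""
    if rng.random() < explore_probability(delta, recent_performance):
        return explorer_action
    return actor_action


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    vigilance = 0.0
    recent_performance = 0.0
    for reward in [0.0, 0.0, 1.0]:
        delta = td_error(reward, value=0.2, next_value=0.3)
        vigilance = update_vigilance(vigilance, delta)
        choice = arbitrate("actor", "explorer", delta, recent_performance, rng)
        # Running average of reward as a stand-in for "recent performance"
        recent_performance = 0.5 * recent_performance + 0.5 * reward
        print(f"delta={delta:+.2f} vigilance={vigilance:.3f} choice={choice}")
```

In this toy version the vigilance value is only tracked, whereas in the architecture described above it gates the Explorer's output. The three-factor plasticity that couples the subnetworks is omitted entirely.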
Keywords: Actor-Explorer-Critic, Exploration-exploitation dilemma, Neuromodulation, Spiking neural networks (SNNs), Three-factor learning rules
Received: 01 Dec 2025; Accepted: 14 Feb 2026.
Copyright: © 2026 Liu, Liu, Zhou and Su. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Canyang Liu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
