A neural network model combining the successor representation and actor-critic methods reveals effective biological use of the representation

Tsurumi, Takayuki; Morita, Kenji

doi:10.3389/fncom.2025.1647462

ORIGINAL RESEARCH article

Front. Comput. Neurosci.

This article is part of the Research TopicBridging Computation, Biophysics, Medicine, and Engineering in Neural CircuitsView all 16 articles

A neural network model combining the successor representation and actor-critic methods reveals effective biological use of the representation

Provisionally accepted

Takayuki Tsurumi^*

Kenji Morita

Tokyo Daigaku, Bunkyo, Japan

The final, formatted version of the article will be published soon.

In learning goal-directed behavior, state representation is important for adapting to the environment and achieving goals. A predictive state representation called successive representation (SR) has recently attracted attention as a candidate for state representation in animal brains, especially in the hippocampus. The relationship between the SR and the animal brain has been studied, and several neural network models for computing the SR have been proposed based on the findings. However, studies on implementation of the SR involving action selection have not yet advanced significantly. Therefore, we explore possible mechanisms by which the SR is utilized biologically for action selection and learning optimal action policies. The actor-critic architecture is a promising model of animal behavioral learning in terms of its correspondence to the anatomy and function of the basal ganglia, so it is suitable for our purpose. In this study, we construct neural network models for behavioral learning using the SR. By using it to perform reinforcement learning, we investigate their properties. Specifically, we investigated the effect of using different state representations for the actor and critic in the actor-critic method, and also compared the actor-critic method with Q-learning and SARSA. We found the difference between the effect of using the SR for the actor and the effect of using the SR for the critic in the actor-critic method, and observed that using the SR in conjunction with one-hot encoding makes it possible to learn with the benefits of both representations. These results suggest the possibility that the striatum can learn using multiple state representations complementarily.

Keywords: successor representation, actor-critic, Neural Network, reinforcement learning, Striatum

Received: 15 Jun 2025; Accepted: 27 Oct 2025.

Copyright: © 2025 Tsurumi and Morita. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Takayuki Tsurumi, tkyk853isp@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.