
ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Robot Learning and Evolution

Volume 12 - 2025 | doi: 10.3389/frobt.2025.1649154

This article is part of the Research Topic: Physical AI and Robotics – Outputs from IS-PAIR 2025 and Beyond.

Weber-Fechner Law in Temporal Difference Learning derived from Control as Inference

Provisionally accepted
Keiichiro Takahashi1, Taisuke Kobayashi2,3*, Tomoya Yamanokuchi1, Takamitsu Matsubara1
  • 1Nara Institute of Science and Technology, Ikoma, Japan
  • 2National Institute of Informatics, Chiyoda-ku, Japan
  • 3The Graduate University for Advanced Studies (SOKENDAI), Miura District, Japan

The final, formatted version of the article will be published soon.

This paper investigates a novel nonlinear update rule for value and policy functions based on temporal difference (TD) errors in reinforcement learning (RL). In standard RL, the update rule states that the degree of updates is linearly proportional to the TD error, treating all rewards equally and without bias. Recent biological studies, however, have revealed nonlinearities between the TD error and the degree of updates, which bias policies toward optimism or pessimism. Such nonlinearity-induced learning biases are expected to be useful and intentionally retained features of biological learning. This research therefore explores a theoretical framework that can leverage the nonlinearity between the degree of updates and TD errors. To this end, we focus on the control-as-inference framework utilized in previous work, in which an uncomputable nonlinear term had to be approximately excluded to derive standard RL. By analyzing this term, we find the Weber-Fechner law (WFL): perception (i.e., the degree of updates) in response to a stimulus change (i.e., the TD error) is attenuated as the stimulus intensity (i.e., the value function) increases. To numerically reveal the utilities of WFL in RL, we then propose a practical implementation that uses a reward-punishment framework and modifies the definition of optimality. Further analysis of this implementation suggests two expected utilities: i) accelerating escape from situations with small rewards, and ii) pursuing the minimum punishment as much as possible. We finally investigate and discuss these expected utilities through simulations and robot experiments. As a result, the proposed RL algorithm with WFL exhibits the expected utilities: it accelerates the startup of reward maximization and continues to suppress punishments throughout learning.
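To make the attenuation idea concrete, the following is a minimal Python sketch of a Weber-Fechner-style TD(0) update, not the authors' exact formulation derived from control as inference. The function name wfl_td_update, the reference scale v_ref, and the specific attenuation factor 1/(v_ref + |V(s)|) are illustrative assumptions; they only demonstrate the qualitative behavior of the degree of updates shrinking as the stored value grows.

```python
import numpy as np

def wfl_td_update(V, s, s_next, r, alpha=0.1, gamma=0.99, v_ref=1.0):
    """Illustrative Weber-Fechner-style TD(0) update (assumed form, not the paper's).

    The TD error plays the role of the stimulus change, and the degree of the
    update is attenuated by the current stimulus intensity |V(s)|, so states
    with large stored values receive damped updates.
    """
    delta = r + gamma * V[s_next] - V[s]        # standard TD error
    sensitivity = 1.0 / (v_ref + abs(V[s]))     # assumed WFL-style attenuation
    V[s] += alpha * sensitivity * delta         # nonlinear (attenuated) update
    return delta

# Tiny usage example on a 5-state chain (purely illustrative)
V = np.zeros(5)
wfl_td_update(V, s=0, s_next=1, r=1.0)
```

Under this sketch, a near-zero value estimate yields nearly the standard linear update, while large value magnitudes progressively damp it, which is the qualitative behavior the abstract attributes to WFL.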

Keywords: reinforcement learning, temporal difference learning, control as inference, reward-punishment framework, Weber-Fechner law, robot control

Received: 18 Jun 2025; Accepted: 25 Aug 2025.

Copyright: © 2025 Takahashi, Kobayashi, Yamanokuchi and Matsubara. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Taisuke Kobayashi, National Institute of Informatics, Chiyoda-ku, Japan

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.