ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. AI in Finance

Portfolio Management based on Value Distribution Reinforcement Learning Algorithm

Provisionally accepted
  • Yang Yan, China Southern Power Grid (China), Guangzhou, China

The final, formatted version of the article will be published soon.

ABSTRACT Given the high uncertainty and complexity of financial markets, maximizing portfolio returns while effectively controlling risk remains a critical challenge in intelligent investment research. This paper proposes a novel portfolio management framework based on the Value Distribution Maximum Entropy Actor-Critic (VD-MEAC) reinforcement learning algorithm. We formulate portfolio management as a reinforcement learning problem whose objective is to maximize portfolio returns: the agent's actions directly represent portfolio weight adjustments, and stock factors serve as the state observations. For risk management, the Critic network learns the complete distribution of future returns rather than a point estimate, which allows overconfident value estimates to be filtered out and mitigates overestimation bias. For return enhancement, we incorporate entropy regularization to encourage exploration of the action space and prevent premature convergence to suboptimal strategies. We conduct extensive experiments on real data from the Chinese stock market and repeat the tests multiple times to verify the robustness of the strategy. Across these runs, the VD-MEAC strategy achieves an average return of 2.490 and an average Sharpe ratio of 2.978, significantly outperforming the benchmark strategies (equal-weight portfolios, the CSI 300 index, and state-of-the-art reinforcement learning methods) on key performance metrics such as return rate, maximum drawdown, and risk-adjusted return. These results validate the effectiveness of the approach in practical portfolio management scenarios. The complete source code for this study is available at: https://github.com/YanYang/VD-MEAC.
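The three ingredients named in the abstract (actions as portfolio weights, a Critic that models a return distribution, and an entropy bonus) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The softmax mapping to weights, the representation of the distribution as a list of quantiles, the rule of discarding the largest quantiles, the diagonal Gaussian policy, and every function and parameter name (`softmax_weights`, `truncated_value`, `gaussian_entropy`, `actor_objective`, `drop_top`, `alpha`) are assumptions made for illustration only.

```python
import math


def softmax_weights(scores):
    """Map raw actor outputs to long-only weights that sum to 1 (assumed mapping)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def truncated_value(quantiles, drop_top):
    """Mean of the return quantiles after discarding the drop_top largest ones.

    Dropping the most optimistic quantiles is one common way to make a
    distributional critic pessimistic; whether VD-MEAC uses this exact rule
    is an assumption.
    """
    if not 0 <= drop_top < len(quantiles):
        raise ValueError("drop_top must leave at least one quantile")
    kept = sorted(quantiles)[: len(quantiles) - drop_top]
    return sum(kept) / len(kept)


def gaussian_entropy(stds):
    """Differential entropy of a diagonal Gaussian policy with the given std devs."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds)


def actor_objective(quantiles, stds, alpha, drop_top):
    """Maximum-entropy objective: pessimistic value plus alpha times policy entropy."""
    return truncated_value(quantiles, drop_top) + alpha * gaussian_entropy(stds)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    weights = softmax_weights([0.2, -1.0, 3.0])
    objective = actor_objective(
        quantiles=[0.01, 0.02, 0.03, 0.10],
        stds=[0.5, 0.5, 0.5],
        alpha=0.2,
        drop_top=1,
    )
    print(weights, objective)
```

In this sketch a larger `drop_top` makes the value estimate more conservative, while a larger `alpha` rewards broader exploration. The trade-off between the two corresponds to the risk-control and return-enhancement roles the abstract assigns to the Critic and to entropy regularization.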

Keywords: portfolio optimization, reinforcement learning, value distribution, risk management, quantitative finance, actor-critic algorithm

Received: 23 Sep 2025; Accepted: 02 Dec 2025.

Copyright: © 2025 Yan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yang Yan

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.