ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
This article is part of the Research Topic: Frontiers in Explainable AI: Positioning XAI for Action, Human-Centered Design, Ethics, and Usability
An Uncertainty-Aware Reinforcement Learning Framework for Joint Demand Forecasting and Inventory Optimization
Provisionally accepted
Sichuan Academy of Social Sciences, Chengdu, China
The optimization of inventory systems is a cornerstone of operational efficiency, yet it is persistently challenged by demand uncertainty. Traditional approaches that decouple demand forecasting from inventory control often propagate forecast errors, leading to suboptimal and brittle policies. In this paper, we propose a novel, integrated framework that synergizes probabilistic deep learning for forecasting with uncertainty-aware reinforcement learning (RL) for dynamic inventory optimization. Our primary innovations are threefold: (1) a probabilistic forecasting model (a Bayesian LSTM) that quantifies predictive uncertainty, providing not just a point forecast but a full predictive distribution; (2) an uncertainty-augmented state representation for the RL agent, which explicitly includes the forecasted demand's mean and variance, enabling it to learn risk-sensitive behaviors; and (3) a dynamic risk-adjusted reward function that penalizes the agent more severely for failures under low uncertainty, encouraging it to be robust when predictions are confident and cautiously opportunistic when they are not. We employ a Deep Deterministic Policy Gradient (DDPG) agent that learns a continuous, state-dependent order-up-to-level policy. Evaluated on real-world retail data and a series of challenging synthetic scenarios, our framework demonstrates marked superiority over traditional and state-of-the-art baselines. It achieves a 28-40% reduction in total inventory-related costs and improves the service level by 5-10 percentage points, while simultaneously reducing order volatility. The learned policies are shown to be both interpretable and economically rational. This work provides a new paradigm for building intelligent, adaptive, and risk-aware decision-making systems for supply chain management, showcasing how AI can effectively manage and leverage uncertainty to drive significant economic value.
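The abstract names the framework's three components but gives no implementation details. The Python sketch below is a minimal illustration of one plausible realization, not the authors' actual code: MC dropout stands in for the Bayesian LSTM's posterior approximation, and the state layout, the confidence mapping 1/(1 + variance), and the cost weights h, b, and kappa are all illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

class MCDropoutLSTM(nn.Module):
    """Approximate Bayesian LSTM: keeping dropout active at inference
    (MC dropout) makes repeated forward passes stochastic, yielding a
    predictive distribution instead of a point forecast."""
    def __init__(self, input_dim=1, hidden_dim=32, p=0.2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(p)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.lstm(x)            # (batch, T, hidden)
        h = self.drop(out[:, -1, :])     # dropout on the last hidden state
        return self.head(h).squeeze(-1)  # (batch,)

def forecast_with_uncertainty(model, history, n_samples=50):
    """Return (mean, variance) of next-period demand via MC-dropout sampling."""
    model.train()  # keep dropout stochastic for the MC samples
    with torch.no_grad():
        draws = torch.stack([model(history) for _ in range(n_samples)])
    return draws.mean(0).item(), draws.var(0).item()

def build_state(on_hand, on_order, mu_d, var_d):
    """Uncertainty-augmented state: inventory position plus the forecast's
    first two moments, so the agent can learn risk-sensitive behavior."""
    return np.array([on_hand, on_order, mu_d, var_d], dtype=np.float32)

def risk_adjusted_reward(holding, stockout, var_d, h=1.0, b=5.0, kappa=2.0):
    """Dynamic risk-adjusted reward (illustrative form): the stockout
    penalty is scaled up when forecast variance is LOW, so failures under
    confident predictions are punished more severely."""
    confidence = 1.0 / (1.0 + var_d)   # in (0, 1]; high when variance is low
    penalty = h * holding + b * (1.0 + kappa * confidence) * stockout
    return -penalty

# Example wiring (hypothetical values):
model = MCDropoutLSTM()
history = torch.randn(1, 30, 1)        # 30 past demand observations
mu, var = forecast_with_uncertainty(model, history)
state = build_state(on_hand=120.0, on_order=40.0, mu_d=mu, var_d=var)
reward = risk_adjusted_reward(holding=10.0, stockout=3.0, var_d=var)
```

The DDPG agent described in the abstract (not shown here) would consume build_state(...) as its observation and be trained against risk_adjusted_reward(...), mapping each state to a continuous order-up-to level; the key design point reflected above is that the penalty for failure grows as forecast uncertainty shrinks.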
Keywords: inventory management, reinforcement learning, demand forecasting, uncertainty quantification, Deep Deterministic Policy Gradient (DDPG)
Received: 29 Aug 2025; Accepted: 24 Nov 2025.
Copyright: © 2025 Chen and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Baqiang Chen
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
