AUTHOR=Rosales Rafael, Cavalcanti Dave
TITLE=Reinforcement learning, rule-based, or generative AI: a comparison of model-free Wi-Fi slicing approaches
JOURNAL=Frontiers in Signal Processing
VOLUME=5
YEAR=2025
URL=https://www.frontiersin.org/journals/signal-processing/articles/10.3389/frsip.2025.1608347
DOI=10.3389/frsip.2025.1608347
ISSN=2673-8198
ABSTRACT=Resource allocation techniques are key to providing Quality-of-Service guarantees. Wi-Fi standards define features enabling the allocation of radio resources across time, frequency, and link band. However, radio resource slicing, as implemented in 5G cellular networks, is not native to Wi-Fi. A few reinforcement learning (RL) approaches have been proposed for Wi-Fi resource allocation and demonstrated using analytical models where the reward gradient with respect to the model parameters is accessible—i.e., with a differentiable Wi-Fi network model. In this work, we implement—and release under an Apache 2.0 license—a state-of-the-art, state-augmented constrained optimization method using a policy-gradient RL algorithm that does not require a differentiable model, to assess model-free RL-based slicing for Wi-Fi frequency resource allocation. We compare this with six model-free baselines: three RL algorithms (REINFORCE, A2C, PPO), two rule-based heuristics (Uniform, Proportional), and a generative AI policy using a commercial foundational Large Language Model (LLM). For rapid RL training, a simple, non-differentiable network model was used. To evaluate the policies, we use an ns-3-based Wi-Fi 6 simulator with a slice-aware MAC. Evaluations were conducted in two traffic scenarios: A) a periodic pattern with one constant low-throughput slice and two high-throughput slices toggled sequentially, and B) a random walk scenario for realism. Results show that, on average—in terms of the trade-off between total throughput and a packet-latency-based metric—the uniform split and LLM-based policy perform best, appearing on the Pareto front in both scenarios. The proportional policy only appears on the front in the periodic case. Our state-augmented constrained approach based on REINFORCE (SAC-RE) is on the second Pareto front for the random walk case, outperforming vanilla REINFORCE. In the periodic scenario, vanilla REINFORCE achieves better throughput—with a latency trade-off—and is co-located with SAC-RE on the second front. Interestingly, the LLM-based policy—neither trained nor fine-tuned on any custom data—consistently appears on the first Pareto front, offering higher objective values at some latency cost. Unlike uniform slicing, its behavior is dynamically adjustable via prompt engineering.
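
The abstract describes the state-augmented constrained approach (SAC-RE) only at a high level; the sketch below is a generic illustration of that family of methods, not the authors' released implementation. It assumes (my assumptions, not the paper's) that the policy receives dual variables alongside a network observation, samples per-slice frequency shares from a Dirichlet distribution, applies a REINFORCE (score-function) update to a Lagrangian combining throughput with per-slice latency constraints, and performs projected dual ascent. The names and dimensions (env.step, obs_dim, latency_budget, dual_lr) are hypothetical; only the model-free, non-differentiable-environment aspect is taken from the abstract.

    # Minimal, hedged sketch of a state-augmented constrained REINFORCE loop (illustrative only).
    import torch

    obs_dim, num_slices = 8, 3                     # hypothetical dimensions
    policy = torch.nn.Sequential(                  # maps (observation, duals) -> Dirichlet params
        torch.nn.Linear(obs_dim + num_slices, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, num_slices), torch.nn.Softplus())
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    lam = torch.zeros(num_slices)                  # dual variables, one per latency constraint
    dual_lr, latency_budget = 0.05, 0.01           # hypothetical dual step size and 10 ms budget

    def train_iteration(env, obs, horizon=32):
        """One policy-gradient / dual-ascent iteration against a black-box network model.
        env.step(shares) is assumed to return (next_obs, per_slice_latency, throughput)
        as tensors/scalars; no differentiability of the model is required."""
        global lam
        log_probs, lagrangians, violations = [], [], []
        for _ in range(horizon):
            state = torch.cat([obs, lam])          # state augmentation with dual variables
            conc = policy(state) + 1e-3            # positive Dirichlet concentrations
            dist = torch.distributions.Dirichlet(conc)
            shares = dist.sample()                 # per-slice frequency shares (sum to 1)
            obs, latency, throughput = env.step(shares)
            slack = latency_budget - latency       # >= 0 when the latency constraint holds
            lagrangians.append(throughput + (lam * slack).sum())
            violations.append(-slack)
            log_probs.append(dist.log_prob(shares))
        ret = torch.stack(lagrangians).detach()    # Lagrangian returns, batch mean as baseline
        loss = -(torch.stack(log_probs) * (ret - ret.mean())).mean()
        opt.zero_grad(); loss.backward(); opt.step()   # REINFORCE (score-function) update
        lam = torch.clamp(lam + dual_lr * torch.stack(violations).mean(dim=0), min=0.0)
        return obs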
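
For the two rule-based baselines named in the abstract, a minimal sketch of what Uniform and Proportional frequency splits typically look like. The per-slice demands input and the proportional-to-offered-load rule are assumptions for illustration, not details taken from the paper, which allocates Wi-Fi 6 frequency resources through a slice-aware MAC in ns-3.

    # Illustrative sketch (not the paper's code) of the two rule-based baselines.
    def uniform_slice(num_slices, total_bandwidth_hz):
        """Uniform policy: split the frequency resource equally across slices."""
        return [total_bandwidth_hz / num_slices] * num_slices

    def proportional_slice(demands_bps, total_bandwidth_hz):
        """Proportional policy (assumed form): split in proportion to per-slice
        offered load, falling back to a uniform split when all slices are idle."""
        total = sum(demands_bps)
        if total == 0:
            return uniform_slice(len(demands_bps), total_bandwidth_hz)
        return [total_bandwidth_hz * d / total for d in demands_bps]

    # Example loosely mirroring scenario A: one low-throughput slice and two
    # high-throughput slices sharing a hypothetical 80 MHz channel.
    print(uniform_slice(3, 80e6))                        # [26.67e6, 26.67e6, 26.67e6]
    print(proportional_slice([5e6, 50e6, 50e6], 80e6))   # ~[3.8e6, 38.1e6, 38.1e6]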