BRIEF RESEARCH REPORT article

Front. Robot. AI

Sec. Robot Learning and Evolution

Adaptive multi-mode locomotion for bipedal wheel-legged robots via sparse mixture-of-experts deep reinforcement learning

  • Beijing Institute of Technology, Beijing, China

The final, formatted version of the article will be published soon.

Abstract

The bipedal wheel-legged robot combines the high energy efficiency of wheeled movement with the terrain adaptability of legged locomotion. However, achieving a smooth transition between these two heterogeneous motion modes within a unified control framework remains challenging. This study proposes a reinforcement learning control framework that integrates the Mixture of Experts (MoE) architecture. This approach employs a "divide and conquer" strategy by introducing a dynamic gating network and a Top-K sparse activation mechanism, which automatically allocates different motion modes to specific expert subnetworks, effectively decoupling conflicting gradients. Simulation results demonstrate that, compared to the single-network PPO method, the MoE-enhanced algorithm exhibits significant improvements in training stability and rewards. The learned policy successfully achieved smooth rolling on flat surfaces and transitioned to dynamic leg-lifting gaits when confronted with obstacles. In various test terrains, it showed a markedly higher success rate compared to the single-network PPO method.
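
To make the gating idea concrete, the following is a minimal sketch of a Top-K sparse MoE forward pass in NumPy. It is not the authors' implementation: the class name `SparseMoEPolicy`, the helper `top_k_gate`, the linear gate, the two-layer tanh experts, and the values of `num_experts`, `k`, and `hidden` are all illustrative assumptions, since the abstract does not specify them. For simplicity the sketch evaluates every expert and zeroes out the unselected ones, whereas a practical implementation would skip them.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf map to exactly 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def top_k_gate(logits, k):
    """Softmax over the k largest logits per row, zero weight elsewhere."""
    top_idx = np.argpartition(logits, -k, axis=-1)[..., -k:]
    masked = np.full_like(logits, -np.inf)
    top_vals = np.take_along_axis(logits, top_idx, axis=-1)
    np.put_along_axis(masked, top_idx, top_vals, axis=-1)
    return softmax(masked, axis=-1)


class SparseMoEPolicy:
    """Hypothetical policy head: linear gate plus small MLP experts."""

    def __init__(self, obs_dim, act_dim, num_experts=4, k=2, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.w_gate = rng.normal(0.0, 0.1, (obs_dim, num_experts))
        self.w1 = rng.normal(0.0, 0.1, (num_experts, obs_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (num_experts, hidden, act_dim))

    def forward(self, obs):
        # Gate weights: shape (batch, num_experts), exactly k nonzero per row.
        gates = top_k_gate(obs @ self.w_gate, self.k)
        # Dense evaluation of all experts (a simplification for clarity).
        h = np.tanh(np.einsum("bo,eoh->beh", obs, self.w1))
        expert_out = np.einsum("beh,eha->bea", h, self.w2)
        # Action is the gate-weighted sum of the expert outputs.
        action = np.einsum("be,bea->ba", gates, expert_out)
        return action, gates


if __name__ == "__main__":
    policy = SparseMoEPolicy(obs_dim=6, act_dim=3)
    obs = np.random.default_rng(1).normal(size=(5, 6))
    action, gates = policy.forward(obs)
    print(action.shape, (gates > 0).sum(axis=-1))
```

Because only `k` experts receive nonzero gate weight for a given observation, only those experts would receive gradient from that sample during training, which is the mechanism the abstract credits with separating the wheeled and legged modes.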

Keywords

Bipedal wheel-legged robot, reinforcement learning, mixture of experts, gradient conflict, curriculum learning

Received

15 January 2026

Accepted

11 February 2026

Copyright

© 2026 He, Zhao, Duan, Wang and Lei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zeang Zhao; Shengyu Duan

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
