<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <title>Frontiers in Robotics and AI | Robot Learning and Evolution section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/robotics-and-ai/sections/robot-learning-and-evolution</link>
        <description>RSS Feed for Robot Learning and Evolution section in the Frontiers in Robotics and AI journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator, version 1</generator>
        <pubDate>Thu, 14 May 2026 10:12:49 +0000</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2026.1816301</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2026.1816301</link>
        <title><![CDATA[Morphological symmetry-aware generalized policy network for deep reinforcement learning]]></title>
        <pubDate>Wed, 13 May 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Ryo Hakoda</author><author>Yubin Liu</author><author>Matthew Hwang</author><author>Yoshihiro Sato</author><author>Jun Takamatsu</author><author>Katsushi Ikeuchi</author><author>Takeshi Oishi</author>
        <description><![CDATA[Exploiting the morphological symmetry of robotic systems, such as humanoid and quadruped robots, is a promising direction for improving robot learning. In deep reinforcement learning (DRL) for robot control, prior studies have leveraged such symmetry to improve learning efficiency through data augmentation, equivariant multilayer perceptrons (EMLPs), and multi-agent reinforcement learning (MARL) formulations. However, DRL training is inherently unstable, as the data distribution strongly depends on exploration, which is driven by stochasticity in the environment. To address this issue, we propose a symmetry-assisted, general-purpose DRL framework for morphologically symmetric robots that enables stable and robust learning. The framework models the environment as a symmetric Markov decision process (MDP) and constructs a full-body policy from a single-sided base policy using symmetry operators. We further propose a symmetric PPO objective with a coupled importance-sampling ratio. This objective aligns the policy optimization process with the imposed symmetry and serves as a principled alternative to MAPPO-style multi-agent formulations. Experimental results demonstrate that the proposed method outperforms existing approaches on most symmetric tasks, while still maintaining performance comparable to or better than standard PPO on asymmetric tasks, where symmetry is less directly exploitable.]]></description>
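The "coupled importance-sampling ratio" in the abstract above can be read as multiplying the ratios of the original and mirrored transitions so that one PPO update respects both. The sketch below is one plausible rendering of that idea, not the authors' implementation; the half-swap `mirror` operator and the clip threshold are illustrative assumptions.

```python
# Hypothetical sketch of a coupled PPO ratio for a mirror-symmetric MDP.
import numpy as np

def mirror(x):
    # toy symmetry operator: swap the left/right halves of a state or action vector
    h = len(x) // 2
    return np.concatenate([x[h:], x[:h]])

def coupled_ratio(logp_new, logp_old, logp_new_m, logp_old_m):
    # product of the original-side and mirrored-side importance ratios
    return np.exp((logp_new + logp_new_m) - (logp_old + logp_old_m))

def symmetric_ppo_objective(ratio, advantage, eps=0.2):
    # standard clipped-surrogate PPO objective applied to the coupled ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()
```

With identical old and new log-probabilities the coupled ratio is 1 and the surrogate reduces to the mean advantage, matching plain PPO at the start of an update.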
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2026.1861947</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2026.1861947</link>
        <title><![CDATA[Editorial: Reinforcement learning for real-world robot navigation]]></title>
        <pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate>
        <category>Editorial</category>
        <author>Pengqin Wang</author><author>Xiaocong Li</author><author>Meixin Zhu</author><author>Jun Ma</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2026.1788395</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2026.1788395</link>
        <title><![CDATA[Adaptive multi-mode locomotion for bipedal wheel-legged robots via sparse mixture-of-experts deep reinforcement learning]]></title>
        <pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate>
        <category>Brief Research Report</category>
        <author>Pan He</author><author>Zeang Zhao</author><author>Shengyu Duan</author><author>Panding Wang</author><author>Hongshuai Lei</author>
        <description><![CDATA[The bipedal wheel-legged robot combines the high energy efficiency of wheeled movement with the terrain adaptability of legged locomotion. However, achieving a smooth transition between these two heterogeneous motion modes within a unified control framework remains challenging. This study proposes a reinforcement learning control framework that integrates the Mixture of Experts (MoE) architecture. This approach employs a “divide and conquer” strategy by introducing a dynamic gating network and a Top-K sparse activation mechanism, which automatically allocates different motion modes to specific expert subnetworks, effectively decoupling conflicting gradients. Simulation results demonstrate that, compared to the single-network PPO method, the MoE-enhanced algorithm exhibits significant improvements in training stability and rewards. The learned policy successfully achieved smooth rolling on flat surfaces and transitioned to dynamic leg-lifting gaits when confronted with obstacles. In various test terrains, it showed a markedly higher success rate compared to the single-network PPO method.]]></description>
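The gating mechanism described in this abstract, a dynamic gate with Top-K sparse activation routing motion modes to expert subnetworks, can be sketched in a few lines. This is a minimal stand-in, assuming linear experts and K=2; the real gate and experts are learned networks.

```python
# Minimal sketch of Top-K sparse mixture-of-experts gating; shapes and K are
# illustrative assumptions, not the paper's architecture.
import numpy as np

def top_k_gate(logits, k=2):
    # keep the k largest gate logits, softmax over them, zero out the rest
    idx = np.argsort(logits)[-k:]
    weights = np.zeros_like(logits)
    z = np.exp(logits[idx] - logits[idx].max())
    weights[idx] = z / z.sum()
    return weights

def moe_action(obs, experts, gate_logits, k=2):
    w = top_k_gate(gate_logits, k)
    # blend only the active experts' actions; inactive experts get zero weight,
    # which is what decouples their gradients during training
    return sum(w[i] * experts[i](obs) for i in range(len(experts)))
```

Because inactive experts receive exactly zero weight, only the selected subnetworks contribute to the action (and, in training, to the gradient), which is the decoupling effect the abstract attributes to sparse activation.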
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2026.1697159</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2026.1697159</link>
        <title><![CDATA[Discovery of skill-switching criteria for learning agile quadruped locomotion]]></title>
        <pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Wanming Yu</author><author>Fernando Acero</author><author>Vassil Atanassov</author><author>Chuanyu Yang</author><author>Ioannis Havoutis</author><author>Dimitrios Kanoulas</author><author>Zhibin Li</author>
        <description><![CDATA[This study develops a hierarchical learning and optimization framework that can learn and achieve well-coordinated multi-skill locomotion. The learned multi-skill policy can switch between skills automatically and naturally while tracking arbitrarily positioned goals and can recover from failures promptly. The proposed framework is composed of a deep reinforcement learning process and an optimization process. First, the contact pattern is incorporated into the reward terms to learn different types of gaits as separate policies without the need for any other references. Then, a higher-level policy is learned to generate weights for individual policies to compose multi-skill locomotion in a goal-tracking task setting. Skills are automatically and naturally switched according to the distance to the goal. The appropriate distances for skill switching are incorporated into the reward calculation for learning the high-level policy and are updated by an outer optimization loop as learning progresses. We first demonstrate successful multi-skill locomotion in comprehensive tasks on a simulated Unitree A1 quadruped robot. We also deploy the learned policy in the real world, showcasing trotting, bounding, galloping, and their natural transitions as the goal position changes. Moreover, the learned policy can react to unexpected failures at any time, perform prompt recovery, and successfully resume locomotion. Compared to baselines, our proposed approach achieves all the learned agile skills with improved learning performance, enabling smoother and more continuous skill transitions.]]></description>
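The skill-switching rule in this abstract, skills chosen by distance to the goal with switching distances tuned by an outer loop, reduces to a simple thresholding policy at inference time. The sketch below is a toy stand-in: the gait names match the paper, but the hard-switch rule and the threshold values are illustrative assumptions in place of the learned high-level policy.

```python
# Toy distance-based skill selector; thresholds stand in for the switching
# distances that the paper's outer optimization loop discovers.
def select_skill(distance_to_goal, thresholds=(1.0, 3.0)):
    # far from the goal: fastest gait; mid-range: bound; near the goal: trot
    if distance_to_goal > thresholds[1]:
        return "gallop"
    if distance_to_goal > thresholds[0]:
        return "bound"
    return "trot"
```

In the paper the analogous quantity is a weighting over individual gait policies rather than a hard switch, which is what makes the transitions smooth.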
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2026.1752914</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2026.1752914</link>
        <title><![CDATA[Deep learning-based robotic cloth manipulation applications: systematic review, challenges and opportunities for physical AI]]></title>
        <pubDate>Fri, 06 Feb 2026 00:00:00 +0000</pubDate>
        <category>Systematic Review</category>
        <author>Ningquan Gu</author><author>Mitsuhiro Hayashibe</author><author>Kyo Kutsuzawa</author><author>Hui Yu</author>
        <description><![CDATA[Cloth unfolding and folding are fundamental tasks in autonomous robotic cloth manipulation as Physical AI. Driven by recent advances in deep learning, this area has developed rapidly in recent years. This review aims to systematically identify and summarize current progress in deep learning-based cloth unfolding and folding. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, 41 relevant papers from 2019 to 2024 were selected for analysis. We examine various factors influencing cloth manipulation and find that, while current methods show impressive performance, several challenges remain unaddressed. These challenges include irregular cloth sizes and diverse initial garment states. Concerning datasets, there is a need for improved real-world data collection systems and more realistic cloth simulators, and the Sim2Real gap must be carefully considered. Additionally, the review highlights the importance of incorporating multi-modal sensors into current platforms and the emergence of novel primitive actions that enhance performance. The need for more consistent comparison metrics is emphasized, and strategies for addressing failure modes are discussed to further advance the field. From an algorithmic perspective, we reorganize existing learning methods into six learning and control paradigms: perception-guided heuristics, goal-conditioned manipulation policies, predictive and model-based state representation methods, reward-driven reinforcement learning over primitive actions, demonstration-driven skill transfer methods, and emerging large language model-based planning methods. We discuss how each paradigm contributes to unfolding and folding, their respective strengths and limitations, and the open problems that arise. Finally, we summarize the remaining challenges and provide future perspectives for physical AI.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1697155</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1697155</link>
        <title><![CDATA[Coulomb force-guided deep reinforcement learning for effective and explainable robotic motion planning]]></title>
        <pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Sirui Song</author><author>Trevor Bihl</author><author>Jundong Liu</author>
        <description><![CDATA[Training mobile robots through digital twins with deep reinforcement learning (DRL) has gained increasing attention to ensure efficient and safe navigation in complex environments. In this paper, we propose a novel physics-inspired DRL framework that achieves both effective and explainable motion planning. We represent the robot, destination, and obstacles as electrical charges and model their interactions using Coulomb forces. These forces are incorporated into the reward function, providing both attractive and repulsive signals to guide robot behavior. In addition, obstacle boundaries extracted from LiDAR segmentation are integrated as anticipatory rewards, allowing the robot to avoid collisions from a distance. The proposed model is first trained in Gazebo simulation environments and subsequently deployed on a real TurtleBot v3 robot. Extensive experiments in both simulation and real-world scenarios demonstrate the effectiveness of the proposed framework. Results show that our method significantly reduces collisions, maintains safe distances from obstacles, and generates safer trajectories toward the destinations.]]></description>
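The core reward idea in this abstract, goal as an opposite charge (attraction) and obstacles as like charges (repulsion), can be written down directly from Coulomb's inverse-square form. The charge magnitudes and the exact reward composition below are assumptions; the paper additionally folds in LiDAR-derived anticipatory terms not shown here.

```python
# Toy Coulomb-force reward: attraction to the goal, repulsion from obstacles.
# q_goal, q_obs, and the bare 1/r^2 form are illustrative assumptions.
import numpy as np

def coulomb_reward(robot, goal, obstacles, q_goal=1.0, q_obs=0.5, eps=1e-6):
    d_goal = np.linalg.norm(goal - robot) + eps
    reward = q_goal / d_goal**2           # attractive term toward the goal
    for obs in obstacles:
        d = np.linalg.norm(obs - robot) + eps
        reward -= q_obs / d**2            # repulsive penalty per obstacle
    return reward
```

The inverse-square form makes the repulsive penalty dominate near obstacles while the attractive term dominates near the goal, which is the explainable structure the framework relies on.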
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1682200</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1682200</link>
        <title><![CDATA[Solving robotics tasks with prior demonstration via exploration-efficient deep reinforcement learning]]></title>
        <pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Chengyandan Shen</author><author>Christoffer Sloth</author>
        <description><![CDATA[This paper proposes an exploration-efficient deep reinforcement learning with reference (DRLR) policy framework for learning robotics tasks incorporating demonstrations. The DRLR framework is developed based on an imitation bootstrapped reinforcement learning (IBRL) algorithm. Here, we propose to improve IBRL by modifying the action selection module. The proposed action selection module provides a calibrated Q-value, which mitigates the bootstrapping error that otherwise leads to inefficient exploration. Furthermore, to prevent the reinforcement learning (RL) policy from converging to a sub-optimal policy, soft actor–critic (SAC) is used as the RL policy instead of twin delayed DDPG (TD3). The effectiveness of our method in mitigating the bootstrapping error and preventing overfitting is empirically validated by learning two robotics tasks: bucket loading and open drawer, which require extensive interactions with the environment. Simulation results also demonstrate the robustness of the DRLR framework across tasks with both low and high state–action dimensions and varying demonstration qualities. To evaluate the developed framework on a real-world industrial robotics task, the bucket loading task is deployed on a real wheel loader. The sim-to-real results validate the successful deployment of the DRLR framework.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1737238</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1737238</link>
        <title><![CDATA[LG-H-PPO: offline hierarchical PPO for robot path planning on a latent graph]]></title>
        <pubDate>Wed, 07 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Xiang Han</author>
        <description><![CDATA[The path planning capability of autonomous robots in complex environments is crucial for their widespread application in the real world. However, long-term decision-making and sparse reward signals pose significant challenges to traditional reinforcement learning (RL) algorithms. Offline hierarchical reinforcement learning (HRL) offers an effective approach by decomposing tasks into two stages: high-level subgoal generation and low-level subgoal attainment. Advanced offline HRL methods, such as Guider and HIQL, typically introduce latent spaces in high-level policies to represent subgoals, thereby handling high-dimensional states and enhancing generalization. However, these approaches require the high-level policy to search for and generate subgoals within a continuous latent space. This remains a complex and sample-inefficient challenge for policy optimization algorithms—particularly policy gradient-based PPO—often leading to unstable training and slow convergence. To address this core limitation, this paper proposes a novel offline hierarchical PPO framework—LG-H-PPO (Latent Graph-based Hierarchical PPO). The core innovation of LG-H-PPO lies in discretizing the continuous latent space into a structured “latent graph.” By transforming high-level planning from challenging “continuous creation” to simple “discrete selection,” LG-H-PPO substantially reduces the learning difficulty for the high-level policy. Preliminary experiments on standard D4RL offline navigation benchmarks demonstrate that LG-H-PPO achieves significant advantages over advanced baselines like Guider and HIQL in both convergence speed and final task success rates. The main contribution of this paper is introducing graph structures into latent variable HRL planning. This effectively simplifies the action space for high-level policies, enhancing the training efficiency and stability of offline HRL algorithms for long-sequence navigation tasks.
It lays the foundation for future offline HRL research combining latent variable representations with explicit graph planning.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1625968</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1625968</link>
        <title><![CDATA[Adaptive mapless mobile robot navigation using deep reinforcement learning based improved TD3 algorithm]]></title>
        <pubDate>Thu, 18 Dec 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Shoaib Mohd Nasti</author><author>Zahoor Ahmad Najar</author><author>Mohammad Ahsan Chishti</author>
        <description><![CDATA[Navigating in unknown environments without prior maps poses a significant challenge for mobile robots due to sparse rewards, dynamic obstacles, and limited prior knowledge. This paper presents an Improved Deep Reinforcement Learning (DRL) framework based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for adaptive mapless navigation. In addition to architectural enhancements, the proposed method offers theoretical benefits. It incorporates a latent-state encoder and predictor module to transform high-dimensional sensor inputs into compact embeddings. This compact representation reduces the effective dimensionality of the state space, enabling smoother value-function approximation and mitigating overestimation errors common in actor–critic methods. It uses intrinsic rewards derived from prediction error in the latent space to promote exploration of novel states. The intrinsic reward encourages the agent to prioritize uncertain yet informative regions, improving exploration efficiency under sparse extrinsic reward signals and accelerating convergence. Furthermore, training stability is achieved through regularization of the latent space via maximum mean discrepancy (MMD) loss. By enforcing consistent latent dynamics, the MMD constraint reduces variance in target value estimation and results in more stable policy updates. Experimental results in simulated ROS2/Gazebo environments demonstrate that the proposed framework outperforms standard TD3 and other improved TD3 variants. Our model achieves a 93.1% success rate and a low 6.8% collision rate, reflecting efficient and safe goal-directed navigation. These findings confirm that combining intrinsic motivation, structured representation learning, and regularization-based stabilization produces more robust and generalizable policies for mapless mobile robot navigation.]]></description>
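The intrinsic reward described in this abstract, the prediction error of a latent-dynamics model used as a novelty bonus, has a compact curiosity-style form. In the sketch below the fixed random linear encoder and predictor, the dimensions, and the scale factor are all illustrative stand-ins for the paper's learned networks.

```python
# Sketch of a latent-prediction-error intrinsic reward; the frozen random
# matrices stand in for the learned encoder and latent-dynamics predictor.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, 32))    # encoder: 32-dim sensor input to 8-dim latent
W_pred = rng.normal(size=(8, 8))    # latent-dynamics predictor

def encode(obs):
    return np.tanh(W_enc @ obs)

def intrinsic_reward(obs, next_obs, scale=0.1):
    predicted = W_pred @ encode(obs)
    error = np.linalg.norm(predicted - encode(next_obs)) ** 2
    return scale * error    # larger prediction error means a more novel transition
```

In training, this bonus is added to the sparse extrinsic reward, so transitions the dynamics model predicts poorly are revisited more often.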
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1731356</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1731356</link>
        <title><![CDATA[Editorial: Advancements in neural learning control for enhanced multi-robot coordination]]></title>
        <pubDate>Fri, 21 Nov 2025 00:00:00 +0000</pubDate>
        <category>Editorial</category>
        <author>Shude He</author><author>Shi-Lu Dai</author><author>Chengzhi Yuan</author><author>Haotian Shi</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1567211</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1567211</link>
        <title><![CDATA[Comparative analysis of deep Q-learning algorithms for object throwing using a robot manipulator]]></title>
        <pubDate>Fri, 14 Nov 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Mohammad Al Homsi</author><author>Maja Trumić</author><author>Adriano Fagiolini</author><author>Giansalvo Cirrincione</author>
        <description><![CDATA[Recent advances in artificial intelligence (AI) have attracted significant attention due to AI’s ability to solve complex problems and the rapid development of learning algorithms and computational power. Among the many AI techniques, transformers stand out for their flexible architectures and high computational capacity. Unlike traditional neural networks, transformers use mechanisms such as self-attention with positional encoding, which enable them to effectively capture long-range dependencies in sequential and spatial data. This paper presents a comparison of various deep Q-learning algorithms and proposes two original techniques that integrate self-attention into deep Q-learning. The first technique is structured self-attention with deep Q-learning, and the second uses multi-head attention with deep Q-learning. These methods are compared with different types of deep Q-learning and other temporal-difference techniques in uncertain tasks, such as throwing objects to unknown targets. The performance of these algorithms is evaluated in a simplified environment, where the task involves throwing a ball using a robotic arm manipulator. This setup provides a controlled scenario to analyze the algorithms’ efficiency and effectiveness in solving dynamic control problems. Additional constraints are introduced to evaluate performance under more complex conditions, such as a joint lock or the presence of obstacles like a wall near the robot or the target. The output of the algorithm includes the correct joint configurations and trajectories for throwing to unknown target positions. The use of multi-head attention has enhanced the robot’s ability to prioritize and interact with critical environmental features. The paper also includes a comparison of temporal difference algorithms to address constraints on the robot’s joints.
These algorithms are capable of finding solutions within the limitations of existing hardware, enabling robots to interact intelligently and autonomously with their environment.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1652050</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1652050</link>
        <title><![CDATA[Trustworthy navigation with variational policy in deep reinforcement learning]]></title>
        <pubDate>Wed, 08 Oct 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Karla Bockrath</author><author>Liam Ernst</author><author>Rohaan Nadeem</author><author>Bryan Pedraza</author><author>Dimah Dera</author>
        <description><![CDATA[Introduction: Developing a reliable and trustworthy navigation policy in deep reinforcement learning (DRL) for mobile robots is extremely challenging, particularly in real-world, highly dynamic environments. In particular, exploring and navigating unknown environments without prior knowledge, while avoiding obstacles and collisions, is very demanding for mobile robots. Methods: This study introduces Trust-Nav, a novel trustworthy navigation framework that utilizes variational policy learning to quantify uncertainty in the estimation of the robot’s action, localization, and map representation. Trust-Nav employs the Bayesian variational approximation of the posterior distribution over the policy-based neural network’s parameters. Policy-based and value-based learning are combined to guide the robot’s actions in unknown environments. We derive the propagation of variational moments through all layers of the policy network and employ a first-order approximation for the nonlinear activation functions. The uncertainty in robot action is measured by the propagated variational covariance in the DRL policy network. At the same time, the uncertainty in the robot’s localization and mapping is embedded in the reward function and stems from the traditional Theory of Optimal Experimental Design. The total loss function optimizes the parameters of the policy and value networks to maximize the robot’s cumulative reward in an unknown environment. Results: Experiments conducted using the Gazebo robotics simulator demonstrate the superior performance of the proposed Trust-Nav model in achieving robust autonomous navigation and mapping. Discussion: Trust-Nav consistently outperforms deterministic DRL approaches, particularly in complicated environments involving noisy conditions and adversarial attacks. This integration of uncertainty into the policy network promotes safer and more reliable navigation, especially in complex or unpredictable environments.
Trust-Nav offers a step toward deployable, self-aware robotic systems capable of recognizing and responding to their own limitations.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1649154</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1649154</link>
        <title><![CDATA[Weber–Fechner law in temporal difference learning derived from control as inference]]></title>
        <pubDate>Thu, 25 Sep 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Keiichiro Takahashi</author><author>Taisuke Kobayashi</author><author>Tomoya Yamanokuchi</author><author>Takamitsu Matsubara</author>
        <description><![CDATA[This study investigates a novel nonlinear update rule for value and policy functions based on temporal difference (TD) errors in reinforcement learning (RL). The update rule in standard RL states that the TD error is linearly proportional to the degree of updates, treating all rewards equally without any bias. On the other hand, recent biological studies have revealed that there are nonlinearities in the TD error and the degree of updates, biasing policies towards being either optimistic or pessimistic. Such biases in learning due to nonlinearities are expected to be useful and intentionally leftover features in biological learning. Therefore, this research explores a theoretical framework that can leverage the nonlinearity between the degree of the update and TD errors. To this end, we focus on a control as inference framework utilized in the previous work, in which the uncomputable nonlinear term needed to be approximately excluded from the derivation of the standard RL. By analyzing it, the Weber–Fechner law (WFL) is found, in which perception (i.e., the degree of updates) in response to a change in stimulus (i.e., TD error) is attenuated as the stimulus intensity (i.e., the value function) increases. To numerically demonstrate the utilities of WFL on RL, we propose a practical implementation using a reward–punishment framework and modify the definition of optimality. Further analysis of this implementation reveals that two utilities can be expected: i) to accelerate escaping from the situations with small rewards and ii) to pursue the minimum punishment as much as possible. We finally investigate and discuss the expected utilities through simulations and robot experiments. As a result, the proposed RL algorithm with WFL shows the expected utilities that accelerate the reward-maximizing startup and continue to suppress punishments during learning.]]></description>
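The Weber-Fechner behavior described in this abstract, the degree of update (perception) attenuating as the value function (stimulus intensity) grows, can be illustrated with a one-line nonlinear TD update. The specific attenuation form and constants below are assumptions for illustration, not the paper's control-as-inference derivation.

```python
# Illustrative Weber-Fechner-style TD update: the effective step size shrinks
# as the magnitude of the current value grows. The 1/(c + |v|) gain is an
# assumed form, not the one derived in the paper.
def wfl_td_update(value, td_error, alpha=0.1, c=1.0):
    gain = alpha / (c + abs(value))     # perceived update attenuates with |value|
    return value + gain * td_error
```

The same TD error thus moves a near-zero value much more than an already-large one, which matches the optimistic/pessimistic biasing the study analyzes.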
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1606247</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1606247</link>
        <title><![CDATA[Diffusion models for robotic manipulation: a survey]]></title>
        <pubDate>Tue, 09 Sep 2025 00:00:00 +0000</pubDate>
        <category>Review</category>
        <author>Rosa Wolf</author><author>Yitian Shi</author><author>Sheng Liu</author><author>Rania Rayyes</author>
        <description><![CDATA[Diffusion generative models have demonstrated remarkable success in visual domains such as image and video generation. They have also recently emerged as a promising approach in robotics, especially in robot manipulation. Diffusion models leverage a probabilistic framework and stand out for their ability to model multi-modal distributions and their robustness to high-dimensional input and output spaces. This survey provides a comprehensive review of state-of-the-art diffusion models in robotic manipulation, including grasp learning, trajectory planning, and data augmentation. Diffusion models for scene and image augmentation lie at the intersection of robotics and computer vision, enhancing generalizability and mitigating data scarcity in vision-based tasks. This paper also presents the two main frameworks of diffusion models and their integration with imitation learning and reinforcement learning. In addition, it discusses the common architectures and benchmarks and points out the challenges and advantages of current state-of-the-art diffusion-based methods.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1615427</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1615427</link>
        <title><![CDATA[Project-based learning with arduino robots: impact on undergraduate students’ achievement and task persistence in robotics programming]]></title>
        <pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Fadip Audu Nannim</author><author>Nnenna E. Ibezim</author><author>Moeketsi Mosia</author><author>Basil C. E. Oguguo</author>
        <description><![CDATA[Introduction: Programming is a fundamental skill in the 21st century, yet there is a global shortage of skilled programmers for high-tech jobs. This study determined the effects of Project-Based Arduino Robot Application (PARA) on undergraduate students’ achievement and task persistence in robotics programming. Methods: The quasi-experimental research design was adopted for the study. A sample of 74 second-year computer and robotics education students from three intact classes in three tertiary institutions offering robotics programming II were selected for the study. Results and Discussion: PARA improved the academic achievement of students in robotics programming (63.00 ± 16.81) more than the conventional method, which uses Interactive PowerPoint (IPP) (43.79 ± 12.07). PARA improved the task persistence of students in robotics programming (73.75 ± 13.46) more than the conventional method (40.00 ± 13.70). Male students taught robotics programming using PARA had a higher mean achievement score (69.60 ± 11.50) than their female counterparts (52.00 ± 19.43). Female students taught robotics programming using PARA had a slightly higher mean task persistence score (78.67 ± 11.96) than their male counterparts (70.80 ± 14.02). There was a significant difference (p < 0.05) in students’ mean achievement scores based on the instruction method used in teaching robotics programming, among others. These findings have implications for instructing students who find robotics programming difficult and abstract.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1487844</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1487844</link>
        <title><![CDATA[Bat optimization of hybrid neural network-FOPID controllers for robust robot manipulator control]]></title>
        <pubDate>Fri, 02 May 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Bashra Kadhim Oleiwi</author><author>Mohamed Jasim</author><author>Ahmad Taher Azar</author><author>Saim Ahmed</author><author>Ahmed Redha Mahlous</author>
        <description><![CDATA[The position and trajectory tracking control of rigid-link robot manipulators suffers from poor accuracy and unstable performance caused by unidentified loads and outside disturbances. To overcome these problems, this paper proposes three hybrid control structures that combine the benefits of fractional-order proportional-integral-derivative (FOPID) control with those of neural networks to control a multi-input, multi-output coupled nonlinear three-link rigid robot manipulator (3-LRRM) system and effectively suppress chattering in the control signal. The first hybrid control scheme is a neural network (NN)-like fractional-order proportional-integral controller plus an NN-like fractional-order proportional-derivative controller (NN-FOPIPD), and the second control scheme is an NN plus FOPID controller (NN + FOPID). The third control scheme is an Elman NN-like FOPID controller (ELNN-FOPID). The bat optimization algorithm (BOA) is applied to find the best parameter values of the proposed control schemes by minimizing the integral time square error (ITSE) performance index. The simulations are carried out in MATLAB. In the simulation tests, the performance of the suggested controllers is compared without retraining the controller parameters. The robustness of the designed control schemes is assessed under uncertainties in system parameters, outside disturbances, and changes in initial position. The results show that the NN-FOPIPD structure demonstrated the best performance among the suggested controllers.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1542692</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1542692</link>
        <title><![CDATA[Seamless multi-skill learning: learning and transitioning non-similar skills in quadruped robots with limited data]]></title>
        <pubdate>2025-04-30T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Jiaxin Tu</author><author>Peng Zhai</author><author>Yueqi Zhang</author><author>Xiaoyi Wei</author><author>Zhiyan Dong</author><author>Lihua Zhang</author>
        <description><![CDATA[In multi-skill imitation learning for robots, expert datasets with complete motion features are crucial for enabling robots to learn and transition between different skills. However, such datasets are often difficult to obtain. As an alternative, datasets constructed using only joint positions are more accessible, but they are incomplete and lack details, making it challenging for existing methods to effectively learn and model skill transitions. To address these challenges, this study introduces the Seamless Multi-Skill Learning (SMSL) framework. Integrated within the Adversarial Motion Priors framework and incorporating self-trajectory augmentation techniques, SMSL effectively utilizes high-quality historical experiences to guide agents in learning skills and generating smooth, natural transitions between them, addressing the learning difficulties caused by incomplete expert datasets. Additionally, the research incorporates an adaptive command sampling mechanism to balance the training opportunities for skills of various difficulties and prevent catastrophic forgetting. Our experiments highlight potential issues with baseline methods when imitating incomplete expert datasets and demonstrate the superior performance of the SMSL framework. Sim-to-real experiments on real Solo8 robots further validate the effectiveness of SMSL. Overall, this study confirms the SMSL framework’s capability in real robotic applications and underscores its potential for autonomous skill learning and generation from minimal data.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2025.1492526</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2025.1492526</link>
        <title><![CDATA[Reinforcement learning-based dynamic field exploration and reconstruction using multi-robot systems for environmental monitoring]]></title>
        <pubdate>2025-03-25T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Thinh Lu</author><author>Divyam Sobti</author><author>Deepak Talwar</author><author>Wencen Wu</author>
        <description><![CDATA[In the realm of real-time environmental monitoring and hazard detection, multi-robot systems present a promising solution for exploring and mapping dynamic fields, particularly in scenarios where human intervention poses safety risks. This research introduces a strategy for path planning and control of a group of mobile sensing robots to efficiently explore and reconstruct a dynamic field consisting of multiple non-overlapping diffusion sources. Our approach integrates a reinforcement learning-based path planning algorithm to guide the multi-robot formation in identifying diffusion sources, with a clustering-based method for destination selection once a new source is detected, to enhance coverage and accelerate exploration in unknown environments. Simulation results and real-world laboratory experiments demonstrate the effectiveness of our approach in exploring and reconstructing dynamic fields. This study advances the field of multi-robot systems in environmental monitoring and has practical implications for rescue missions and field explorations.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2024.1491907</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2024.1491907</link>
        <title><![CDATA[Adaptive formation learning control for cooperative AUVs under complete uncertainty]]></title>
        <pubdate>2025-02-14T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Emadodin Jandaghi</author><author>Mingxi Zhou</author><author>Paolo Stegagno</author><author>Chengzhi Yuan</author>
        <description><![CDATA[Introduction: This paper addresses the critical need for adaptive formation control in Autonomous Underwater Vehicles (AUVs) without requiring knowledge of system dynamics or environmental data. Current methods, which often assume partial knowledge such as known mass matrices, limit adaptability in varied settings. Methods: Our proposed two-layer framework treats all system dynamics, including the mass matrix, as entirely unknown, achieving configuration-agnostic control applicable to multiple underwater scenarios. The first layer features a cooperative estimator for inter-agent communication independent of global data, while the second employs a decentralized deterministic learning (DDL) controller using local feedback for precise trajectory control. The framework's radial basis function neural networks (RBFNNs) store dynamic information, eliminating the need for relearning after system restarts. Results: This robust approach internally addresses uncertainties from unknown parameter values and unmodeled interactions, as well as external disturbances such as varying water currents and pressures, enhancing adaptability across diverse environments. Discussion: Comprehensive and rigorous mathematical proofs confirm the stability of the proposed controller, while simulation results validate each agent’s control accuracy and signal boundedness, confirming the framework’s stability and resilience in complex scenarios.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188</link>
        <title><![CDATA[HPRS: hierarchical potential-based reward shaping from task specifications]]></title>
        <pubdate>2025-02-10T00:00:00Z</pubdate>
        <category>Methods</category>
        <author>Luigi Berducci</author><author>Edgar A. Aguilar</author><author>Dejan Ničković</author><author>Radu Grosu</author>
        <description><![CDATA[The automatic synthesis of policies for robotics systems through reinforcement learning relies upon, and is intimately guided by, a reward signal. Consequently, this signal should faithfully reflect the designer’s intentions, which are often expressed as a collection of high-level requirements. Several works have developed automated reward definitions from formal requirements, but they show limitations in producing a signal that is both effective in training and able to fulfill multiple heterogeneous requirements. In this paper, we define a task as a partially ordered set of safety, target, and comfort requirements and introduce an automated methodology to enforce a natural order among requirements into the reward signal. We do so by automatically translating the requirements into a sum of safety, target, and comfort rewards, where the target reward is a function of the safety reward and the comfort reward is a function of the safety and target rewards. Using a potential-based formulation, we transform sparse rewards into dense ones and formally prove that this preserves policy optimality. We call our novel approach hierarchical, potential-based reward shaping (HPRS). Our experiments on eight robotics benchmarks demonstrate that HPRS is able to generate policies satisfying complex hierarchical requirements. Moreover, compared with the state of the art, HPRS achieves faster convergence and superior performance with respect to the rank-preserving policy-assessment metric. By automatically balancing competing requirements, HPRS produces task-satisfying policies with improved comfort and without manual parameter tuning. Through ablation studies, we analyze the impact of individual requirement classes on emergent behavior. Our experiments show that HPRS benefits from comfort requirements when they align with the target and safety requirements, and ignores them when they conflict with the safety or target requirements. Finally, we validate the practical usability of HPRS in real-world robotics applications, including two sim-to-real experiments using F1TENTH vehicles. These experiments show that a hierarchical design of task specifications facilitates sim-to-real transfer without any domain adaptation.]]></description>
      </item>
      </channel>
    </rss>