ORIGINAL RESEARCH article

Front. Robot. AI, 21 May 2025

Sec. Biomedical Robotics

Volume 12 - 2025 | https://doi.org/10.3389/frobt.2025.1537470

Learning to suppress tremors: a deep reinforcement learning-enabled soft exoskeleton for Parkinson’s patients

  • 1Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
  • 2Jedlik Innovation Ltd., Budapest, Hungary
  • 3András Pető Faculty, Semmelweis University, Budapest, Hungary

Introduction: Neurological tremors are among the most common movement disorders and affect a large population. Biomechanical loading and exoskeletons show promise in enhancing patient well-being, but traditional control algorithms limit their efficacy in dynamic movements and personalized interventions. Furthermore, more comprehensive and robust validation methods are needed to ensure the effectiveness and generalizability of proposed solutions.

Methods: This paper proposes a physical simulation approach modeling multiple arm joints and tremor propagation. This study also introduces a novel adaptable reinforcement learning environment tailored for disorders with tremors. We present a deep reinforcement learning-based encoder-actor controller for Parkinson’s tremors in various shoulder and elbow joint axes displayed in dynamic movements.

Results: Our findings suggest that such a control strategy offers a viable solution for tremor suppression in real-world scenarios.

Discussion: By overcoming the limitations of traditional control algorithms, this work takes a new step in adapting biomechanical loading into the everyday life of patients. This work also opens avenues for more adaptive and personalized interventions in managing movement disorders.

1 Introduction

Neurodegenerative diseases are characterized by the loss of neurons in the central nervous system, which can impact an individual’s quality of life by causing cognitive, motor, or behavioral symptoms Lamptey et al. (2022). The occurrence of these disorders is expected to increase, partly due to the recent growth in the aging population Heemels (2016). Neurological tremors are the most common of the movement disorders Louis et al. (1995), present in multiple neurodegenerative disorders such as essential tremor Deuschl et al. (1998) and Parkinson’s disease Lang and Lozano (1998b), Lang and Lozano (1998a). Tremors can be described as involuntary, oscillating, or rhythmic movements Bhatia et al. (2018). Although these are not life-threatening, movement disorders pose serious difficulties in daily activities, functional disabilities, and social inconvenience, as well as difficulties performing tasks that require fine motor skills for two-thirds of the affected patients Rocon et al. (2007b).

Although there is no cure for neurodegenerative diseases, current treatments aim to alleviate symptoms and enhance patient well-being. Invasive options, such as deep brain stimulation Oliveira et al. (2023); Faraji et al. (2023), neurosurgery Albano et al. (2023), and stem cell therapies Heris et al. (2022), can be effective but often come with high costs and severe side effects. Non-invasive treatments have been explored, ranging from medication Asil et al. (2020) and traditional Chinese therapies Cai et al. (2023) to advanced wearable technologies. These include robotic exoskeletons Rocon et al. (2007a); Herrnstadt and Menon (2016) and soft exoskeletons Skaramagkas et al. (2020); Awantha et al. (2020); Zahedi et al. (2021), as well as functional electrical stimulation (FES) devices Dosen et al. (2014); Jitkritsadakul et al. (2017), which use electrical stimulation. Additionally, afferent neuroprostheses have been developed to stimulate the patient’s central nervous system Pascual-Valdunciel et al. (2020); Dideriksen et al. (2017). Of all the non-invasive treatment options, the use of exoskeletons has been proven to be the most efficient method for the suppression of tremors Lora-Millan et al. (2021).

Wearable exoskeleton research has mainly focused on reducing the weight of exoskeletons Yi et al. (2019); Wang et al. (2023) because their bulk and weight limit adoption. Control algorithms were therefore not the main interest of these studies, which often utilized repetitive control Rocon et al. (2007a), traditional control methods Herrnstadt and Menon (2016); Zhou et al. (2017); Yi et al. (2019); Zahedi et al. (2021), tremor frequency noise filtering Taheri et al. (2013), or equivalent-input-disturbance (EID) tremor suppression Xie et al. (2024). Traditional control methods, though widely used, have significant limitations. They are typically validated on low-degree-of-freedom systems and under static conditions, overlooking tremor propagation and the natural frequencies of voluntary movements. Evaluations often rely on healthy subjects mimicking tremors, which fails to capture the multi-harmonic characteristics of Parkinson’s tremors. Additionally, these methods do not quantify or account for interference with voluntary motion. For dynamic movements, traditional methods require either time-consuming patient-specific training with human-in-the-loop optimization Siviy et al. (2023); Ding et al. (2018) or manual rule design for each activity, limiting the scalability and adoption of wearable robotics Slade et al. (2022). In contrast, recent advances in deep reinforcement learning (DRL) have shown promise in managing stochastic action spaces in robotics Jin et al. (2022); Kaufmann et al. (2023); Haarnoja et al. (2023) and are gaining traction for rehabilitation exoskeletons, as DRL enables simulation-based training without additional patient involvement Luo et al. (2021), Luo et al. (2023), Luo et al. (2024).

Therefore, our work aims to incorporate recent advances in DRL and makes the following central contributions to the field of biomechanical loading exoskeletons:

• We create a human–exoskeleton simulation environment that is capable of simulating multiple different dynamic movements, different types of tremors, and human–exoskeleton interactions.

• We propose a model-free deep RL-based tremor-suppression controller capable of suppressing generated tremors across various axes of the shoulder and elbow joints during a multitude of dynamic movements.

• We demonstrate that the soft exoskeleton (Figure 1A), coupled with our DRL-based controller, can accurately mitigate the effect of generated tremors.

The result is an intelligent tremor-suppression controller that minimizes its effects on the patient’s original movements and posture, with no additional training required from the patient to adapt to the exoskeleton.

In the following sections, we detail the underlying physical simulation and the DRL framework, describe the experimental setup and evaluation metrics, present our results, and discuss the implications of our approach for future wearable robotics in the treatment of neurodegenerative movement disorders.

2 Methods

2.1 Tremor-suppression physical simulation

To facilitate the training of a reinforcement learning-based controller, we established a physical simulation environment to ensure a secure and cost-effective learning process. The simulation uses a human torso model with the addition of the right arm, in which the induced tremors are to be suppressed. The simulation environment uses the PyBullet physics engine Coumans and Bai (2016) and OpenAI Gym Brockman et al. (2016) to create a reinforcement learning environment.

The human–exoskeleton simulation is made up of three distinct parts: the voluntary reference movements, the generated tremors, and the exoskeleton actuation. These parts are illustrated in Figure 2 and described in the following sections. The reference movements were recorded using two Velcro sleeves fixed around the upper and lower arm, with an additional inertial measurement unit (IMU) sensor fixed on the scapula of the right arm, as presented in Figure 1B.

Figure 1. The anatomy of the tremor-suppression exoskeleton and the corresponding inertial measurement unit (IMU) reference movement acquisition system. (A) The soft-robotic exoskeleton used in the tremor-suppression simulations, with the actuator positions indicated. (B) The IMU reference movement acquisition system, with the IMU sensor positions indicated.

Figure 2. The parts of the simulation. Voluntary movements represent the trajectory of the movement in which involuntary movement tremors are generated, which the exoskeleton tries to suppress using its actuators.

2.1.1 Acquiring reference movements

The reference movements represent the patients’ voluntary movements, which act as the base trajectory for the environments. In the reference motions, we have recorded four distinct movements: shoulder flexion/extension, shoulder abduction/adduction, elbow flexion/extension, and the external rotation of the shoulder. Two distinct recordings are used for each distinct movement pattern for training to add variability and improve the robustness of the controller.

Of the five IMU sensors this system possessed, IMU 2 and IMU 4 were chosen. From the accelerations and angular accelerations measured, we could approximate the quaternions of the shoulder and elbow joints using an extended Kalman filter Welch and Bishop (1995). Finally, the quaternions were transformed into Euler angles, which were used in the simulation.
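The final step of this pipeline, converting a joint quaternion into Euler angles, can be sketched as below. This is only an illustration: the paper does not state which rotation convention was used, so the ZYX (roll, pitch, yaw) order and the function name `quaternion_to_euler` are assumptions.

```python
import math


def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians.

    Assumes the ZYX rotation order; the convention used in the paper
    is not stated, so this choice is illustrative only.
    """
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw
```

For example, a rotation of 90 degrees about the z axis, `quaternion_to_euler(math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))`, yields a yaw of approximately π/2 with zero roll and pitch.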

Verbal informed consent was exchanged between the authors and the subject when planning, preparing, and executing the IMU measurements.

2.1.2 Generating tremors

Our generated involuntary movements can be described by three attributes: their amplitude, frequency, and the time duration during which the tremor effects are present. We disregard the third attribute to make the simulation and the training of the controller more computationally efficient. Therefore, tremor components are present at every simulation time step.

In our simulations, we utilized Parkinson’s disease tremors due to their well-documented and well-understood characteristics. Tremors present in Parkinson’s disease can be described as a second-order non-linear stochastic process Taheri et al. (2013), which can be approximated by the superposition of sine waves Riviere et al. (1997).

In this paper, we approximated these tremors by two sine waves with given parameters based on Taheri et al. (2013). This way, the approximation contains 96.4±1.39% of the original energy of the tremor.

From the two main frequency ranges, we randomly sampled values for both harmonics independently of each other and added Gaussian noise to their sum, which was subsequently normalized. To cover a wide range of possible tremor cases, a vector over the arm joint axes was used to specify which joint axes were affected by tremors.

Finally, from this created vector, which contains the tremor acceleration values for each joint axis in the arm, we transformed these values into torque values based on the measurements done by Ketteringham et al. (2014).

In tremor instances, where the effect of tremor impacts multiple joint axes, the frequencies are kept the same for all involved axes Davidson and Charles (2017).
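The generation procedure described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the frequency ranges, the noise level, and the function name `generate_tremor` are assumptions made for the example, whereas the actual parameters follow Taheri et al. (2013). The conversion from acceleration to torque is omitted.

```python
import math
import random

# Hypothetical frequency ranges (Hz) for the two harmonics; the paper takes
# its actual ranges from Taheri et al. (2013).
FIRST_HARMONIC_HZ = (3.0, 6.0)
SECOND_HARMONIC_HZ = (6.0, 12.0)


def generate_tremor(n_steps, dt, axis_mask, noise_std=0.05, rng=None):
    """Return one tremor vector per time step, with one value per joint axis.

    axis_mask holds 1 for tremor-affected joint axes and 0 otherwise.
    The same two frequencies are shared by all affected axes.
    """
    rng = rng or random.Random()
    f1 = rng.uniform(*FIRST_HARMONIC_HZ)
    f2 = rng.uniform(*SECOND_HARMONIC_HZ)

    # Superpose two sine waves and add Gaussian noise to their sum.
    signal = []
    for k in range(n_steps):
        t = k * dt
        value = math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)
        signal.append(value + rng.gauss(0.0, noise_std))

    # Normalize by the peak absolute value so the signal lies in [-1, 1].
    peak = max(abs(v) for v in signal) or 1.0
    signal = [v / peak for v in signal]

    # Apply the signal only to the joint axes flagged in axis_mask.
    return [[v * m for m in axis_mask] for v in signal]
```

Calling `generate_tremor(200, 0.01, [0, 0, 0, 1, 0, 0, 0])` would produce a 200-step tremor that acts only on the fourth joint axis.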

2.1.3 Defining human–exoskeleton interactions

Our simulation incorporates reference, tremorous, and exoskeleton-induced movements using a position-controlled upper torso model with one tremor-affected arm.

The exoskeleton uses an active control strategy that applies force directly to the arm. The following equations describe the process in which the force is converted into torque values that are used during the training process.

First, we describe an actuator’s state by the positions of its two ends. We label the starting point of the actuator 1 and the endpoint 2; the actuator exerts its force at the endpoint and pulls toward the starting point.

The actuator force is a 3D force whose components are determined by the angles of displacement that the two points create. With the denoted displacement angles, the force components that the actuator creates on the arm at that given position can be calculated using Equations 1–3:

$$F_x = \cos\left(\operatorname{atan2}\left(P_{2y} - P_{1y},\, P_{2x} - P_{1x}\right)\right) F \tag{1}$$
$$F_y = \cos\left(\operatorname{atan2}\left(P_{2x} - P_{1x},\, P_{2y} - P_{1y}\right)\right) F \tag{2}$$
$$F_z = \cos\left(\operatorname{atan2}\left(P_{2z} - P_{1z},\, P_{2x} - P_{1x}\right)\right) F \tag{3}$$

To calculate the torque vectors these forces create, we first calculate the position vectors. These can be calculated with a simple vector subtraction of the point denoting the position of the specific joint (shoulder or elbow) and the endpoint of the actuator.

Finally, the torque values are calculated by the vector product of the force components and the position vectors and summed up for each specific joint axis. The values are then used inside the reinforcement learning environment.
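The torque step can be sketched as below. The sketch takes each actuator's 3D force vector as given and assumes that the position vector runs from the joint to the actuator endpoint and that the torque is the cross product of that vector with the force; the function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def joint_torque(joint_pos, actuators):
    """Sum the torques that several actuators create about one joint.

    actuators is a list of (endpoint, force) pairs, where endpoint is the
    point at which the actuator pulls on the arm and force is its 3D
    force vector at that point.
    """
    total = [0.0, 0.0, 0.0]
    for endpoint, force in actuators:
        # Position vector from the joint to the actuator endpoint (assumed direction).
        r = tuple(endpoint[i] - joint_pos[i] for i in range(3))
        torque = cross(r, force)
        for i in range(3):
            total[i] += torque[i]
    return tuple(total)
```

For a joint at the origin and a single actuator pulling along y at the point (1, 0, 0), the resulting torque points along z.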

2.1.4 The simulation system

For the control’s learning loop (Figure 3A), reference movements and the joint axes are chosen in which tremors are present. At the beginning of the episode, all actuator forces are set to 0. After summing up the actuator and tremor-generated torque values, Equation 4, a second-order, seven-variable differential equation, is solved (Davidson and Charles, 2017; Corie and Charles, 2019):

$$\mathbf{I}\ddot{q} + \mathbf{D}\dot{q} + \mathbf{K}q = \tau, \tag{4}$$

where $q = [q_1, q_2, q_3, q_4, q_5, q_6, q_7]^T$ is the angle of displacement in each joint degree of freedom (DoF). The elements of $q$ represent the following angles: $q_1$: shoulder flexion/extension (SFE), $q_2$: shoulder abduction/adduction (SAA), $q_3$: shoulder external/internal rotation (SEIR), $q_4$: elbow flexion/extension (EFE), $q_5$: forearm pronation/supination (FPS), $q_6$: wrist flexion/extension (WFE), and $q_7$: wrist radial-ulnar deviation (WRUD).

Figure 3. The complete learning process of the control policy. As a deep reinforcement learning agent, we construct our controller as a multilayer perceptron (MLP) neural network. The control policy and encoder networks are updated as described in the TD7 algorithm.

The 7 × 7 matrices $\mathbf{I}$, $\mathbf{D}$, and $\mathbf{K}$ represent the coupled inertia, damping, and stiffness of the mentioned DoF, respectively Davidson and Charles (2017).

This formulation therefore takes tremor propagation into account through the anatomically coupled properties of the joints. Furthermore, the torque $\tau$ can be broken down into the following components (Corie and Charles, 2019): $\tau_I$, the torque required to perform an intentional task; $\tau_T$, the torque generating the tremor; $\tau_L$, the task load torque; $\tau_G$, the gravitational torque; and $\tau_O$, the torque generated by the orthosis (exoskeleton) on the particular joint.

The mentioned components τI, τL, and τG are covered by the reference movement, thus leaving the torque generated by the tremor and the exoskeleton to find the unknown joint angle displacement values in our calculation.
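To illustrate how Equation 4 couples the joint axes, the sketch below advances the system by one time step with a semi-implicit Euler scheme. The integration scheme, the helper names, and the small matrices used in the usage note are assumptions for illustration; the paper does not state which solver is used, and the actual model uses the 7 × 7 matrices of Davidson and Charles (2017).

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def step_dynamics(inertia, damping, stiffness, q, qd, tau, dt):
    """Advance I q'' + D q' + K q = tau by one semi-implicit Euler step."""
    dq = matvec(damping, qd)
    kq = matvec(stiffness, q)
    rhs = [t - d - k for t, d, k in zip(tau, dq, kq)]
    qdd = solve_linear(inertia, rhs)                 # accelerations
    qd_new = [v + dt * acc for v, acc in zip(qd, qdd)]
    q_new = [p + dt * v for p, v in zip(q, qd_new)]  # uses the updated velocities
    return q_new, qd_new
```

With a hypothetical two-axis system whose stiffness matrix has non-zero off-diagonal entries, a torque applied only to the first axis produces a displacement on the second axis after two steps, which is the propagation effect the coupled matrices are meant to capture.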

With the calculated joint angle values based on the reference motion recording and tremor–exoskeleton interaction, we simulate one step in our simulation and receive a new state observation.

This new state observation is then propagated through an encoder neural network to further extract hidden information or unrealized correlations in the data, which are then given as the input to the control policy alongside the original observation vector that the encoder received.

For the anatomical properties of the simulation, the joint angle ranges are based on Zwerus et al. (2019) and Gill et al. (2020). The maximum joint torques for the voluntary motion are designed according to Otis et al. (1990) and Günzkofer et al. (2012). The upper and lower arm weight ratios are defined by Plagenhoef et al. (1983).

2.1.5 Dynamics randomization

Although simulation-based training provides a safe and efficient way to train our controller, there is a well-known discrepancy, called the sim-to-real gap, between the simulated and real-world environments.

In order to overcome this obstacle and improve the robustness of our control, we employ dynamics randomization (Sadeghi and Levine, 2016; Tobin et al., 2017; Peng et al., 2018b).

This method randomly samples environmental characteristics from a given uniform distribution (Table 1) at the beginning of each episode. This forces our agent to be more robust against perturbations present in the environmental characteristics and to better adapt to the real-world environment, whose characteristics are expected to be present in the given distribution ranges.
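A minimal sketch of this per-episode sampling is given below. The parameter names mirror the categories listed in the caption of Table 1, but the numeric ranges and the function name `sample_dynamics` are placeholders, not the values used in the paper.

```python
import random

# Placeholder ranges for illustration only; the ranges actually used for
# training and testing are those listed in Table 1.
TRAIN_RANGES = {
    "anatomical_matrix_scale": (0.9, 1.1),
    "actuator_precision": (0.95, 1.05),
    "actuator_endpoint_shift_m": (-0.01, 0.01),
    "tremor_amplitude_scale": (0.8, 1.2),
}


def sample_dynamics(ranges, rng=None):
    """Draw one value per parameter from a uniform distribution.

    Intended to be called once at the beginning of each episode.
    """
    rng = rng or random.Random()
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
```

Evaluating generalization then amounts to calling the same function with a wider set of ranges than the one used during training.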

2.2 Control algorithm training

In this section, we propose a deep reinforcement learning-based training and testing framework that enables the learning of an optimal tremor-suppression strategy.

2.2.1 Reinforcement learning background

Reinforcement learning (RL) is a branch of machine learning that deals with sequential decision-making problems (Sutton and Barto, 2018). The objective is to learn an optimal policy π that enables an agent to maximize its return through interactions with a specified environment. The return, which is described as the discounted cumulative rewards the agent collects, is defined by Equation 5.

$$R_t = \sum_{i=t}^{T} \gamma^{i-t} r(s_i, a_i) \tag{5}$$

At each discrete time step $t$, the agent, in a state $s \in S$, selects an action $a \in A$ according to its policy $\pi: S \rightarrow A$, receiving a reward $r$ and a new state of the environment $s'$.
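As a small worked example of Equation 5, the discounted return from the first time step of a finite reward sequence can be computed by accumulating from the last reward backward. The function name is illustrative.

```python
def discounted_return(rewards, gamma):
    """Return the discounted sum of rewards, starting from the first entry.

    Works backward through the sequence using R_t = r_t + gamma * R_{t+1}.
    """
    ret = 0.0
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret
```

For three rewards of 1 and a discount factor of 0.5, this gives 1 + 0.5 + 0.25 = 1.75.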

Deep reinforcement learning is the combination of deep neural networks with RL, where the policy is represented by a neural network $\pi_\theta$, where $\theta$ denotes the weights of the network.

2.2.2 TD7

A popular family of RL methods is actor-critic algorithms, where a policy known as the actor is updated by the deterministic policy gradient algorithm (Silver et al., 2014), given in Equation 6:

$$\nabla_\theta J(\theta) = \mathbb{E}_{s \sim p_\pi}\left[\nabla_a Q^\pi(s, a)\big|_{a=\pi(s)} \nabla_\theta \pi_\theta(s)\right] \tag{6}$$

In Equation 7, $Q^\pi(s, a)$ is known as the critic or value function, which is used to calculate the expected return when performing action $a$ in a given state $s$ following the actor policy $\pi$.

$$Q^\pi(s, a) = \mathbb{E}_{s_i \sim p_\pi,\, a_i \sim \pi}\left[R_t \mid s, a\right], \tag{7}$$

which is commonly updated by temporal difference learning utilizing a secondary target network, as described by Mnih et al. (2013) (see Equation 8):

$$y = r + \gamma Q_{\theta'}(s', a'), \quad a' \sim \pi_{\theta'}(s'), \tag{8}$$

where $Q_{\theta'}(s, a)$ is the target critic network, and $\pi_{\theta'}$ is the target actor network.

These methods are prone to overestimation errors, whereby, through the function approximation error of the critic, the values of some state-action pairs are overestimated, leading to a sub-optimal policy. The twin delayed deep deterministic policy gradient algorithm (TD3) (Fujimoto et al., 2018) mitigates this overestimation by the use of a second critic network and clipped double Q-learning (Van Hasselt et al., 2016), as shown in Equation 9.

$$y = r + \gamma \min_{i=1,2} Q_{\theta'_i}\left(s', \pi_{\theta_1}(s')\right) \tag{9}$$
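The clipped target of Equation 9 can be sketched with plain Python callables standing in for the networks. The function and argument names are illustrative, and details such as target policy smoothing and terminal-state handling are left out.

```python
def clipped_double_q_target(reward, gamma, next_state, policy, target_q1, target_q2):
    """Compute y = r + gamma * min(Q1'(s', a'), Q2'(s', a')) with a' = policy(s').

    policy, target_q1, and target_q2 are callables that stand in for the
    actor network and the two target critic networks.
    """
    next_action = policy(next_state)
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * q_min
```

Taking the minimum of the two critics means that an overestimate by one critic does not propagate into the target as long as the other critic's estimate is lower.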

In our proposed reinforcement learning-based controller, a state-of-the-art reinforcement learning algorithm called TD7 (Fujimoto et al., 2023), which incorporates the following additions to the TD3 algorithm, is used.

A loss-adjusted prioritized (LAP) (Fujimoto et al., 2020) replay buffer improves the sample efficiency of the algorithm and speeds up training by preferentially sampling transition tuples $i := (s, a, r, s')$ from which the agent can learn more. The probability of sampling transition $i$ from the replay buffer $B$ is

$$p(i) = \frac{\max\left(|\delta(i)|^{\alpha}, 1\right)}{\sum_{j \in B} \max\left(|\delta(j)|^{\alpha}, 1\right)}, \quad \text{where } \delta(i) = Q(s, a) - \left(r + \gamma Q(s', a')\right) \tag{10}$$

In Equation 10, the level of prioritization is governed by the hyperparameter α.
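Equation 10 can be illustrated with a short function that turns a list of temporal difference errors into sampling probabilities. The function name is illustrative, and the TD errors are assumed to have been computed elsewhere.

```python
def lap_probabilities(td_errors, alpha):
    """Sampling probabilities proportional to max(|delta|**alpha, 1)."""
    priorities = [max(abs(delta) ** alpha, 1.0) for delta in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

With TD errors of 0.5, -2, and 4 and α = 1, the priorities are 1, 2, and 4, so the transitions are sampled with probabilities 1/7, 2/7, and 4/7. The lower clip at 1 keeps transitions with small errors from being sampled too rarely.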

A behavioral cloning term allows the use of the algorithm in an offline RL setting Fujimoto and Gu (2021). However, because our task relies on online training, we do not go into depth on this addition.

Policy checkpoints add stability to the training of the agent by selectively employing the best-performing networks.

State-action learned embeddings aim to improve the inputs to the actor and critic networks by capturing the relevant underlying structure of the observation space and the transition dynamics present in the environment. Therefore, our network equations can be described by Equation 11 as follows:

$$Q(s, a) \rightarrow Q(z^{sa}, z^{s}, s, a), \quad \pi(s) \rightarrow \pi(z^{s}, s), \tag{11}$$

where $z^{s}$ is the state embedding, and $z^{sa}$ refers to the state-action embedding.

The choice of TD7 (Fujimoto et al., 2023) over other widely used reinforcement learning algorithms such as PPO (Schulman et al., 2017), TD3 (Fujimoto et al., 2018), or SAC (Haarnoja et al., 2018) is motivated by several key factors. First, TD7 exhibits significantly improved sample efficiency, often achieving comparable performance to prior methods with only one-tenth of the training time steps. Second, it demonstrates substantially higher performance across standard gym benchmark tasks (Brockman et al., 2016). Finally, TD7 incorporates embeddings that enable the use of larger neural network architectures. A detailed list of hyperparameters with their justifications is provided in the supplementary material.

2.2.3 Observations, actions, and rewards

At each time step $t$, the agent receives an observation vector $o_t \in \mathbb{R}^{80}$. The observation/state vector is defined by $o = \{F_{t-2:t}, \tau_{t-2:t}, p^{a}_{t-1:t}, p^{j}_{t-1:t}\}$, in which $F$ contains the force values of the actuators, $\tau$ refers to the tremor torque, $p^{a}$ contains the end position coordinates of the actuators, and $p^{j}$ denotes the coordinate positions of the joints. In this observation vector, all the values are normalized.

For each observation vector, the actor network outputs an action $a_t \in \mathbb{R}^{7}$ in the form of the output force of each actuator. These actions are then scaled to the ranges of the actuator forces $F$.

To achieve the complex tremor-suppression behavior of our agent, a densely constructed reward function is utilized. The aim of the control is to suppress tremors to the maximum extent while interfering the least with the voluntary movement of the patient and also utilizing the minimum force required.

Therefore, the reward function consists of five parts: a part accounting for mitigating the tremor torque, a sub-reward accounting for the distortion of the original movement trajectory, a term encouraging tremor reduction across all the affected axes, an actuator smoothness reward, and a reward encouraging the use of minimal actuator force in order to control this tremor torque. This reward is based on the reinforcement learning heuristics of reward shaping (Peng et al., 2018a). The full reward function is written as Equation 12:

$$r_t = w_a r_t^{a} + w_\tau r_t^{\tau} + w_F r_t^{F} + w_{as} r_t^{as} + w_u r_t^{u} \tag{12}$$

where $w_a$, $w_\tau$, $w_F$, $w_{as}$, and $w_u$ are the respective weights of the sub-rewards. The values of the weights are $w_a = 0.5$, $w_\tau = 0.9$, $w_F = 0.05$, $w_{as} = 0.05$, and $w_u = 0.5$. $F_s$ and $F_e$ denote the maximum actuator forces possible at the shoulder and elbow actuators.
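The weighted sum of Equation 12 can be written out directly with the weights stated above. The dictionary keys and the function name are illustrative, and the sub-reward values are assumed to be computed elsewhere.

```python
# Weights as stated in the text; the key names are illustrative.
REWARD_WEIGHTS = {
    "axis": 0.5,         # w_a
    "torque": 0.9,       # w_tau
    "force": 0.05,       # w_F
    "smoothness": 0.05,  # w_as
    "unwanted": 0.5,     # w_u
}


def total_reward(sub_rewards, weights=REWARD_WEIGHTS):
    """Combine the five sub-rewards into the scalar reward of Equation 12."""
    return sum(weights[name] * sub_rewards[name] for name in weights)
```

The weights show the relative emphasis of the terms: the torque term dominates, while actuator force and smoothness act as small regularizers.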

The tremor axis reward $r^{a}$ aims to encourage control strategies that suppress tremors across all the involved joint axes, as defined in Equation 13:

$$r_t^{a} = w_a n_a \tag{13}$$

where $n_a$ is the number of axes where generated tremor torques are present.

The torque reward $r_t^{\tau}$ drives the agent to mitigate tremors in all the affected joint axes:

$$r_t^{\tau} = \exp\left(-\frac{\sum_{i=1}^{n}\left(\frac{|\tau_e^{i}| - |\tau_t^{i}|}{|\tau_t^{i}|} + 1\right)}{n}\right) \tag{14}$$

Equation 14 contains the unmitigated original tremor-generated torque values $\tau_t$ and the torque values after the exoskeleton has applied its forces $\tau_e$. The equation calculates the tremor suppression on a given joint axis, which is then averaged to be capable of handling tremors affecting multiple joint axes.

The actuator force reward encourages the agent to apply minimal forces with the exoskeleton actuators, reducing energy expenditure, improving efficiency, and preventing damage to the exoskeleton and the patient.

$$r_t^{F} = \exp\left(-\frac{\sum_{i} F_i}{F_e + F_s}\right) \tag{15}$$

In Equation 15, wa is an actuator weight aimed to magnify the learning signal, whose value is dependent on the highest maximum force output and the number of actuators present in the exoskeleton.

The action smoothness reward (Equation 16) promotes the use of smooth actuator forces by penalizing the second-order derivatives of the actuator forces:

$$r_t^{as} = -\frac{1}{N\left(F_e + F_s\right)^{2}} \sum_{i=1}^{N} \left(F_i - 2F_{i-1} + F_{i-2}\right)^{2} \tag{16}$$

Because exoskeletons can disrupt natural movements, the “unwanted movement” reward Equation 17 is added to ensure smoother, more natural motion, minimizing discomfort and improving efficiency. The reward discourages the control from interfering with the voluntary movement trajectory by penalizing the amount of torque created on non-tremor-affected axes.

$$r_t^{u} = \exp\left(-\frac{\sum_{i} \tau_i^{u}}{\left(F_e + F_s\right)^{2}}\right) \tag{17}$$

2.2.4 Modifications to handle tremor suppression

Given the diverse nature and precision demands inherent in the tremor-suppression task, the training algorithm has undergone specific modifications to accommodate these challenges.

First, a modification is made to the replay buffer to handle the variance in movement trajectories present in the reference movement. This way, the replay buffer is divided into as many sub-parts as there are reference movements, and then from these sub-buffers, we sample a batch size number of transitions according to prioritized experience replay (Fujimoto et al., 2020). This modification improves robustness because oversampling is avoided even though the different length reference movements create an uneven data distribution in the buffer overall. The sub-buffer also has an increased size to leverage a wider range of possible transitions present to improve the performance of training (Fedus et al., 2020).
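The per-movement split of the replay buffer can be sketched as below. To keep the example short, sampling within each sub-buffer is uniform rather than prioritized, and the batch is divided evenly across the sub-buffers; both choices, as well as the class name `MovementReplayBuffer`, are assumptions made for illustration.

```python
import random
from collections import deque


class MovementReplayBuffer:
    """Replay buffer with one fixed-size sub-buffer per reference movement."""

    def __init__(self, n_movements, capacity_per_movement, rng=None):
        self.buffers = [deque(maxlen=capacity_per_movement) for _ in range(n_movements)]
        self.rng = rng or random.Random()

    def add(self, movement_id, transition):
        self.buffers[movement_id].append(transition)

    def sample(self, batch_size):
        """Draw a batch with an (almost) equal share from each non-empty sub-buffer."""
        non_empty = [buf for buf in self.buffers if buf]
        if not non_empty:
            return []
        share, extra = divmod(batch_size, len(non_empty))
        batch = []
        for index, buf in enumerate(non_empty):
            # The first `extra` sub-buffers contribute one additional sample.
            count = share + (1 if index < extra else 0)
            batch.extend(self.rng.choices(list(buf), k=count))
        return batch
```

Because each movement contributes the same number of samples, a long reference movement that fills its sub-buffer with many transitions does not crowd out a short one.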

The other main modification is regarding the decrease of action and policy noise in the algorithm. This helps by reducing the random space around the agent’s chosen action/policy values, therefore allowing it to learn more fine-tuned control policies. This is crucial because a small change in actuator force can lead to vast differences in the torque created on the human skeleton due to anatomical reasons.

Tremor suppression via exoskeleton requires sophisticated actuation of different motors, where we found that typical white noise exploration added to the chosen actions is not sufficient. Therefore, we replace this common method by adding pink noise (Eberhard et al., 2023) to the actions, improving the agent’s exploration ability and improving action smoothness by incorporating a more correlated noise to the actions.
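To show what temporally correlated exploration noise looks like in practice, the sketch below generates approximately pink noise with the Voss-McCartney algorithm. This is a stand-in chosen for its simplicity; the paper does not state how its noise is generated, and the function name and the number of rows are assumptions.

```python
import math
import random


def pink_noise(n_steps, n_rows=8, rng=None):
    """Approximate pink (1/f) noise with the Voss-McCartney algorithm.

    Several Gaussian sources are summed. At each step exactly one source is
    redrawn, chosen by the number of trailing zero bits of the step counter,
    so each source is refreshed about half as often as the previous one.
    """
    rng = rng or random.Random()
    rows = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    samples = []
    for step in range(1, n_steps + 1):
        trailing_zeros = (step & -step).bit_length() - 1
        row = min(trailing_zeros, n_rows - 1)
        rows[row] = rng.gauss(0.0, 1.0)
        # Dividing by sqrt(n_rows) keeps the output at roughly unit variance.
        samples.append(sum(rows) / math.sqrt(n_rows))
    return samples
```

Unlike white noise, consecutive samples share most of their sources and are therefore strongly correlated, which produces smoother perturbations of the actuator forces during exploration.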

2.2.5 Training details

The training of the agent is performed in one set of parallel environments, where each represents a reference movement trajectory and a distinctly generated tremor, with the axes defined where tremors are present. The axes in which the tremors are present are constant throughout all the dynamic movements. The agent does not use random state initialization or early termination, but it ensures that the simulated trajectory remains close to the original trajectory by initializing each simulation step from the original value of the reference movement and not the previous simulation step positions. This ensures robustness and boosts performance.

The specifics of the networks and hyperparameter details of the training are found in the supplementary materials.

3 Results

To analyze our control algorithm’s performance, a number of numerical tests were conducted to answer the following questions: 1) Can the trained agent suppress the generated tremors across various joints and reference movements? 2) How accurately can it mitigate the effects of tremors, and at what percentage? 3) To what extent are the generated tremor torque values suppressed? 4) How is the original movement trajectory affected by the exoskeleton?

3.1 Evaluation of the control policy

The control policy was evaluated through 100 episodes, each of which consisted of an environment with each of the reference movements. The environment characteristics were sampled from a larger dynamics randomization testing range (Table 1) to display the learned controller’s ability to generalize to out-of-distribution cases. The frequency components for the tremors were randomly generated in the specified range in each episode and environment. The control has been trained and evaluated for each possible tremor combination involving the shoulder axes and the elbow extension/flexion axis. The tremor amplitude suppression percentages and the occurrences of tremor suppression without interfering with the original movement trajectory can be seen in Figure 4. The controller effectively suppresses tremors in all but one of the generated tremor pairs, demonstrating a high level of generalizability of the method. In-depth performance data for each of the combinations of the tremor joint axes are presented in the supplementary material.

Table 1. Dynamic randomization parameter ranges used throughout training and validation. Anatomical matrices represent values in the inertia, damping, and stiffness matrices. Actuator precision accounts for the discrepancy between the commanded and actual force generated by the actuator. Actuator end-point shift refers to actuator sliding due to soft-robotic Velcro changes. Tremor frequencies and amplitude reflect different Parkinson’s patients’ tremor characteristics.

Figure 4. Tremor amplitude suppression values throughout the tremor pairs. Tremor suppression was evaluated for each case over 100 episodes across all reference movements, averaging tremor amplitude suppression and occurrence values. Tremor occurrence indicates the percentage of time steps where the tremor was reduced without disrupting the person’s original trajectory.

To further investigate the reference motion-wise performance of the controller, we evaluate a tremor case involving the elbow flexion/extension axis of the arm. The performance of the control can be seen in Figure 5.

Figure 5. Tremor amplitude suppression values throughout the movements. We display the tremor amplitude suppression values achieved by the exoskeleton throughout the recorded dynamic movement over the time steps of a single episode. The reported statistics are computed over 100 episodes with out-of-training distribution domain randomization.

These results demonstrate that a single RL-based trained controller can adapt to mitigate tremors regardless of the reference movement. The control also displays high performance, with the maximum tremor amplitude suppression values exceeding 99%. The control also displays a consistent ability to suppress tremors, evident from the median and mean values of Figure 5.

The qualitative performance of the controller can be seen in Figure 6A. The figure shows how the original movement trajectory is affected by the exoskeleton. This figure presents additional evidence, as the suppressed movement trajectory consistently maintains a shorter distance from the reference movement’s trajectory when compared to the trajectory affected by tremors.

The torque plots in Figures 6B–D display the controller-created torque present on the joints unaffected by the simulated tremor. The controller’s ideal behavior, which is derived from the torque values achieving the optimal zero generated torque at given time steps, can be seen in the figures. Furthermore, when the torque values are not 0, we can see a tendency in the time steps to minimize this torque and correct the control behavior.

Figure 6. The trajectories of the simulation. (A) The trajectories in the 3D simulation environment. Trajectories were recorded during an external rotation movement of the shoulder. Tremors were observed in the flexion/extension axis of the elbow joint. The blue represents the original trajectory, the green represents the trajectory affected by the tremor, and the red represents the exoskeleton-suppressed trajectory. (B–E) The suppressed and unsuppressed torque values present at each joint.

The torque plot in Figure 6E shows the torque created by the controller in the joint affected by the simulated tremor. The trained controller effectively suppresses tremors in the affected axis, although the achieved torque suppression percentage varies. The concrete torque suppression values for each tremor pair averaged across the movements are provided in the supplementary material.

4 Discussion and limitations

The control of soft-robotic exoskeletons requires real-time decision making on a wide range of stochastic predictors and changing sensory readings in dynamic everyday movements. Furthermore, validation of these control algorithms requires extensive testing to ensure the safety and performance of the control.

This work provides an easily adaptable testing environment for neurodegenerative diseases that present with tremors, allowing rapid and inexpensive testing of new learning-based control methods.

The controller also performs well in mitigating tremor torques and amplitudes. However, this performance fluctuates over time steps and varies between movements and tremor cases, owing to both the exoskeleton structure and the simulated tremor torque values used. Future studies should therefore explore tremor torque ranges. Furthermore, optimizing performance requires a more nuanced understanding of the joint axes involved in a given dynamic movement and their associated characteristics.

From the tremor torque plots, it is evident that further reduction in the tremor amplitude is dependent on how effectively the exoskeleton only exerts forces onto the axes where tremors are present. This could involve revisiting the actuator positions or perhaps using a mixed method of FES and robotic exoskeletons. This approach could minimize tremors in the shoulder flexion/extension axis, a case that, along with its variants, achieved the lowest tremor suppression torque/amplitude values.

This controller uses state-of-the-art encoder networks to extract information from the changing observations induced by tremors and thereby achieve high-performance tremor suppression. The achieved performance also highlights the need for improved neural network architectures to increase the safety and stability of these control methods. Hybrid approaches that combine traditional and learning-based control cannot bypass this need, because the neural network actor’s densely connected architecture can generate vastly different values in consecutive time steps, and a traditional proportional-integral-derivative (PID) controller cannot meaningfully mitigate such values. This remains a future challenge because of the limited control actuation frequency (usually 30–40 Hz) of the actuators.

The results show promise, but the current research is limited to simulation. We mitigate this limitation through domain randomization methods as much as possible. Furthermore, because our training algorithm relies on a Markov decision process (MDP) formulation, additional measures are needed to maintain accurate sensor readings, either by using state estimation techniques such as Kalman filters (Kalman, 1960) or by incorporating an algorithm that can handle partially observable states. Movements that differ significantly from the reference trajectories used in training may reduce the controller’s accuracy. Consequently, future work should include experimental validation on patients and testing with out-of-distribution simulation movements to fully understand how such movements affect the controller’s performance. For safety, built-in safety checks, torque limits, and acceleration thresholds should be incorporated into the deployed exoskeleton to further mitigate this problem.
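To make two of these suggestions concrete, the sketch below pairs a scalar Kalman filter for smoothing a noisy joint-angle reading with a simple safety gate that clamps the commanded torque and cuts it to zero when the measured acceleration is too high. It is a minimal illustration, not the deployed design: the random-walk state model, the noise variances, the limit values, and all function names are assumptions.

```python
def kalman_step(x, p, z, q=1e-4, r=1e-2):
    """One predict/update step of a scalar Kalman filter (random-walk model).

    x, p: previous state estimate and its variance
    z:    new noisy measurement (for example, a joint angle in rad)
    q, r: assumed process and measurement noise variances (illustrative)
    """
    p_pred = p + q              # predict: state unchanged, uncertainty grows
    k = p_pred / (p_pred + r)   # Kalman gain
    x_new = x + k * (z - x)     # correct the estimate with the residual
    p_new = (1.0 - k) * p_pred  # updated estimate variance
    return x_new, p_new


def safe_torque(command, accel, torque_limit=5.0, accel_limit=50.0):
    """Gate a commanded torque (N·m) before it is sent to an actuator.

    Returns zero when the measured acceleration (rad/s^2) exceeds the
    threshold; otherwise clamps the command to +/- torque_limit.
    The limit values are hypothetical placeholders.
    """
    if abs(accel) > accel_limit:
        return 0.0
    return max(-torque_limit, min(torque_limit, command))


# Filter a short run of hypothetical noisy joint-angle readings.
estimate, variance = 0.0, 1.0
for reading in [1.02, 0.97, 1.01, 0.99, 1.00]:
    estimate, variance = kalman_step(estimate, variance, reading)

# Gate a hypothetical actor output that exceeds the torque limit.
torque = safe_torque(command=7.5, accel=12.0)  # clamped to 5.0
```

A filter of this kind only reduces sensor noise. It does not by itself make the problem fully observable, which is why an algorithm that handles partially observable states is named as the alternative.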

Current applications of the exoskeleton control system extend to real-life rehabilitation exercises whose trajectories are similar to those present during training.

5 Conclusion

This paper proposes a physical simulation framework for tremor-suppressing exoskeletons. The framework is flexible and adaptable to different diseases and characteristics of patients with tremor symptoms. It can also incorporate various other dynamic movements. The simulation also offers an inexpensive and rapid method of validating control algorithm performance. The simulation hyperparameters are included in the supplementary material.

The paper also details the training of a reinforcement learning-based encoder-actor controller. The controller can adapt to personalized interventions in the management of movement disorders. Additionally, the controller can adjust to varying ranges of actuator forces, thereby proposing a viable strategy for tremor suppression.

Experimental results show that the proposed framework can mitigate both the tremor torques present at the joint axes and the overall tremor amplitude when tremor propagation is taken into account. The results indicate a substantial decrease in both median and maximum tremor amplitudes.

The controller aims to mitigate tremors without interfering with the original movement. This prevents patients from relying too heavily on the exoskeleton in place of their natural motor abilities, so that rehabilitation efforts are not hindered and the potential side effects of prolonged use are minimized.

In the future, we intend to deploy the trained exoskeleton control on physical hardware, incorporating sim-to-real techniques into the physical simulation. Furthermore, we aim to validate the performance of this control in a clinical trial setting with patients involved.

Data availability statement

The raw reference motion recordings used in this article will be made available by the authors, without undue reservation. The log files of the evaluation process will also be made available to ensure the accuracy and transparency of the research. The modified version of the TD7 algorithm is open source: https://github.com/TomasDelaney/A-Deep-Reinforcement-Learning-Enabled-Soft-Exoskeleton-for-Parkinson-s-Patients.

Ethics statement

The study protocol was approved by ETT TUKEB (reference number: IV/8514-3/2021/EKU), and the study was conducted in accordance with the Declaration of Helsinki. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

TE: data curation, investigation, methodology, software, visualization, writing–original draft, and writing–review and editing. SF: conceptualization, project administration, and writing–review and editing. ÁM: conceptualization, project administration, and writing–review and editing. GC: conceptualization, funding acquisition, project administration, supervision, writing–review and editing, and writing–original draft.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. Project no. 2020-1.1.5-GYORSÍTÓSÁV-2021-00022 has been implemented with the support provided by the Ministry of Culture and Innovation of Hungary from the National Research, Development and Innovation Fund, financed under the 2020-1.1.5-GYORSÍTÓSÁV funding scheme. Additional support came from the ÚNKP-23-1-I-PPKE-5 national excellence program and the TKP-2021_02-NVA-27 grant from the National Research, Development and Innovation Office.

Acknowledgments

The authors would like to acknowledge the insight, opportunity, and help provided by the Pető Institute. The authors are grateful to Edward and Martha Kovach for their manuscript writing suggestions.

Conflict of interest

Authors TE, SF and GC were employed by Jedlik Innovation Ltd.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was used in the creation of this manuscript, to find catchy title suggestions and for sentence-level grammar editing.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2025.1537470/full#supplementary-material

References

Albano, L., Basaia, S., Emedoli, D., Balestrino, R., Pompeo, E., Barzaghi, L. R., et al. (2023). Longitudinal brain functional connectivity changes induced by neurosurgical thalamotomy for tremor in Parkinson’s disease: a preliminary study. J. Neurology 270, 3623–3629. doi:10.1007/s00415-023-11705-2

Asil, S. M., Ahlawat, J., Barroso, G. G., and Narayan, M. (2020). Nanomaterial based drug delivery systems for the treatment of neurodegenerative diseases. Biomaterials Sci. 8, 4109–4128. doi:10.1039/d0bm00809e

Awantha, W., Wanasinghe, A., Kavindya, A., Kulasekera, A., and Chathuranga, D. (2020). “A novel soft glove for hand tremor suppression: evaluation of layer jamming actuator placement,” in 2020 3rd IEEE international conference on soft robotics (RoboSoft) (IEEE), 440–445.

Bhatia, K. P., Bain, P., Bajaj, N., Elble, R. J., Hallett, M., Louis, E. D., et al. (2018). Consensus statement on the classification of tremors. from the task force on tremor of the international Parkinson and movement disorder society. Mov. Disord. 33, 75–87. doi:10.1002/mds.27121

Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., et al. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.

Cai, Z., Liu, M., Zeng, L., Zhao, K., Wang, C., Sun, T., et al. (2023). Role of traditional Chinese medicine in ameliorating mitochondrial dysfunction via non-coding rna signaling: implication in the treatment of neurodegenerative diseases. Front. Pharmacol. 14, 1123188. doi:10.3389/fphar.2023.1123188

Corie, T. H., and Charles, S. K. (2019). Simulated tremor propagation in the upper limb: from muscle activity to joint displacement. J. biomechanical Eng. 141, 0810011–08100117. doi:10.1115/1.4043442

Coumans, E., and Bai, Y. (2016). PyBullet, a Python module for physics simulation for games, robotics and machine learning. Available online at: http://pybullet.org.

Davidson, A. D., and Charles, S. K. (2017). Fundamental principles of tremor propagation in the upper limb. Ann. Biomed. Eng. 45, 1133–1147. doi:10.1007/s10439-016-1765-5

Deuschl, G., Bain, P., Brin, M., and Committee, A. H. S. (1998). Consensus statement of the movement disorder society on tremor. Mov. Disord. 13, 2–23. doi:10.1002/mds.870131303

Dideriksen, J. L., Laine, C. M., Dosen, S., Muceli, S., Rocon, E., Pons, J. L., et al. (2017). Electrical stimulation of afferent pathways for the suppression of pathological tremor. Front. Neurosci. 11, 178. doi:10.3389/fnins.2017.00178

Ding, Y., Kim, M., Kuindersma, S., and Walsh, C. J. (2018). Human-in-the-loop optimization of hip assistance with a soft exosuit during walking. Sci. robotics 3, eaar5438. doi:10.1126/scirobotics.aar5438

Dosen, S., Muceli, S., Dideriksen, J. L., Romero, J. P., Rocon, E., Pons, J., et al. (2014). Online tremor suppression using electromyography and low-level electrical stimulation. IEEE Trans. Neural Syst. Rehabilitation Eng. 23, 385–395. doi:10.1109/tnsre.2014.2328296

Eberhard, O., Hollenstein, J., Pinneri, C., and Martius, G. (2023). “Pink noise is all you need: colored noise exploration in deep reinforcement learning,” in The Eleventh International Conference on Learning Representations.

Faraji, B., Rouhollahi, K., Paghaleh, S. M., Gheisarnejad, M., and Khooban, M.-H. (2023). Adaptive multi symptoms control of Parkinson’s disease by deep reinforcement learning. Biomed. Signal Process. Control 80, 104410. doi:10.1016/j.bspc.2022.104410

Fedus, W., Ramachandran, P., Agarwal, R., Bengio, Y., Larochelle, H., Rowland, M., et al. (2020). Revisiting fundamentals of experience replay

Fujimoto, S., Chang, W.-D., Smith, E., Gu, S. S., Precup, D., and Meger, D. (2023). For SALE: state-action representation learning for deep reinforcement learning. Adv. neural Inf. Process. Syst. 36, 61573–61624.

Fujimoto, S., and Gu, S. S. (2021). A minimalist approach to offline reinforcement learning. Adv. neural Inf. Process. Syst. 34, 20132–20145.

Fujimoto, S., Hoof, H., and Meger, D. (2018). “Addressing function approximation error in actor-critic methods,” in International conference on machine learning (PMLR), 1587–1596.

Fujimoto, S., Meger, D., and Precup, D. (2020). An equivalence between loss functions and non-uniform sampling in experience replay. Adv. neural Inf. Process. Syst. 33, 14219–14230.

Gill, T. K., Shanahan, E. M., Tucker, G. R., Buchbinder, R., and Hill, C. L. (2020). Shoulder range of movement in the general population: age and gender stratified normative data using a community-based cohort. BMC Musculoskelet. Disord. 21, 676–679. doi:10.1186/s12891-020-03665-9

Günzkofer, F., Bubb, H., and Bengler, K. (2012). Maximum elbow joint torques for digital human models. Int. J. Hum. Factors Model. Simul. 3, 109–132. doi:10.1504/ijhfms.2012.051092

Haarnoja, T., Moran, B., Lever, G., Huang, S. H., Tirumala, D., Humplik, J., et al. (2024). Learning agile soccer skills for a bipedal robot with deep reinforcement learning. Science Robotics 9 (89), eadi8022.

Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). “Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International conference on machine learning (PMLR), 1861–1870.

Heemels, M.-T. (2016). Neurodegenerative diseases. Nature 539, 179–180. doi:10.1038/539179a

Heris, R. M., Shirvaliloo, M., Abbaspour-Aghdam, S., Hazrati, A., Shariati, A., Youshanlouei, H. R., et al. (2022). The potential use of mesenchymal stem cells and their exosomes in Parkinson’s disease treatment. Stem Cell. Res. and Ther. 13, 371. doi:10.1186/s13287-022-03050-4

Herrnstadt, G., and Menon, C. (2016). Voluntary-driven elbow orthosis with speed-controlled tremor suppression. Front. Bioeng. Biotechnol. 4, 29. doi:10.3389/fbioe.2016.00029

Jin, Y., Liu, X., Shao, Y., Wang, H., and Yang, W. (2022). High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning. Nat. Mach. Intell. 4, 1198–1208. doi:10.1038/s42256-022-00576-3

Jitkritsadakul, O., Thanawattano, C., Anan, C., and Bhidayasiri, R. (2017). Tremor’s glove-an innovative electrical muscle stimulation therapy for intractable tremor in Parkinson’s disease: a randomized sham-controlled trial. J. Neurological Sci. 381, 331–340. doi:10.1016/j.jns.2017.08.3246

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45. doi:10.1115/1.3662552

Kaufmann, E., Bauersfeld, L., Loquercio, A., Müller, M., Koltun, V., and Scaramuzza, D. (2023). Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987. doi:10.1038/s41586-023-06419-4

Ketteringham, L. P., Western, D. G., Neild, S. A., Hyde, R. A., Jones, R. J., and Davies-Smith, A. M. (2014). Inverse dynamics modelling of upper-limb tremor, with cross-correlation analysis. Healthc. Technol. Lett. 1, 59–63. doi:10.1049/htl.2013.0030

Lamptey, R. N., Chaulagain, B., Trivedi, R., Gothwal, A., Layek, B., and Singh, J. (2022). A review of the common neurodegenerative disorders: current therapeutic approaches and the potential role of nanotherapeutics. Int. J. Mol. Sci. 23, 1851. doi:10.3390/ijms23031851

Lang, A. E., and Lozano, A. M. (1998a). Medical progress: Parkinson’s disease (first of two parts). N. Engl. J. Med. 339, 1044–1053. doi:10.1056/nejm199810083391506

Lang, A. E., and Lozano, A. M. (1998b). Parkinson’s disease. N. Engl. J. Med. 339, 1130–1143. doi:10.1056/nejm199810153391607

Lora-Millan, J. S., Delgado-Oleas, G., Benito-León, J., and Rocon, E. (2021). A review on wearable technologies for tremor suppression. Front. neurology 12, 700600. doi:10.3389/fneur.2021.700600

Louis, E. D., Marder, K., Cote, L., Pullman, S., Ford, B., Wilder, D., et al. (1995). Differences in the prevalence of essential tremor among elderly african Americans, whites, and hispanics in northern manhattan, NY. Archives Neurology 52, 1201–1205. doi:10.1001/archneur.1995.00540360079019

Luo, S., Androwis, G., Adamovich, S., Nunez, E., Su, H., and Zhou, X. (2023). Robust walking control of a lower limb rehabilitation exoskeleton coupled with a musculoskeletal model via deep reinforcement learning. J. neuroengineering rehabilitation 20, 34–19. doi:10.1186/s12984-023-01147-2

Luo, S., Androwis, G., Adamovich, S., Su, H., Nunez, E., and Zhou, X. (2021). Reinforcement learning and control of a lower extremity exoskeleton for squat assistance. Front. Robotics AI 8, 702845. doi:10.3389/frobt.2021.702845

Luo, S., Jiang, M., Zhang, S., Zhu, J., Yu, S., Dominguez Silva, I., et al. (2024). Experiment-free exoskeleton assistance via learning in simulation. Nature 630, 353–359. doi:10.1038/s41586-024-07382-4

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., et al. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602

Oliveira, A. M., Coelho, L., Carvalho, E., Ferreira-Pinto, M. J., Vaz, R., and Aguiar, P. (2023). Machine learning for adaptive deep brain stimulation in Parkinson’s disease: closing the loop. J. Neurology 270, 5313–5326. doi:10.1007/s00415-023-11873-1

Otis, J. C., Warren, R. F., Backus, S. I., Santner, T. J., and Mabrey, J. D. (1990). Torque production in the shoulder of the normal young adult male: the interaction of function, dominance, joint angle, and angular velocity. Am. J. sports Med. 18, 119–123. doi:10.1177/036354659001800201

Pascual-Valdunciel, A., González-Sánchez, M., Muceli, S., Adán-Barrientos, B., Escobar-Segura, V., Pérez-Sánchez, J. R., et al. (2020). Intramuscular stimulation of muscle afferents attains prolonged tremor reduction in essential tremor patients. IEEE Trans. Biomed. Eng. 68, 1768–1776. doi:10.1109/tbme.2020.3015572

Peng, X. B., Abbeel, P., Levine, S., and Van de Panne, M. (2018a). Deepmimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. (TOG) 37, 1–14. doi:10.1145/3197517.3201311

Peng, X. B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2018b). “Sim-to-real transfer of robotic control with dynamics randomization,” in 2018 IEEE International Conference on Robotics and Automation (ICRA) (IEEE), 3803–3810. doi:10.1109/icra.2018.8460528

Plagenhoef, S., Evans, F. G., and Abdelnour, T. (1983). Anatomical data for analyzing human motion. Res. Q. Exerc. sport 54, 169–178. doi:10.1080/02701367.1983.10605290

Riviere, C. N., Reich, S. G., and Thakor, N. V. (1997). Adaptive fourier modeling for quantification of tremor. J. Neurosci. methods 74, 77–87. doi:10.1016/s0165-0270(97)02263-2

Rocon, E., Belda-Lois, J. M., Ruiz, A., Manto, M., Moreno, J. C., and Pons, J. L. (2007a). Design and validation of a rehabilitation robotic exoskeleton for tremor assessment and suppression. IEEE Trans. neural Syst. rehabilitation Eng. 15, 367–378. doi:10.1109/tnsre.2007.903917

Rocon, E., Manto, M., Pons, J., Camut, S., and Belda, J. M. (2007b). Mechanical suppression of essential tremor. Cerebellum 6, 73–78. doi:10.1080/14734220601103037

Sadeghi, F., and Levine, S. (2016). Cad2rl: real single-image flight without a single real image. arXiv preprint arXiv:1611.04201.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). “Deterministic policy gradient algorithms,” in International conference on machine learning (PMLR), 387–395.

Siviy, C., Baker, L. M., Quinlivan, B. T., Porciuncula, F., Swaminathan, K., Awad, L. N., et al. (2023). Opportunities and challenges in the development of exoskeletons for locomotor assistance. Nat. Biomed. Eng. 7, 456–472. doi:10.1038/s41551-022-00984-1

Skaramagkas, V., Andrikopoulos, G., and Manesis, S. (2020). “An experimental investigation of essential hand tremor suppression via a soft exoskeletal glove,” in 2020 European control conference (ECC) (IEEE), 889–894.

Slade, P., Kochenderfer, M. J., Delp, S. L., and Collins, S. H. (2022). Personalizing exoskeleton assistance while walking in the real world. Nature 610, 277–282. doi:10.1038/s41586-022-05191-1

Sutton, R. S., and Barto, A. G. (2018). Reinforcement learning: an introduction. Cambridge, MA: MIT Press.

Taheri, B., Case, D., and Richer, E. (2013). Robust controller for tremor suppression at musculoskeletal level in human wrist. IEEE Trans. neural Syst. rehabilitation Eng. 22, 379–388. doi:10.1109/tnsre.2013.2295034

Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017). “Domain randomization for transferring deep neural networks from simulation to the real world,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE), 23–30. doi:10.1109/iros.2017.8202133

Van Hasselt, H., Guez, A., and Silver, D. (2016). Deep reinforcement learning with double q-learning. Proc. AAAI Conf. Artif. Intell. 30. doi:10.1609/aaai.v30i1.10295

Wang, G., Wang, H., Gao, W., Yang, X., and Wang, Y. (2023). Jamming enabled variable stiffness wrist exoskeleton for tremor suppression. IEEE Robotics Automation Lett. 8, 3693–3700. doi:10.1109/lra.2023.3270747

Welch, G., and Bishop, G. (1995). An introduction to the kalman filter. Chapel Hill, NC, USA: University of North Carolina at Chapel Hill.

Xie, M., She, J., Liu, Z.-T., Yang, Z., and Sato, D. (2024). A tremor-suppressing strategy based on the equivalent-input-disturbance approach. IEEE/ASME Trans. Mechatronics 29, 3971–3980. doi:10.1109/tmech.2024.3375911

Yi, A., Zahedi, A., Wang, Y., Tan, U.-X., and Zhang, D. (2019). “A novel exoskeleton system based on magnetorheological fluid for tremor suppression of wrist joints,” in 2019 IEEE 16th international conference on rehabilitation robotics (ICORR) (IEEE), 1115–1120.

Zahedi, A., Zhang, B., Yi, A., and Zhang, D. (2021). A soft exoskeleton for tremor suppression equipped with flexible semiactive actuator. Soft Robot. 8, 432–447. doi:10.1089/soro.2019.0194

Zhou, Y., Naish, M. D., Jenkins, M. E., and Trejos, A. L. (2017). Design and validation of a novel mechatronic transmission system for a wearable tremor suppression device. Robotics Aut. Syst. 91, 38–48. doi:10.1016/j.robot.2016.12.009

Zwerus, E. L., Willigenburg, N. W., Scholtes, V. A., Somford, M. P., Eygendaal, D., and van den Bekerom, M. P. (2019). Normative values and affecting factors for the elbow range of motion. Shoulder and Elb. 11, 215–224. doi:10.1177/1758573217728711

Keywords: deep reinforcement learning, soft exoskeleton, Parkinson’s disease, tremor, physics simulation, human–robot interaction

Citation: Endrei T, Földi S, Makk Á and Cserey G (2025) Learning to suppress tremors: a deep reinforcement learning-enabled soft exoskeleton for Parkinson’s patients. Front. Robot. AI 12:1537470. doi: 10.3389/frobt.2025.1537470

Received: 30 November 2024; Accepted: 25 March 2025;
Published: 21 May 2025.

Edited by:

Alessandro Filippeschi, Sant’Anna School of Advanced Studies, Italy

Reviewed by:

Ali Foroutannia, University of Canberra, Australia
Yali Liu, Beijing Institute of Technology, China

Copyright © 2025 Endrei, Földi, Makk and Cserey. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tamás Endrei, endrei.tamas@itk.ppke.hu; György Cserey, cserey.gyorgy@itk.ppke.hu
