Multi-layer CNN-LSTM network with self-attention mechanism for robust estimation of nonlinear uncertain systems


Introduction: With the help of robot technology, intelligent rehabilitation of patients with lower limb motor dysfunction caused by stroke can be realized.
A key factor constraining the clinical application of rehabilitation robots is how to realize pattern recognition of human movement intentions by using the surface electromyography (sEMG) sensors to ensure unhindered human-robot interaction.
Methods: A multilayer CNN-LSTM prediction network incorporating the self-attention mechanism (SAM) is proposed in this paper. It extracts and learns the periodic and trend characteristics of the sEMG signals and realizes accurate autoregressive prediction of human motion information. First, the multilayer CNN-LSTM network uses the CNN layer for initial feature extraction, and the LSTM network is used to enhance the historical time-series features. Then, the SAM is used to improve the global feature extraction performance and parallel computation speed of the network.
Results: Comparison tests against existing algorithms are carried out using actual data from five healthy subjects and one clinical hemiplegic patient to verify the superiority and practicality of the proposed algorithm. The results show that most of the model's predictions reach R > 0.9 for the different motion states of healthy subjects. In the experiments on the motion characteristics of the patient, the angle prediction reaches R > 0.99 for the untrained data on the affected side, which shows that the proposed model also performs well in predicting the angle of the affected side.
Discussion: The main contribution of this paper is to realize continuous motion estimation of the ankle joint for healthy and hemiplegic individuals under non-ideal conditions (weak sEMG signals, muscle fatigue, high muscle tension, etc.), which improves the pattern recognition accuracy and robustness of the sEMG sensor-based system.

Introduction
Stroke and other diseases may lead to lower limb motor dysfunction in patients. With the assistance of robotic technology, intelligent rehabilitation therapy can be realized to reduce the workload of clinical medical staff and improve the efficiency of patients' rehabilitation training (Kapelner et al., 2020). In the interaction between rehabilitation robots and patients, traditional human-machine interaction techniques often involve the robot passively receiving instructions, which may not be convenient for patients with motor function impairments (Zhai et al., 2017; Zhang et al., 2022). Human-machine interaction technology therefore needs to evolve toward allowing robots to actively understand human behavioral intentions, resulting in a new type of interaction based on human biological signals.
A human bioelectric signal is the potential difference generated when a nerve signal carrying human behavioral information is transmitted to the relevant organs or tissues, and it directly reflects human behavioral intentions (Ma et al., 2021). Decoding human bioelectric signals to recognize human behaviors, and using them as an information medium between the human body and the outside world so that robots can understand human intentions, is of great significance for breaking the human-machine barrier and realizing natural human-machine interaction (Qi et al., 2020). Currently, widely studied bioelectric signals include the electromyogram (EMG), electroencephalogram (EEG), electrocardiogram (ECG), and electrooculogram (EOG). We focus on surface electromyography (sEMG), which originates from the bioelectrical activity of spinal motor neurons under the control of the motor cortex of the brain, and is the temporal and spatial sum of the action potential sequences produced by peripherally active motor units. Since sEMG is non-invasive and simple to use, it is well suited to the design of human-machine interaction control systems for rehabilitation robots (Xiong et al., 2021). The core technology of an EMG-based human-machine interaction system is decoding the human body's motion intention from EMG signals. Motion intention decoding usually falls into two categories: recognizing discrete limb movements based on sEMG, such as clenching the fist or extending the palm, and estimating continuous joint motions based on sEMG, such as joint moments and joint angles. In this study, we focus on healthy people and hemiplegic patients, and investigate sEMG-based continuous motion estimation methods for the foot and ankle area of the lower limb, which lays the foundation for future natural human-machine interaction control.
Human walking characteristics are crucial in studies targeting the continuous movement of the lower limb. Many features of the musculoskeletal system of the lower limbs are implied in human walking information. This information can be used as a basis for recognizing human movement intentions and for estimating and predicting human movements, which in turn improves the stability and accuracy of human-machine interaction with external devices such as exoskeletons. It also makes it possible to compare the gait characteristics of different individuals, especially between healthy subjects and patients, which enables intelligent online evaluation of rehabilitation effects, for example in stroke rehabilitation. Lower limb walking in healthy people is cyclic, and the inherent states of the musculoskeletal system, such as limb properties and muscle activation states, are relatively stable and have good model interpretability, so mechanistic models have been used to describe them in many studies (Zhang L. et al., 2021). Other works use machine learning models such as neural networks, which offer a straightforward modeling process and unrestricted use of sEMG.
However, in research focused on hemiplegic patients, the two sides differ greatly in their cyclic reciprocity: the healthy side usually experiences a weak functional decline, while the affected side shows more severe fluctuations in cyclic information (Aymard et al., 2000; Zhao et al., 2023). The alternation of useful and useless information can lead to problems such as vanishing or exploding gradients, causing loss of information (Meng et al., 2023). In addition, these patients are prone to muscle fatigue or even spasticity, and in some cases excessive muscle tone (Zhang et al., 2019; Moniri et al., 2021), all of which make continuous motion estimation of a patient's lower extremities based on EMG signals very difficult (Sarasola-Sanz et al., 2018; Fleming et al., 2021; Zhu et al., 2022).
In machine learning architectures for the study of continuous lower limb motion, autoregression is a widely used method for time series prediction. It captures the correlation and dependency between input and output sequences well, and has the advantages of simple structure, flexible order selection and easy application (Lehtokangas et al., 1996). The observation of a time series at the current time is correlated with its historical observations, so autoregressive techniques can use the cyclical, trend and seasonal characteristics of historical data to predict future data (Yin et al., 2023). Combining autoregressive techniques with neural networks can effectively improve the ability to learn, understand and forecast time series data (Taskaya-Temizel and Casey, 2005). A nonlinear autoregressive neural network with exogenous inputs has been proposed to model the dynamic behavior of an automotive air conditioning system (Ng et al., 2014). Combining the autoregressive integrated moving average (ARIMA) model with a probabilistic neural network (PNN), a hybrid model has been proposed to improve the prediction accuracy of ARIMA models (Khashei et al., 2012). Therefore, this article processes the sampled motion data with autoregressive techniques, so that the network can fully learn the hidden features and learn more efficiently.
To improve the robustness of time series signal prediction, a convolutional neural network (CNN) can be used to extract initial features from the data (Shao et al., 2024). The CNN is a specific type of feedforward neural network for processing data with a grid topology (Li Z. et al., 2021). It uses sparse interaction, parameter sharing and equivariant representation to improve the feature extraction performance of convolutional operations (Li et al., 2016). Each convolution layer contains multiple convolution kernels, and each kernel slides over the data to extract features of the time series, capturing local features and short-term dependencies. The pooling layer performs summary statistics on the output of the convolution layer (Gu et al., 2018). The local perception and weight sharing of the CNN also effectively reduce the number of weight parameters to be learned, thus improving the efficiency of model learning. Based on a deep CNN, a joint multi-task learning algorithm has been developed to effectively predict attributes in images (Abdulnabi et al., 2015). A joint classification-and-prediction framework based on CNN has been proposed for automatic sleep staging (Phan et al., 2018). A CNN architecture with depthwise separable convolutions with kernels (CNN-DSCK) has been developed for rating prediction from product reviews (Khan and Niu, 2021). These applications to complex systems show the advantages of CNNs in time series feature extraction.
For complex, long-running dynamic systems whose data series have long-term correlation, the LSTM network, with its better ability to capture long-term features, can be considered for feature extraction (Bi et al., 2021; Zhang N. et al., 2021; Zha et al., 2022). The LSTM network is an improvement of the recurrent neural network (RNN) that effectively alleviates the vanishing and exploding gradient problems of RNNs in time series prediction (Kim and Cho, 2019). Complex system prediction based on LSTM networks has achieved a series of innovative results (Rathore and Harsha, 2022). Based on multi-layer LSTM networks, a forecasting method with strong capability has been proposed for predicting highly fluctuating demand (Abbasimehr et al., 2020). According to the characteristics of chemical process data, a prediction model for key alarm variables in chemical processes has been developed based on dynamic-inner principal component analysis (DiPCA) and an LSTM network (Bai et al., 2023). Adding a self-attention mechanism after the LSTM network can further capture the correlation between features directly from a global perspective (Zhang et al., 2020). Attention mechanisms can also compensate for the vanishing or exploding gradient problems that LSTM networks still face, which can lead to loss of information in time series data (Li J. et al., 2021). By integrating CNN, attention mechanism and LSTM, a network with better predictive performance can be expected.
Therefore, this article proposes a robust multi-layer network by integrating the LSTM network with the CNN and adding a self-attention mechanism. To extract and learn the period and trend characteristics of EMG signals, autoregressive processing is performed on the collected data. The CNN layer extracts features from the EMG signal. The LSTM network consolidates and enhances the historical temporal features. The self-attention mechanism (SAM) improves the global feature extraction performance and the parallel computing speed of the network. Finally, the superiority and practicability of the proposed network are verified against existing algorithms using data from healthy laboratory subjects and a clinical patient with hemiplegia.
The main contributions of this article are as follows: (1) To address the periodicity of human lower limb gait, a multi-layer machine learning network architecture is designed. It improves the interpretability and prediction accuracy of the autoregression model, and reduces the vanishing or exploding gradient problems caused by redundant sensor information. (2) The practicality of the algorithm is validated by testing not only on healthy individuals but also on data from a hemiplegic patient. It achieves continuous lower limb motion estimation under non-ideal conditions (weak sEMG signals, muscle fatigue, high muscle tension, etc.). This ensures both accuracy and robustness in identification, laying a foundation for the design of human-machine interaction methods for future rehabilitation robots.
The remainder of this article is organized as follows: the proposed algorithm is presented in section "2 Materials and methods," the experiments and results are presented in section "3 Experiments and results," and the main conclusions are given in section "4 Discussion and conclusion."

2 Materials and methods

Data acquisition and processing
Five healthy subjects (age: 26.6 ± 2.6 years, height: 1.74 ± 0.08 m, weight: 69 ± 10.9 kg) and one patient (male, 67 years old, Brunnstrom stage IV) participated in the data collection for this experiment. The sEMG signals are acquired with a Noraxon Ultium EMG system and AgCl electrodes, as shown in Figure 1. Alcohol wipes are used to clean the skin over the tested muscles to remove impurities such as dead skin and sweat. The two electrodes of each channel are spaced 20 mm apart and affixed to the muscle belly along the muscle fiber direction of the target muscles of both legs (Hermens et al., 2000). Subjects walk on a treadmill at 2 km/h, 3 km/h, and 5 km/h while EMG signals are collected. Each trial lasts 3 min, with a 1-min rest between trials to avoid the effects of muscle fatigue. The sEMG sampling frequency is 1,200 Hz. As shown in Figure 2, signals are collected from three muscles acting on the ankle joint: the tibialis anterior, peroneus longus, and gastrocnemius. Meanwhile, the kinematic parameters are collected using a Noraxon myoMOTION inertial measurement unit (IMU), which records the angular changes of the ankle joint in the sagittal plane at a sampling frequency of 200 Hz. All subjects signed written informed consent before inclusion in this study. The experimental procedures follow the Declaration of Helsinki and were approved by the Ethics Committee of Liaoning Provincial People's Hospital (Grant No. 2022HS007).
In this experiment, the ankle EMG input and output signals of five healthy individuals and one stroke patient are used as test and validation signals for the network model. Four sets of test data, each with a length of 120,000 samples, are obtained from the left foot of each healthy subject under four states: walking at 2 km/h, 3 km/h, and 5 km/h, and a plantarflexion-dorsiflexion maneuver. Two tests are conducted on the left foot of the stroke patient, each with 70,000 samples. The originally acquired sEMG signals contain a large amount of noise, and the frequency content of sEMG lies in the range of 0-500 Hz. In this article, the original sEMG signals are therefore filtered and denoised to remove irrelevant noise while retaining as much valuable information as possible. The sEMG signals are first filtered with a fourth-order 10-500 Hz Butterworth band-pass filter. Then, a 50 Hz notch filter is used to eliminate power-line interference. After that, the data are normalized.

[Figure: Setup of the EMG signal acquisition experiment.]
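The notch filtering and normalization steps can be illustrated with a short standard-library sketch. It is not the authors' implementation: the quality factor `q=30`, the biquad form of the notch, and the choice of min-max normalization are assumptions, since the text only names a 50 Hz notch filter and an unspecified normalization. The fourth-order Butterworth band-pass stage is not reproduced here; in practice it would normally come from a signal-processing library.

```python
import math

def notch_coefficients(f0, fs, q=30.0):
    """Second-order IIR notch (standard biquad form) centred at f0 Hz.
    q is an assumed quality factor; the paper does not state one."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0, -2.0 * math.cos(w0), 1.0]
    a = [1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha]
    return b, a

def biquad_filter(x, b, a):
    """Apply the biquad in direct form I, sample by sample."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def min_max_normalize(x):
    """Scale a sequence to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]

FS = 1200.0  # sEMG sampling frequency stated in the text
t = [n / FS for n in range(int(2 * FS))]
hum = [math.sin(2 * math.pi * 50.0 * ti) for ti in t]    # power-line component
tone = [math.sin(2 * math.pi * 150.0 * ti) for ti in t]  # in-band test component
b, a = notch_coefficients(50.0, FS)
hum_out = biquad_filter(hum, b, a)
tone_out = biquad_filter(tone, b, a)
normalized = min_max_normalize(tone_out)
```

After the initial transient, the 50 Hz component is suppressed while the 150 Hz component passes almost unchanged, which is the behaviour the notch stage is meant to provide.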

CNN-LSTM networks with self-attention

2.2.1 Convolutional neural network
For the collected data, the CNN uses a convolution layer to convolve the input matrix and extract the local features of the time series data. The feature sequence generation equation is shown in Equation 1.
In Equation 1, W_h is the weight matrix of the convolution kernel; b is the bias term; X_{i:i+h−1} is the sequence matrix from i to i + h − 1 in the time series; h is the size of the convolution kernel; and f is the activation function.
The calculated feature set C_n can be expressed as Equation 2.
The pooling layer processes the time series features obtained by the convolution layer and outputs a matrix of fixed size, reducing the dimension of the output while retaining the features. In this article, max pooling is used. The feature vector obtained after pooling the outputs of the convolution kernels is given by Equation 3.
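Equations 1-3 can be written out as follows. This is a reconstruction from the symbol definitions in the text rather than the authors' typeset form: the element name c_i, the sequence length n, and the pooled-feature symbol \hat{c} are assumed notation.

```latex
% Eq. 1: feature produced by one kernel of size h at position i
c_i = f\left(W_h \cdot X_{i:i+h-1} + b\right) \tag{1}

% Eq. 2: feature set collected over all valid positions of a length-n sequence
C_n = \left[c_1, c_2, \ldots, c_{n-h+1}\right] \tag{2}

% Eq. 3: max pooling over the feature set of one kernel
\hat{c} = \max\left(C_n\right) \tag{3}
```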

LSTM neural network
The LSTM network is a variant of the RNN. The key idea of the LSTM is to control the flow and forgetting of information through structures called gates. These gates selectively allow information to pass through or block it, and the core unit is the cell state, which can be regarded as the network's memory. The LSTM network consists of several key components.
1. Cell state: it is the main storage unit of the LSTM and is responsible for storing and transmitting information.
2. Input gate: the input gate determines whether new information is added to the cell state at the current time step. The principle is to combine the current input X_t and the hidden state [...]

Self-attention mechanism
Each value of the sequence A is mapped to three different spaces. Each input a_i is multiplied by three trainable weights w_q, w_k, and w_v to obtain three values q_i, k_i, and v_i, namely the query, key, and value, as shown in Equations 10-12.
Using the weight matrices W_q, W_k, and W_v, they can be further expressed in matrix form, as shown in Equations 13-15.

FIGURE 3
Generation of Q, K and V matrices.

FIGURE 4
Generation of the correlation matrix and the activated correlation matrix.

FIGURE 5
Generation of output matrix O.
The generation of the matrices Q, K, and V is shown in Figure 3.
With each input value a_i (i = 1, ..., n) corresponding to q_i, and all input values a_j corresponding to k_j, the degree of correlation δ_{i,j} between a_i and a_j is calculated by the dot product, as shown in Equation 16.
Its matrix form is shown in Equation 17.
Scaling δ_{i,j} by the dimension d_k of q_i or k_i controls the size of the dot product and prevents gradients that are too large or too small from degrading training, as shown in Equation 18.
Its matrix form is shown in Equation 19.
The activated correlation matrix is obtained by applying the softmax operation to the scaled correlation matrix.
The calculation process is shown in Figure 4.
The activated correlation matrix and V are then used to calculate the attention b_i corresponding to each input vector a_i, as shown in Equation 20.
Its matrix form can be expressed as Equation 21.
Here, B is the matrix of attention values b_i. The computation of the self-attention mechanism can be summarized as Equation 22.
The calculation process of the attention b_1 for the first input value a_1 is shown in Figure 5.
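The sequence of steps above (projection to Q, K and V, dot-product correlation, scaling, softmax, and weighting of V) can be sketched in plain Python. This is an illustration under assumptions: it uses the standard scaled dot-product form, dividing by the square root of d_k, which may differ in detail from the scaling in Equation 18, and the toy input and identity weight matrices are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(A, Wq, Wk, Wv):
    """Return (B, weights) for an input A with one row per sequence element."""
    Q, K, V = matmul(A, Wq), matmul(A, Wk), matmul(A, Wv)
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_t)                                  # dot-product correlation
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]  # assumed sqrt(d_k) scaling
    weights = [softmax(row) for row in scaled]               # activated correlation matrix
    B = matmul(weights, V)                                   # attention output, one row per input
    return B, weights

# Hypothetical toy input: 3 sequence elements with 2 features each.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
B, weights = self_attention(A, I2, I2, I2)
B_uniform, weights_uniform = self_attention(A, Z2, I2, I2)
```

With an all-zero query projection every score is zero, so the softmax weights are uniform and each output row is the column-wise mean of V, which gives a simple sanity check of the weighting step.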

CNN-LSTM network
The proposed CNN-LSTM prediction model integrated with the self-attention mechanism is shown in Figure 6. The predictive network model mainly includes autoregressive data processing, a preliminary feature extraction layer based on the CNN, a depth feature extraction layer based on the LSTM network, and a fully connected layer. Note that in practical engineering applications, the collected data should be cleaned appropriately, including outlier removal, averaging and noise elimination, which can effectively improve the training and testing performance of the network. The time step of autoregression should be neither too long nor too short. If it is too long, less relevant past information may be added to the current prediction, which may reduce the prediction accuracy. If it is too short, the correlation between consecutive data may not be fully extracted. Therefore, in practice, the regression time step should be selected according to the specific research object and sequence characteristics. When the network is used for online prediction or control, more network layers may improve the prediction accuracy of the algorithm, but they also increase the computing burden of the network.

Autoregressive processing
Surface electromyography has typical nonlinear and fast time-varying characteristics, and it is difficult to capture and extract the trend of sEMG with conventional fitting methods. To highlight the periodicity, trend and seasonality of the output data, autoregressive processing is carried out on the data before training.
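A minimal sketch of the autoregressive arrangement is given below. It uses the settings reported in the experiments of this article (time step 5, three sEMG input channels, one angle output, 24 columns per row with the last column as the target). The column ordering inside each row, the function names, and the toy data are assumptions, and the sketch presumes the sEMG and angle sequences have already been brought to a common sampling rate.

```python
def build_autoregressive_rows(emg, angle, lag=5):
    """Flatten lagged inputs and outputs into rows of 4 * (lag + 1) values.

    emg   : list of [ch1, ch2, ch3] samples (3 input channels)
    angle : list of joint angles, same length as emg (1 output)
    Each row holds [emg(k), angle(k)] for k = t-lag .. t, so the final
    column is angle(t), the prediction target.
    """
    if len(emg) != len(angle):
        raise ValueError("emg and angle must have the same length")
    rows = []
    for t in range(lag, len(emg)):
        row = []
        for k in range(t - lag, t + 1):
            row.extend(emg[k])
            row.append(angle[k])
        rows.append(row)
    return rows

def split_inputs_target(rows):
    """Separate the first 23 columns (inputs) from the last column (target)."""
    inputs = [r[:-1] for r in rows]
    targets = [r[-1] for r in rows]
    return inputs, targets

# Hypothetical toy data: 8 aligned samples.
emg = [[float(t), float(t) + 0.1, float(t) + 0.2] for t in range(8)]
angle = [10.0 * t for t in range(8)]
rows = build_autoregressive_rows(emg, angle, lag=5)
inputs, targets = split_inputs_target(rows)
```

Each row is the 6 × 4 block of lagged samples flattened to 1 × 24, which matches the input and output split used for training.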

Preliminary feature extraction layer
To make the input data easier to train after autoregressive processing, a batch normalization (BN) layer is used to normalize each batch. The normalization performed by the BN layer not only improves the convergence speed of the network model, but also enhances the correlation between the data in the batch and prevents the model from overfitting to part of the data. Then, the features of the time series are initially extracted by the CNN. The CNN not only extracts features through the convolution operations of multiple kernels, but also captures local dependencies of the sequence data through the sliding-window convolution. A ReLU activation layer is then applied to the obtained features to enhance their expressive ability.
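The sliding-window convolution, ReLU activation and max pooling can be illustrated with a one-dimensional toy example. The kernel values, pooling size and input sequence are hypothetical, batch normalization is omitted for brevity, and the "convolution" is the cross-correlation form commonly used in deep learning frameworks.

```python
def conv1d_valid(x, kernel, bias=0.0):
    """Slide the kernel over x (no padding) and return one feature per window."""
    h = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + h])) + bias
            for i in range(len(x) - h + 1)]

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

x = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 1.0]   # hypothetical input sequence
kernel = [1.0, 0.0, -1.0]                  # hypothetical kernel of size h = 3
raw = conv1d_valid(x, kernel)
features = relu(raw)
pooled = max_pool(features, size=2)
```

A sequence of length 7 and a kernel of size 3 give 7 − 3 + 1 = 5 local features, which pooling then reduces to a shorter summary.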

Depth feature extraction layer
The extracted features are passed through the LSTM network to further extract the long-term dependencies in the time series data. The data processed by the LSTM layer enter a Sigmoid layer for activation. Then, the self-attention mechanism is added to calculate the correlation between all features and the weight matrix, and the weight matrix is continuously trained so that the model can allocate attention on its own according to the data characteristics and strengthen the role of important features in prediction. Finally, a ReLU activation layer is used to activate the features. After these two deep feature extraction stages, the obtained deep features are fed into a further LSTM layer for comprehensive strengthening and consolidation.
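For readers less familiar with the recurrence inside each LSTM layer, the following sketch shows one time step of a standard LSTM cell with forget, input, candidate and output gates. It follows the textbook formulation, not necessarily the exact equations of this article; the parameter layout (`params` mapping a gate name to a weight matrix and bias acting on the concatenation of the previous hidden state and the current input) and the toy sizes are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    """Compute W x + b for a matrix W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. params[name] = (W, b) for name in 'f', 'i', 'g', 'o'."""
    z = h_prev + x_t  # concatenate previous hidden state and current input

    def gate(name, activation):
        W, b = params[name]
        return [activation(v) for v in affine(W, z, b)]

    f = gate("f", sigmoid)    # forget gate: how much of the old cell state to keep
    i = gate("i", sigmoid)    # input gate: how much new information to admit
    g = gate("g", math.tanh)  # candidate cell content
    o = gate("o", sigmoid)    # output gate: how much of the cell state to expose
    c_t = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h_t = [ov * math.tanh(cv) for ov, cv in zip(o, c_t)]
    return h_t, c_t

# Hypothetical sizes: 3 inputs, 2 hidden units, all-zero parameters.
hidden, inputs = 2, 3
zero_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {name: (zero_W, zero_b) for name in ("f", "i", "g", "o")}
h_t, c_t = lstm_step([0.3, -0.2, 0.1], [0.0, 0.0], [1.0, -2.0], params)
```

With all parameters set to zero, every sigmoid gate equals 0.5 and the candidate is 0, so the cell state is simply halved; this makes the gating arithmetic easy to verify by hand.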

Fully connected layer
The features obtained from the depth feature extraction layer are mapped to the fully connected layer to obtain the prediction results. To give the model stronger generalization ability and avoid vanishing or exploding gradients, the proposed network gradually decreases the number of neurons: two linear mapping layers in the fully connected layer successively reduce the number of neurons until a single-valued prediction is obtained. In addition, the intermediate mapping layer enables the model to learn more feature combinations and representations.

FIGURE 8
Comparison of the fitting results of the proposed algorithm for healthy subject 1 with the experimental results of the existing algorithm.

3 Experiments and results

Performance evaluation
The R² score and the root mean square error (RMSE) are commonly used to evaluate regression performance in the continuous estimation of joint angles (Zhong et al., 2022). To assess the accuracy of the continuous estimation results for the lower extremity joints, the regression performance of the lower extremity hip, knee, and ankle joints is evaluated using R² and RMSE, defined in Equations 23 and 24, respectively:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\theta_i - \hat{\theta}_i\right)^2}{\sum_{i=1}^{n}\left(\theta_i - \bar{\theta}\right)^2} \tag{23}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\theta_i - \hat{\theta}_i\right)^2} \tag{24}$$

where θ_i is the actual value of the angle of the target joint, θ̂_i is the joint angle predicted by the model, θ̄ is the average value of the actual angle θ_i, and n is the length of the sampling sequence. In addition, we perform a statistical analysis using one-way analysis of variance (ANOVA) at the 0.05 level of significance.
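Both metrics can be computed with a few lines of Python. The sketch uses the standard definitions of the coefficient of determination and RMSE, which are assumed here to correspond to Equations 23 and 24; the angle values are hypothetical.

```python
import math

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error between actual and predicted angles."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

theta = [0.0, 5.0, 10.0, 5.0]       # hypothetical measured ankle angles (degrees)
theta_hat = [1.0, 5.0, 9.0, 5.0]    # hypothetical model predictions (degrees)
r2 = r2_score(theta, theta_hat)
err = rmse(theta, theta_hat)
```

Because R² is normalized by the variance of the measured angle, it is less sensitive than an absolute error to differences in the range of motion between subjects and movement modes.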

Result and discussion
Since each healthy subject has 4 test states, the 5 healthy subjects provide 20 sets of test data, and the patient provides 2 sets. Considering that the output signals show obvious periodicity, trend and seasonality, autoregressive processing is performed on the data in order to highlight these characteristics for training the proposed network model. According to the length of the data sequences, the first 90,000 points of each of the 16 data sets from the first 4 healthy subjects and the first 50,000 points of the first patient data set are taken as training data. The time step of autoregression is chosen as 5; that is, the input and output data at times t−1, t−2, ..., t−5 and the input data at time t are used together to predict the output at time t. For the proposed network, a one-layer CNN is used for initial feature extraction. A two-layer LSTM network with SAM followed by a one-layer LSTM is used for depth feature extraction. Three fully connected layers are applied to obtain the predicted output. Adam is used as the optimizer to minimize the loss function. This parameter setting allows the network to capture data characteristics efficiently without too much computational burden. Since the input dimension is 3 and the output dimension is 1, the data before combination form a matrix of 6 rows and 4 columns. After regression combination, the data are flattened into a vector of 1 row and 24 columns, where the first 23 columns are the input data to the model and the last column is the output corresponding to the current time t. Combined with the time dimension, the length of the training time series is 1,490,000 (16 × 90,000 + 50,000). The first 23 columns are input into the model to obtain the predicted value y, and the loss is calculated between the predicted value and the 24th column in order to update the parameters of the network model and complete its training. Using the trained model, predictions are made for the last 30,000 points of the data from the first 4 healthy subjects and for the last 20,000 points of the first patient data set. The test results for the four healthy subjects are shown in Figure 7. For comparison, the existing LSTM prediction algorithm in Dong et al. (2023) and the algorithm in Zhou et al. (2022) are also tested on healthy subject 1, as shown in Figure 8. The comparison shows that the proposed prediction algorithm has better prediction accuracy and robustness. The results for the first patient data set are shown in Figure 9. The results show that the proposed method has good tracking performance for lower limb motion angle prediction.
The results of the MSE comparison between the predicted data and the real data for the different exercise modes are shown in [...] walking condition are small and within 1° for both healthy and diseased subjects. The static plantarflexion and dorsiflexion exercise had an MSE within 5° due to the wider angular range of the exercise. Because the error range of the MSE fluctuates with the different angular ranges of motion of different subjects in different modes of motion, R² is the better criterion for this study.

[Figure: Histogram of the coefficient of determination of projected versus real data.]

Effects of different exercise modes
The ankle angle prediction curves for the four movement states of the subjects are shown in Figures 7, 9. In the CNN-LSTM model, not only is the tracking deviation between the estimated angle and the target angle small for healthy subjects, but the model also shows good regression performance for the ankle motion of the patient. The proposed multilayer CNN-LSTM network model incorporating the self-attention mechanism therefore has good tracking performance and high prediction accuracy.
Two indicators, RMSE and R², are used to evaluate the quality of the method in predicting the ankle joint angle. The coefficients of determination between the predicted data and the real data under different motion modes are compared in Figures 10, 11. Most of the coefficients of determination are above 0.99, which means that the predicted data correlate well with the real data and reflect the dynamic characteristics of the system very well. It should be noted that walking at 2 km/h is slow movement. Because of differences in height and weight, the lower limbs of the subjects cannot move fully in this case, resulting in different EMG signals for each subject. The test results in Figure 7 also show that the prediction error is larger at 2 km/h. Therefore, the error band for 2 km/h is larger in Figure 11. When the walking speed is increased to 3 km/h, it is closer to the natural walking speed of the human body: the gait is more natural, the muscle coordination is stable and flexible, and the prediction performance improves. However, when the walking speed increases to 5 km/h, which is faster than the normal walking speed, the accuracy starts to decrease. This indicates that the closer the walking speed is to the normal human walking speed, the better the muscle coordination of the lower limbs. When the speed is faster or slower than normal, the muscles of the lower limbs are insufficiently coordinated or fatigued, which departs from the normal pattern, and the prediction results of the model decline accordingly, in line with normal walking behavior. For the static plantarflexion state, the model produced the best predictions, which may be because ankle plantarflexion and dorsiflexion in the static state involve no ground contact force, which reduces the complex interactions between the foot and the ground and the influence of shoes.

[Figure: Comparison of the coefficient of determination between predicted and real data for different exercise modes.]

FIGURE 12
Cross-validation of ankle EMG signal fitting curves for the fifth healthy tester and the second set of data from the diseased tester.

FIGURE 13
Comparison of determination coefficients between prediction and cross-validation data.

Effects between healthy and patient subjects
The MSE values of healthy subjects and patients during walking showed significant differences. Since people with impaired motor ability are prone to problems such as muscle fatigue or even cramps, and in some cases factors such as high muscle tone are present, further study can explore the reasons for these differences, such as differences in physical fitness, testing conditions, and health status. This helps to understand the differences in physiological responses between healthy subjects and patients at different exercise intensities. As shown in Table 1, the MSE values of patient subjects are generally lower than those of healthy subjects, which suggests that the results of the patient subjects differ from the expected values in these particular exercises.

Effects of model-oriented motor characteristics of healthy and patient subjects
For patients with lower limb motor dysfunction, the motor deficit is usually more severe on the affected side, whereas the functional decline on the healthy side is usually milder. As a result, the acquired sEMG signals often alternate between useful and useless information, which can lead to problems such as vanishing or exploding gradients and cause loss of information. In addition, these patients are prone to muscle fatigue and even spasticity, and the angles of their lower limb movements are more complicated and abnormal. Therefore, extracting the relationship between sEMG signals and movement trajectories under these more complicated and non-ideal conditions is a challenging problem.
In order to verify the generalization performance of the proposed model and its motion estimation performance for subjects with lower limb motor dysfunction, we input the untrained data from the fifth healthy participant and the second set of data from the diseased participant into the model for ankle joint angle prediction. The results are shown in Figure 12. For healthy subject 5, who is not included in the training dataset, sEMG and the corresponding movement angles are measured at different movement speeds (2 km/h, 3 km/h, 5 km/h) and in static plantarflexion and dorsiflexion. The results in Figure 13 indicate that, in comparison with subjects 1-4, the predicted R for the walking angle of healthy subject 5 without model training is slightly lower but still greater than 0.985. However, with p > 0.05, the difference is not statistically significant. This confirms that the model exhibits good prediction performance for data from subjects not included in the training set. For patient subject 2, R > 0.99 for the untrained data on the affected side demonstrates the effectiveness of the proposed model in predicting angles on the affected side. However, as indicated in Table 2, the MSE value is higher than for the training data, reaching 6.0589. This may be attributed to the large angle fluctuation in the late stages of exercise due to muscle spasm or fatigue in diseased subjects. Consequently, the MSE of the prediction is substantially higher than that of the training data.
The motion estimates for subjects outside the training data show that the proposed method not only generalizes well, but can also predict the lower limb motion angle of the patient well. This helps to identify the physiological differences between healthy individuals and patients in specific movement states or static states. Further analysis can try to identify the physiological or medical factors that lead to these differences, which has potential applications for disease diagnosis, treatment, or health status assessment.
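The combination of a very high R with a comparatively high MSE, as reported for the patient data, can look surprising at first. The sketch below illustrates one general property of the two metrics: the correlation coefficient ignores a constant offset or amplitude mismatch, while MSE penalizes both. The signal and the distortion are hypothetical and are not the study's data; the explanation given in the text for the patient result (angle fluctuation late in exercise) is a separate, subject-specific factor.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(x, y):
    """Pearson correlation coefficient R between two sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical periodic joint angle: two full cycles, amplitude 10 degrees.
measured = [10.0 * math.sin(2.0 * math.pi * k / 20.0) for k in range(40)]
# Hypothetical prediction with a 10 % amplitude error and a 1 degree offset.
predicted = [1.1 * m + 1.0 for m in measured]

print("R  :", pearson_r(measured, predicted))  # essentially 1
print("MSE:", mse(measured, predicted))        # clearly non-zero
```

For this reason, R is best read as a measure of how well the shape of the trajectory is tracked, and MSE as a measure of the absolute angle error.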

Discussion and conclusion
In this article, a multilayer CNN-LSTM prediction network model incorporating a self-attention mechanism is proposed. To validate the performance of the model in predictive tracking of ankle joint mobility for different populations, the remaining data of both healthy and patient subjects are treated as test data and input into the model, and the prediction results of the fused model for different motion states are compared. The results show that most of the model's predictions reach R > 0.9 for different motion states of healthy subjects. In the experiments oriented to the motion characteristics of patient subjects, the angle prediction reaches R > 0.99 for the untrained data on the affected side, which shows that the proposed model is also effective for angle prediction on the affected side. Therefore, the model proposed in this article not only estimates motion well for healthy subjects, but can also be used for motion estimation in subjects with lower limb dysfunction. This helps to understand the differences in physiological responses between healthy subjects and patients under different exercise modalities, and further analysis can try to find the physiological or medical factors that lead to these differences, which can then be used to evaluate rehabilitation efficacy for clinical patients. The main merits of the proposed method are that the designed network architecture improves the interpretability and prediction accuracy of the autoregressive model and reduces the problems of vanishing or exploding gradients caused by redundant sensor information.
Electromyographic neural information collected from the human body provides a new approach to human-robot interaction, and this study provides a feasible solution for accurately estimating the ankle angle of the lower extremity in both healthy subjects and patients. In future work, the method can be applied to the control of exoskeleton robots and to clinical rehabilitation training and evaluation. However, this study also has some limitations. Since patients with post-stroke hemiparesis are virtually unable to perform lower limb walking movements in Brunnstrom stages I and II, our experiment was only able to estimate movements for patients in stage III and above. Subsequently, we will expand the number and range of subjects and explore multi-sensor fusion methods to enhance the reliability of the model.

FIGURE 1 Noraxon sEMG and inertial sensor acquisition system.

FIGURE 2
Compared with conventional networks such as RNN and LSTM, which process the features of time series data with equal weight, self-attention calculates the degree of correlation between all time series data points from a global perspective and finally allocates different attention to different positions to enhance the main features of the time series. The self-attention mechanism can flexibly adapt to different input sequences and task requirements. For a time series A = [a_1, a_2, ..., a_n], the attention B = [b_1, b_2, ..., b_n] of each position is obtained from the degree of correlation between the sequence data. The specific calculation process is as follows.
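A minimal sketch of one common formulation, scaled dot-product self-attention, is given here under stated assumptions; it is not necessarily the exact formulation used in this work. In particular, the sketch assumes that queries, keys, and values are the input vectors themselves, whereas a trained network would normally apply learned projection matrices first. The function names and the example sequence are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    Assumption: queries, keys and values all equal the input vectors
    (no learned projections). Returns (outputs, weights), where
    weights[i][j] is the attention that position i pays to position j.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in seq:
        # Correlation of this position with every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        w = softmax(scores)
        weights.append(w)
        # Attention output: weighted sum of all positions.
        outputs.append(
            [sum(w[j] * seq[j][c] for j in range(len(seq))) for c in range(d)]
        )
    return outputs, weights

# Hypothetical sequence A of three 2-dimensional feature vectors.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B, attn = self_attention(A)
for row in attn:
    print([round(v, 3) for v in row])
```

Each row of the weight matrix sums to one, so every output position is a convex combination of all input positions. This is what allows the mechanism to emphasize the most relevant time steps regardless of their distance in the sequence.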

FIGURE 6
FIGURE 7 Fitted curves of ankle EMG signals at four exercise speeds in four healthy subjects. (A) Fitting results for subject 1. (B) Fitting results for subject 2. (C) Fitting results for subject 3. (D) Fitting results for subject 4.

FIGURE 9 Fitted curves of EMG signals of the healthy ankle from the first set of data of the diseased test subjects.

FIGURE 13
3. Forget gate: the forget gate determines what information is deleted from the state unit.
4. Output gate: the output gate determines which information in the state unit is output to the next time step. The relevant calculation formulas are shown as Equations 4-9.
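The gate descriptions can be illustrated with a single time step of a scalar LSTM cell. This is a sketch of the standard LSTM formulation and is only assumed to correspond to the equations referenced in the text. The parameter names (w_*, u_*, b_*) are hypothetical, and the cell uses scalars instead of vectors to keep the example short.

```python
import math

def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell (standard formulation).

    x: current input, h_prev: previous hidden state,
    c_prev: previous cell state, p: dict of hypothetical parameters.
    """
    # Forget gate: how much of the previous cell state is kept.
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])
    # Input gate: how much of the new candidate is written.
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])
    # Candidate cell state computed from the current input.
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])
    # New cell state: retained memory plus gated new information.
    c = f * c_prev + i * g
    # Output gate: how much of the cell state is exposed.
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])
    # Hidden state passed on to the next time step.
    h = o * math.tanh(c)
    return h, c

# With all parameters at zero, every gate is 0.5 and the candidate is 0.
names = ("w", "u", "b")
gates = ("f", "i", "g", "o")
zero_params = {f"{n}_{k}": 0.0 for n in names for k in gates}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=zero_params)
print(h, c)
```

In the zero-parameter case the forget gate halves the previous cell state, which makes the role of each gate easy to check by hand before moving to trained weights.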

TABLE 1
Comparison of MSE based on training data.

Table 1.
It can be seen from the table that the MSE errors in the