Edited by: Xiaodong Guo, Huazhong University of Science and Technology, China
Reviewed by: Bin Fang, Tsinghua University, China; Yue Ma, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), China
This article was submitted to Neuroprosthetics, a section of the journal Frontiers in Neuroscience
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The human–robot interface (HRI) based on biological signals can realize natural interaction between humans and robots. It has recently been widely used in exoskeleton robots to help predict the wearer's movement. Surface electromyography (sEMG)-based HRIs have mature applications in exoskeletons. However, the sEMG signals of paraplegic patients' lower limbs are weak, which means that most HRIs based on lower limb sEMG signals cannot be applied to the exoskeleton. Few studies have explored the possibility of using upper limb sEMG signals to predict lower limb movement. In addition, most HRIs do not consider the contribution and synergy of sEMG signal channels. This paper proposes a human–exoskeleton interface based on upper limb sEMG signals to predict lower limb movements of paraplegic patients. The interface constructs a channel synergy-based network (MCSNet) to extract the contribution and synergy of different feature channels. An sEMG data acquisition experiment is designed to verify the effectiveness of MCSNet. The experimental results show that our method has good movement prediction performance in both within-subject and cross-subject situations, reaching accuracies of 94.51 and 80.75%, respectively. Furthermore, feature visualization and model ablation analysis show that the features extracted by MCSNet are physiologically interpretable.
The development of artificial intelligence technology and wearable sensors has promoted the rise of human–robot interaction. As the core of human–robot interaction, a human–robot interface (HRI) enables direct communication with a robot via physical or biological signals and has received widespread attention in the past decade (Simao et al.,
A brain–computer interface (BCI) is an HRI based on electroencephalogram (EEG) signals. It can obtain a patient's motion intention directly from the EEG signal without requiring actual limb movement, so BCIs have been used to predict the movement of paraplegic patients (Tariq et al.,
Compared with the EEG signal, the sEMG signal has a higher signal-to-noise ratio and is less affected by external interference. Therefore, the sEMG-based human–robot interface (MHRI) was adopted earlier and has been more widely used in walking-assistant exoskeletons (Kawamoto et al.,
Deep learning has largely alleviated the need for manual feature extraction, achieving state-of-the-art performance in fields such as computer vision and natural language processing (Hinton et al.,
In this paper, a channel synergy-based MHRI is proposed for lower limb movement prediction in paraplegic patients. It uses the sEMG signals of 12 upper limb muscles to predict lower limb movements. The proposed movement prediction model uses LSTM, depthwise, and separable convolutions to extract the spatiotemporal features of multi-channel sEMG signals and introduces an attention module to extract the synergy of different sEMG feature channels. An sEMG data acquisition experiment is designed to verify the proposed channel synergy-based network (MCSNet). The experimental results verify that MCSNet's prediction accuracy is better than that of traditional machine learning-based MHRIs and two mainstream deep learning-based MHRIs in both within-subject and cross-subject situations. Furthermore, we visualize the features extracted by the MCSNet model and perform a model ablation analysis. The results show that the features extracted by MCSNet are physiologically interpretable.
In summary, the main contributions of this paper are as follows:
A channel synergy-based MHRI is proposed for lower limb movement prediction of paraplegic patients. It uses the sEMG signals of the upper limbs to predict lower limb movements and extracts the contribution, spatiotemporal, and synergy features of different sEMG channels, which improves the accuracy of lower limb movement prediction.
This paper visualizes the features extracted by the MCSNet model and performs a model ablation analysis; the results show that the features extracted by MCSNet are physiologically interpretable.
Human–robot interfaces used to predict the movement of patients with impaired limbs are mainly divided into BCIs and MHRIs.
Research in neuroengineering has promoted the development of BCI, which is mainly used in the field of medical rehabilitation to realize the perception of user intent. A complete BCI includes three main processing stages: data collection and processing, feature extraction, and classification (Lotte et al.,
Recent research has explored the application of deep learning in BCI (Tayeb et al.,
As the biological signal most relevant to movement, sEMG has long been applied to human–robot interaction, and research on MHRI is particularly rich. According to the granularity of movement prediction, traditional MHRI can be divided into two categories: MHRI based on motion curve prediction and MHRI based on motion mode (movement) prediction. The former uses machine learning methods or Hill's musculoskeletal model to build a mapping between handcrafted features and joint angles/torques, which can achieve finer-grained movement prediction. The literature (Suplino et al.,
The second category of MHRI is similar to BCI in that it also includes three processing stages. Its main principle is to use machine learning methods to map handcrafted features to movements (Afzal et al.,
Deep learning can automatically extract the best feature set from sEMG signals. Many researchers have explored the application of deep learning in MHRI-based movement prediction methods (Allard et al.,
As a tightly human–machine coupled system, the exoskeleton is a typical application scenario of HRI. The application of HRI on exoskeleton can be divided into movement prediction (Kyeong et al.,
Our work mainly concerns lower limb movement prediction for the walking-assistant exoskeleton for paraplegic patients. It is most closely related to deep learning-based MHRI, which uses CNN and LSTM architectures to extract the sEMG signal features of different lower limb movements. In contrast to existing deep learning-based MHRIs, this paper proposes a channel synergy-based MHRI, which extracts the contribution and synergy of the sEMG feature channels. Its performance is better than that of traditional machine learning-based MHRIs and two mainstream deep learning-based MHRIs.
This section presents the methodological details of the proposed movement prediction model. Section 3.1 describes the overall architecture of the MCSNet model. In section 3.2, we introduce seven traditional MHRIs and two mainstream deep learning-based MHRIs, which are used for comparison with the MCSNet model.
sEMG is non-stationary time-series data. For movement prediction, extracting richer timing features is a basic requirement for improving accuracy. In block 1, for each input sEMG sample segment (size
which
In Equation (2), each of the sEMG signal channels is used to generate its timing feature independently; the timing features from all channels are then concatenated into
In block 2, we perform two convolutional steps in sequence. First, we fit
In Equations (3) and (4), the size of
In block 3, we use a separable convolution, a depthwise convolution of size (1, 15) followed by
For block 4, we introduce a channel attention module. This operation learns the weights of different synergy features, which can effectively associate movements with the most relevant synergy features and improve movement prediction accuracy. Moreover, there are differences in the feature contributions of sEMG channels in different subjects under the same movement (muscle compensatory behavior, d'Avella et al.,
Overall architecture of the MCSNet model. Lines denote the convolutional kernel connectivity between inputs and outputs (called feature maps). The network starts with a channel-by-channel long short-term memory network (LSTM) (second column) to learn the timing features, then uses a two-layer convolution (third column) to learn different spatiotemporal features. The separable convolution (fourth column) is a combination of a depthwise convolution followed by a pointwise convolution, which can explicitly decouple the relationships within and across feature maps and learn the synergy features of surface electromyography (sEMG).
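The four blocks described above can be sketched in PyTorch roughly as follows. This is an illustrative reconstruction, not the exact model: the window length, LSTM hidden size, filter counts (f1, f2), and the squeeze-and-excitation-style attention layout are all assumptions; only the (1, 15) depthwise kernel size and the 12 input channels come from the text.

```python
import torch
import torch.nn as nn

class MCSNetSketch(nn.Module):
    """Illustrative sketch of the MCSNet pipeline; hyperparameters are assumed."""

    def __init__(self, n_channels=12, n_classes=4, f1=8, f2=16):
        super().__init__()
        # Block 1: one small LSTM per sEMG channel to learn timing features.
        self.lstms = nn.ModuleList(
            [nn.LSTM(input_size=1, hidden_size=1, batch_first=True)
             for _ in range(n_channels)])
        # Block 2: temporal convolution followed by a depthwise spatial convolution.
        self.conv = nn.Sequential(
            nn.Conv2d(1, f1, (1, 15), padding=(0, 7), bias=False),
            nn.BatchNorm2d(f1),
            nn.Conv2d(f1, f1, (n_channels, 1), groups=f1, bias=False),
            nn.BatchNorm2d(f1), nn.ELU(), nn.AvgPool2d((1, 4)))
        # Block 3: separable conv = depthwise (1, 15) + pointwise (1, 1).
        self.sep = nn.Sequential(
            nn.Conv2d(f1, f1, (1, 15), padding=(0, 7), groups=f1, bias=False),
            nn.Conv2d(f1, f2, 1, bias=False),
            nn.BatchNorm2d(f2), nn.ELU(), nn.AvgPool2d((1, 4)))
        # Block 4: channel attention over the f2 synergy feature maps (assumed
        # squeeze-and-excitation form).
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(f2, f2 // 2), nn.ReLU(),
            nn.Linear(f2 // 2, f2), nn.Sigmoid())
        self.head = nn.LazyLinear(n_classes)  # softmax is applied by the loss

    def forward(self, x):                              # x: (batch, channels, T)
        seqs = [lstm(x[:, i, :, None])[0][..., 0]      # each: (batch, T)
                for i, lstm in enumerate(self.lstms)]
        h = torch.stack(seqs, dim=1)[:, None]          # (batch, 1, C, T)
        h = self.sep(self.conv(h))                     # (batch, f2, 1, T // 16)
        w = self.att(h)[:, :, None, None]              # per-feature-map weights
        return self.head((h * w).flatten(1))           # (batch, n_classes)
```

A forward pass on a batch of 128-sample, 12-channel windows, `MCSNetSketch()(torch.randn(2, 12, 128))`, yields one logit per movement class.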
We input the generated attention-based spatiotemporal features into the movement classification/prediction part. As shown in
We compared the performance of MCSNet with seven traditional MHRI based on handcrafted features and machine learning models in lower limb movement prediction. In the selection of features, referring to the research conclusions of time domain and frequency domain features in the literature (Phinyomark et al.,
Parameter list of traditional MHRI movement prediction approaches.
LDA | Covariance structure: full rank (for within-subject), diagonal (for cross-subject) |
DT | Maximum number of splits: 100 |
BES | Kernel: Radial Basis Function, |
LSVM | Kernel: linear, C = 1, Multiple classification method: OVO |
RBFSVM | Kernel: RBF, C = 1.9, Multiple classification method: OVO |
KNN | Number of neighboring points: 1, Metric function: Mahalanobis distance |
ANN | Number of hidden units: 28 |
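For the traditional models above, per-channel handcrafted features are computed from each sEMG window. As an illustration, a few of the time-domain features commonly recommended in the sEMG literature (e.g., by Phinyomark et al.) can be computed as follows; the exact feature set used in our experiments may differ, so treat this as a hedged sketch.

```python
import numpy as np

def td_features(x):
    """Common sEMG time-domain features for one channel (1-D signal array)."""
    mav = np.mean(np.abs(x))                          # mean absolute value
    rms = np.sqrt(np.mean(x ** 2))                    # root mean square
    wl = np.sum(np.abs(np.diff(x)))                   # waveform length
    zc = np.count_nonzero(np.diff(np.signbit(x)))     # zero-crossing count
    return np.array([mav, rms, wl, zc], dtype=float)

def feature_vector(window):
    """Concatenate features of all channels; window: (n_channels, n_samples)."""
    return np.concatenate([td_features(ch) for ch in window])
```

A 12-channel window thus yields a 48-dimensional feature vector that can be fed to any of the classifiers listed in the table.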
In deep learning, we compare the performance of MCSNet with a two-layer CNN (TCNN) and a CNN-LSTM model. The TCNN architecture consists of two convolutional layers and a softmax layer for classification. The CNN-LSTM architecture includes two LSTM layers, three convolutional layers, and a softmax layer. We implemented these models in PyTorch. For specific details of the models, see
In general, the most significant difference between MCSNet and traditional MHRI movement prediction approaches is the feature extraction method, and the most significant difference from other deep learning-based movement prediction methods is the network architecture. Comparing against these methods therefore demonstrates the effectiveness of the feature extraction architecture we designed.
In this section, an sEMG signal acquisition experiment based on upper limb muscles is designed to verify the effectiveness of the proposed method. Section 4.1 describes the acquisition experiment and the data preprocessing. Section 4.2 gives the implementation details of model training. In section 4.3, we present the MCSNet movement prediction results and compare MCSNet with other movement prediction models in the within-subject and cross-subject cases. Section 4.4 explains the results of the MCSNet model ablation analysis and feature visualization.
A total of 8 healthy subjects were invited to participate in the experiment. Each subject completed four lower limb movements of standing, sitting, walking, and going up stairs while wearing the AIDER exoskeleton. During this period, the sEMG signals of the subjects' upper limbs were collected.
Introduction of the muscles used in the surface electromyography (sEMG) data acquisition experiment and the AssIstive DEvice for paRaplegic patient (AIDER) exoskeleton.
Schematic diagram of surface electromyography (sEMG) data acquisition experiment. The upper part is the preparation posture of the four lower limb movements. We fixed the sEMG acquisition electrode with an elastic bandage to prevent the acquisition electrode from falling off during the experiment. The lower part is the schematic diagram of the experimental acquisition process.
After preprocessing the sEMG data, for the traditional MHRI movement prediction models, we use the relevant formulas to calculate the features described in section 3.2.1 and then input these features into the Classification Learner Toolbox and the Neural Net Pattern Recognition Toolbox to train the prediction models. To address the imbalance in the number of samples between movements, we apply a movement class weight to the loss function. The class weight is the inverse of each class's proportion in the training data, with the majority movement class set to 1.
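The class-weighting scheme described above (inverse of each movement's proportion in the training data, scaled so the majority class has weight 1) can be computed as in this small numpy sketch; the function name is ours, not from any toolbox.

```python
import numpy as np

def movement_class_weights(labels):
    """Inverse-proportion class weights; the majority class is scaled to 1.0.

    Since the inverse proportion of class c is N / n_c, dividing by the
    smallest such value gives w_c = max(n) / n_c, so the most frequent
    movement gets weight 1 and rarer movements get proportionally more.
    """
    classes, counts = np.unique(np.asarray(labels), return_counts=True)
    weights = counts.max() / counts
    return dict(zip(classes.tolist(), weights.tolist()))
```

In a PyTorch training loop, such weights would typically be passed as the `weight` argument of `nn.CrossEntropyLoss`.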
MCSNet and the deep learning-based MHRI movement prediction models are implemented using the PyTorch library (Paszke et al.,
We compared the performance of the proposed MCSNet model with other MHRIs in movement classification/prediction in both the within-subject and cross-subject situations.
For the within-subject case, we divide each subject's data according to a ratio of 7:3 and use the 70% portion to train that subject's model. Four-fold cross-validation is used to avoid model overfitting. In addition, repeated-measures analysis of variance (ANOVA) is used to test the results statistically, with the subject and the classification model as factors and the classification/prediction accuracy as the response variable.
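The 7:3 split with four-fold cross-validation on the training portion can be sketched as follows; the shuffling and fold-partitioning details are our assumptions about an otherwise standard scheme.

```python
import numpy as np

def within_subject_splits(n_samples, train_ratio=0.7, n_folds=4, seed=0):
    """Hold out 30% of one subject's samples for testing and partition the
    remaining 70% into n_folds cross-validation (train, val) index pairs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    folds = np.array_split(train_idx, n_folds)
    cv = [(np.concatenate([f for j, f in enumerate(folds) if j != i]), folds[i])
          for i in range(n_folds)]
    return cv, test_idx
```

Each of the four (train, val) pairs covers the full 70% training portion exactly once as validation data.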
We compare MCSNet with both traditional machine learning-based MHRI movement prediction models (LDA, DT, BES, LSVM, RBFSVM, KNN, and ANN) and deep learning-based MHRI movement prediction models (TCNN and CNN-LSTM). Within-subject results across all models are shown in
Within-subject movement prediction performance; four-fold cross-validation is used to avoid model overfitting, and results are averaged over all folds and all subjects. Error bars denote two standard errors of the mean.
Within-subject movement prediction performance (test set ACC).
1 | 0.9200 | 0.7520 | 0.8496 | 0.9451 | 0.9504 | 0.8387 | 0.9315 | 0.8377 | 0.9570 | |
2 | 0.8731 | 0.8097 | 0.7718 | 0.9026 | 0.9159 | 0.8000 | 0.9008 | 0.5849 | 0.9034 | |
3 | 0.7105 | 0.8724 | 0.7852 | 0.8146 | 0.8503 | 0.6018 | 0.7590 | 0.7722 | 0.9089 | |
4 | 0.7888 | 0.6630 | 0.6818 | 0.8594 | 0.8526 | 0.7294 | 0.8428 | 0.7543 | 0.9075 | |
5 | 0.7430 | 0.7962 | 0.4937 | 0.8675 | 0.8911 | 0.7091 | 0.8828 | 0.8525 | 0.9212 | |
6 | 0.8872 | 0.8188 | 0.6747 | 0.8927 | 0.8358 | 0.8923 | 0.8373 | 0.8844 | 0.8437 | |
7 | 0.9600 | 0.8467 | 0.7263 | 0.9602 | 0.9687 | 0.8261 | 0.9523 | 0.7576 | 0.9960 | |
Average ACC | 0.8404 | 0.7941 | 0.7119 | 0.8918 | 0.9031 | 0.7630 | 0.8802 | 0.7709 | 0.9287 |
In the cross-subject case, we randomly select the data of three subjects to train the model and the data of two other subjects as the validation set. The whole process is repeated ten times, producing ten different folds.
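The repeated subject-wise split described above can be sketched as follows; the function name and seeding are illustrative assumptions.

```python
import random

def cross_subject_folds(subjects, n_train=3, n_val=2, n_repeats=10, seed=0):
    """Repeatedly draw a random subject-wise split: n_train subjects for
    training and n_val disjoint subjects for validation, n_repeats times."""
    rng = random.Random(seed)
    folds = []
    for _ in range(n_repeats):
        chosen = rng.sample(subjects, n_train + n_val)   # no overlap possible
        folds.append((chosen[:n_train], chosen[n_train:]))
    return folds
```

Because training and validation subjects never overlap within a fold, accuracy on the validation subjects measures generalization to unseen wearers.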
Cross-subject prediction results across all models are shown in
Cross-subject movement prediction performance, averaged over all folds. Error bars denote two standard errors of the mean.
The development of methods for enabling feature explainability in deep neural networks has gradually become a focus of attention over the past few years and has been proposed as an essential component of a robust model validation procedure, to ensure that classification performance is driven by relevant features rather than by noise in the data (Ancona et al.,
The average output of all surface electromyography (sEMG) signal samples for the sitting movement of subject 7; the non-negative matrix factorization method is used to find the synergy channels.
We performed non-negative matrix factorization on the output of the LSTM and the first convolutional layer, as shown in
We visualized the flow of surface electromyography (sEMG) synergy characteristics for the sitting movement of subject 7 in the within-subject situation. The figure shows the flow of synergy characteristics through the different feature channels of the MCSNet model (orange lines and rectangles). The blue rectangles represent the feature channels of the depthwise and separable convolution layers. The circles represent the weight channels of the attention layer, and the green circles mark channels with large weights. We found that the channels with large attention-layer weights are basically the same as the channels along which the synergy characteristics flow. It can therefore be considered that MCSNet extracts the synergy characteristics of the muscles.
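The non-negative matrix factorization used to identify synergy channels can be sketched with the standard Lee–Seung multiplicative updates; this minimal numpy implementation is illustrative of the technique, not the exact tooling used for our analysis.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0):
    """Factor a non-negative matrix V (channels x time) into W (channels x k),
    the synergy weights, and H (k x time), the activations, by minimizing the
    Frobenius reconstruction error with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update synergy weights
    return W, H
```

Applied to the average layer outputs, the columns of W indicate which feature channels co-activate, i.e., candidate muscle synergies.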
In addition, we performed a model ablation analysis on MCSNet in the cross-subject situation, removing the depthwise, separable, and attention layers in turn and observing the changes in the prediction performance of the MCSNet model. According to the results in
Results of the model ablation analysis (cross-subject test ACC by removed layer).
Depthwise layer | 0.7258 |
Separable layer | 0.7241 |
Attention layer | 0.7187 |
None (full MCSNet) | 0.8075 |
In this paper, a channel synergy-based human–exoskeleton interface is proposed for lower limb movement prediction in paraplegic patients. It uses the sEMG signals of 12 upper limb muscles as input, which avoids the problem of weak sEMG signals in the lower limbs of paraplegic patients. The interface constructs a channel synergy-based network (MCSNet), which uses LSTM, depthwise, and separable convolutions to extract the spatiotemporal features of multi-channel sEMG signals and introduces an attention module to extract the synergy of different sEMG feature channels. An sEMG acquisition experiment is designed to verify the effectiveness of the MCSNet model. The results show that MCSNet has good movement prediction performance in both within-subject and cross-subject situations. Furthermore, feature visualization and a model ablation analysis of MCSNet are performed; the results show that the features extracted by MCSNet are physiologically interpretable. In the future, we will consider applying the proposed human–exoskeleton interface to an actual exoskeleton platform. In addition, we will focus on multi-modal movement prediction based on sEMG and EEG.
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below:
The studies involving human participants were reviewed and approved by Ethics Committee of University of Electronic Science and Technology of China. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.
KS designed the movement prediction model, performed the experiments, and drafted the manuscript. RH and FM participated in the design of the movement prediction model and assisted in the manuscript writing. ZP and XY guided writing paper and doing experiments. All authors contributed to the article and approved the submitted version.
This work was supported by the National Key Research and Development Program of China (No. 2018AAA0102504), the National Natural Science Foundation of China (NSFC) (No. 62003073), the Sichuan Science and Technology Program (Nos. 2021YFG0184, 2020YFSY0012, and 2018GZDZX0037), and the Research Foundation of Sichuan Provincial People's Hospital (No. 2021LY12).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.