A Novel Transfer Support Matrix Machine for Motor Imagery-Based Brain Computer Interface

In recent years, emerging matrix learning methods have shown promising performance in motor imagery (MI)-based brain-computer interfaces (BCIs). Nonetheless, electroencephalography (EEG) pattern variations among subjects necessitate collecting a large amount of labeled individual data for model training, which prolongs the calibration session. From the perspective of transfer learning, the model knowledge inherent in reference subjects, combined with a few target EEG data, has the potential to solve this issue. Thus, a novel knowledge-leverage-based support matrix machine (KL-SMM) was developed to improve classification performance when only a few labeled EEG data in the target domain (target subject) are available. The proposed KL-SMM possesses the powerful capability of a matrix learning machine, which allows it to directly learn the structural information from matrix-form EEG data. In addition, the KL-SMM can leverage not only the few labeled EEG data from the target domain during the learning procedure but also the existing model knowledge from the source domain (source subject). Therefore, the KL-SMM can enhance the generalization performance of the target classifier while guaranteeing privacy protection to a certain extent. Finally, the objective function of the KL-SMM can be easily optimized using the alternating direction method of multipliers (ADMM). Extensive experiments were conducted to evaluate the effectiveness of the KL-SMM on publicly available MI-based EEG datasets. Experimental results demonstrated that the KL-SMM outperformed the comparable methods when the EEG data were insufficient.

INTRODUCTION
Brain-computer interface (BCI) systems enable machines to accurately perceive the mental states of human beings, thereby establishing an effective user interface between humans and machines. There are several kinds of BCI paradigms, such as steady-state visual evoked potentials (Allison et al., 2008), P300 (Salvaris and Sepulveda, 2009), and motor imagery (MI) (Pfurtscheller and Neuper, 2001). Among them, the MI-based BCI is widely used because it is self-paced and does not require any external stimuli (Pfurtscheller and Da Silva, 1999). Electroencephalography (EEG) is the most extensively used technique to record neuronal activity in the brain due to its high temporal resolution, portability, and non-invasiveness. EEG-based motor imagery BCI has shown great potential in many applications, such as rehabilitating the sensory-motor functions of disabled patients (Ang et al., 2011; Al-Qaysi et al., 2018) and facilitating smart living for healthy people (Vourvopoulos et al., 2017; Wang et al., 2019).
Although many machine learning algorithms have been developed to implement MI-based BCI with great success, most of them need a considerable amount of labeled EEG data for model training, and collecting these data is exceedingly time-consuming and labor-intensive. Insufficient labeled EEG data weaken the generalization capability of the classifier at prediction time. An intuitive solution to this problem is to leverage historical EEG data from the source domain (source subject) in modeling the target domain (target subject). However, this approach raises some challenges. Owing to the EEG pattern variations between different subjects (Morioka et al., 2015), directly using the EEG data of the source domain may cause performance degradation. Furthermore, because the original EEG data contain personal information, the data of other subjects may not always be available for constructing the classifier for privacy reasons (Agarwal et al., 2019). Thus, exploring an effective knowledge transfer strategy that can protect the personal information of a source subject is highly desirable in the MI-based BCI.
From the perspective of transfer learning (Pan and Yang, 2009), the model knowledge of the source domain can potentially be leveraged to address these problems. Generally, EEG-based learning methods involve two steps: EEG feature extraction and classification. The model knowledge of the source domain can either be integrated into the feature extraction process (Kang et al., 2009; Samek et al., 2013), or be used in modeling the classifier (Azab et al., 2019). Specifically, Kang et al. (2009) proposed leveraging the linear combination of covariance matrices of the source subjects as reference during the feature extraction of the target EEG data. Azab et al. (2019) proposed the construction of multiple-source models and transfer of the weighted multiple-source model knowledge to the target domain. Deng et al. (2013) proposed a knowledge-leverage-based fuzzy system that can leverage the model knowledge from the source domain in order to make up for the lack of labeled target data as well as to provide privacy protection.
Although it has been empirically demonstrated that the aforementioned methods are effective in dealing with EEG classification in scenarios where the labeled data are limited, these methods always need to transform the input data into vectors before classification. It is well known that EEG signals record brain activities over a period of time from multiple channels, which are naturally represented as matrices. Transforming the input matrices into vectors may destroy the correlation of rows or columns within matrix-form EEG features. Thus, several classification methods that can directly handle matrix-form data have been developed. For example, Wolf et al. (2007) proposed modeling the regression matrix of a support vector machine (SVM) as the sum of k rank-one orthogonal matrices (rank-k SVM). Pirsiavash et al. (2009) proposed a bilinear SVM (BSVM) based on factorizing the regression matrix into the product of two low-rank matrices. Although these methods can capture the correlation within matrix data, pre-determining the rank of the regression matrix requires a tedious tuning procedure. Luo et al. (2015) proposed combining the nuclear norm and squared Frobenius norm of the regression matrix to derive the support matrix machine (SMM). The cornerstone of the SMM is the use of the nuclear norm of the regression matrix as a convex approximation of the matrix rank; thus, its optimization problem becomes more tractable and can be solved using the alternating direction method of multipliers (ADMM). Based on the SMM, Zheng et al. proposed the multiclass SMM (Zheng et al., 2018c) and sparse SMM (Zheng et al., 2018b) for EEG data. Although existing matrix classification methods can effectively deal with matrix-form EEG data, they do not take transferable knowledge into consideration to improve EEG classification performance. They may therefore suffer from weak generalization capability when the available EEG data are insufficient.
We propose, for the first time, a novel knowledge-leverage-based matrix classification method for MI-based EEG classification. The proposed knowledge-leverage-based SMM (KL-SMM) can address the above-mentioned problems by integrating the model knowledge from the source domain and a few labeled target EEG data. It possesses the powerful capability of the SMM for learning matrix-form data. Furthermore, the model knowledge of the source domain can be used to compensate for the deficiency in learning due to the lack of labeled target EEG data. Different from most current model parameter transfer learning methods, the proposed method can propagate the structural information from the source model to the target model. Hence, the generalization capability can be greatly enhanced by transferring the model knowledge and structural information of the source domain. Instead of directly using the source EEG data, the KL-SMM can afford privacy protection by leveraging only the model knowledge of the source domain. In addition, it can be efficiently optimized through the ADMM method. We conducted extensive experiments on two publicly available EEG datasets to validate the effectiveness of the proposed method. As demonstrated by the experimental results, the KL-SMM can achieve promising results in scenarios with few labeled target EEG data.
The remainder of this paper is organized as follows: Section "Related Works" is a review of related works. In Section "Matrix Learning Preliminaries", the notations and preliminaries of the SMM are introduced. The KL-SMM model and its learning algorithm are described in Section "Knowledge-Leverage-Based SMM". In Section "Experiments", the details of extensive experiments and analyses are presented. The conclusions of the paper are presented in Section "Conclusion".

RELATED WORKS
Transfer learning has emerged as a novel technique for retaining and reusing knowledge learned from historical tasks for new tasks. As described above, transfer learning generally refers to a knowledge-leverage-based learning mechanism, which extracts useful knowledge from the source domain and propagates it as supervision information for modeling the target domain. According to the type of knowledge transferred from the source domain, most current research on transfer learning for EEG classification can be broadly divided into the following categories: (1) instance transfer, (2) feature representation transfer, and (3) model parameter transfer (Wang et al., 2015).
For the first category, it is assumed that part of the source EEG data can be selected and considered together with the few labeled target EEG data. The source EEG data are obtained through either instance selection or importance sampling cross-validation (Li et al., 2010; Hossain et al., 2016, 2018; Zanini et al., 2018). For example, Hossain et al. (2016) proposed an instance selection strategy based on active learning. The selected source EEG data were then used together with the available labeled target EEG data to train the target model. Li et al. (2010) demonstrated the possibility of weighting the source EEG data through the importance sampling cross-validation strategy, following which the source data with high weights were used to estimate the target classifier.
The aim of the feature representation transfer method is to learn a good feature representation for the target subject that has relevant source knowledge encoded within it. Most feature representation transfer learning methods were developed based on common spatial patterns (CSP) through the modification of the covariance matrix or optimization function (Kang et al., 2009; Lotte and Guan, 2010; Samek et al., 2013). For example, Samek et al. (2013) developed an extension of the CSP. They proposed learning a stationary subspace in which the stationary information of multiple subjects can be transferred. In addition to the above-mentioned shallow feature representation transfer learning methods, several deep transfer learning methods (Fahimi et al., 2019; Hang et al., 2019) have been proposed. In general, these methods apply domain adaptation techniques in a task-specific layer to incorporate the learned source and target deep features into a common feature space. For example, Hang et al. (2019) proposed leveraging the maximum mean discrepancy and center-based discriminative feature learning techniques simultaneously to reduce the domain shift, demonstrating a performance improvement in the MI-based BCI.
The third category is model parameter transfer, which assumes that the source subjects and target subjects share some parameters or prior distributions of the models. Model parameter transfer learning methods always leverage source models to model the target subjects in EEG classification. Despite these successes, most current transfer learning methods require the direct use of source EEG data, which may cause privacy disclosure, especially for biomedical information. Furthermore, existing transfer learning methods used for EEG recognition always assume that the input data are vectors. However, transforming EEG data, which are naturally represented as matrices, into vectors will destroy their structural information. The proposed method belongs to the third category. Unlike the previous transfer learning methods, the KL-SMM incorporates only model knowledge from the source domain, thereby guaranteeing privacy protection to some extent; it can also directly handle matrix-form EEG data.

MATRIX LEARNING PRELIMINARIES
Among the current matrix learning methods, the SMM (Luo et al., 2015) and its variants [e.g., (Zheng et al., 2018a)] are applied in many fields, owing to their simplicity and effectiveness. In this section, we present some notations and preliminary knowledge on the SMM, which are the foundation of the proposed KL-SMM method.

Mathematical Notations
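Matrices are denoted by bold uppercase letters (e.g., $\mathbf{X}$) in the following. The rank of a matrix $\mathbf{X} \in \mathbb{R}^{d_1 \times d_2}$ is expressed as $\mathrm{rank}(\mathbf{X}) = r$. The condensed singular value decomposition (SVD) of $\mathbf{X}$ is denoted as $\mathbf{X} = \mathbf{U}_X \boldsymbol{\Sigma}_X \mathbf{V}_X^T$, where $\mathbf{U}_X \in \mathbb{R}^{d_1 \times r}$ and $\mathbf{V}_X \in \mathbb{R}^{d_2 \times r}$ satisfy $\mathbf{U}_X^T \mathbf{U}_X = \mathbf{I}_r$ and $\mathbf{V}_X^T \mathbf{V}_X = \mathbf{I}_r$, and $\boldsymbol{\Sigma}_X = \mathrm{diag}(\sigma_1(\mathbf{X}), \sigma_2(\mathbf{X}), \ldots, \sigma_r(\mathbf{X}))$ with $\sigma_1(\mathbf{X}) \geq \sigma_2(\mathbf{X}) \geq \cdots \geq \sigma_r(\mathbf{X}) > 0$.
Definition 1. Given any $\tau > 0$, the singular value thresholding (SVT) (Cai et al., 2010) of matrix $\mathbf{X}$ is defined as
$$\mathcal{D}_\tau(\mathbf{X}) = \mathbf{U}_X \mathcal{D}_\tau(\boldsymbol{\Sigma}_X) \mathbf{V}_X^T, \tag{1}$$
where $\mathcal{D}_\tau(\boldsymbol{\Sigma}_X) = \mathrm{diag}([\sigma_1(\mathbf{X}) - \tau]_+, \ldots, [\sigma_r(\mathbf{X}) - \tau]_+)$ and $[z]_+ = \max(z, 0)$. $\|\mathbf{X}\|_* = \sum_{i=1}^{r} \sigma_i(\mathbf{X})$ denotes the nuclear norm of $\mathbf{X}$, and the subdifferential of $\|\mathbf{X}\|_*$ can be defined as follows:
$$\partial \|\mathbf{X}\|_* = \left\{ \mathbf{U}_X \mathbf{V}_X^T + \mathbf{Z} : \mathbf{Z} \in \mathbb{R}^{d_1 \times d_2},\ \mathbf{U}_X^T \mathbf{Z} = \mathbf{0},\ \mathbf{Z} \mathbf{V}_X = \mathbf{0},\ \|\mathbf{Z}\|_2 \leq 1 \right\}. \tag{2}$$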
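As a minimal sketch of Definition 1, assuming the standard SVT operator of Cai et al. (2010) that shrinks each singular value by τ and truncates it at zero, the operator and the nuclear norm can be computed with NumPy as follows. The function names `svt` and `nuclear_norm` are illustrative and are not taken from the original implementation.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: shrink every singular value of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # [sigma_i - tau]_+
    return (U * s_shrunk) @ Vt            # U diag(s_shrunk) V^T

def nuclear_norm(X):
    """Sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

# Toy example: singular values 3, 1 and 0.5 are shrunk by tau = 1.
X = np.diag([3.0, 1.0, 0.5])
Y = svt(X, tau=1.0)
```

In this toy case only the largest singular value survives the threshold, so `Y` has rank one; this is the mechanism by which the nuclear norm penalty encourages a low-rank regression matrix.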

SMM
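The matrix classifier SMM is defined as a penalty function plus a hinge loss. The penalty function, the spectral elastic net, enjoys the grouping effect while keeping a low-rank representation. The hinge loss enjoys the large-margin property while contributing to the sparseness and robustness of the classifier. The objective function of the SMM can be formulated as follows:
$$\arg\min_{\mathbf{W}, b}\ \frac{1}{2} \mathrm{tr}(\mathbf{W}^T \mathbf{W}) + \tau \|\mathbf{W}\|_* + C \sum_{i=1}^{N} \left\{ 1 - y_i \left[ \mathrm{tr}(\mathbf{W}^T \mathbf{X}_i) + b \right] \right\}_+, \tag{3}$$
where $(\mathbf{X}_i, y_i)$ denotes the $i$-th training sample and its label $y_i \in \{-1, +1\}$. Specifically, the spectral elastic net is a combination of the squared Frobenius norm $\|\mathbf{W}\|_F^2 = \mathrm{tr}(\mathbf{W}^T \mathbf{W})$ and the nuclear norm $\|\mathbf{W}\|_*$ of the regression matrix $\mathbf{W}$.
The objective function of the SMM can be solved through the ADMM method. By introducing an auxiliary variable $\mathbf{S}$ with the constraint $\mathbf{S} - \mathbf{W} = \mathbf{0}$, Eq. (3) is reformulated as
$$L(\mathbf{W}, b, \mathbf{S}, \boldsymbol{\Lambda}) = H(\mathbf{W}, b) + G(\mathbf{S}) + \mathrm{tr}\left[ \boldsymbol{\Lambda}^T (\mathbf{S} - \mathbf{W}) \right] + \frac{\rho}{2} \|\mathbf{S} - \mathbf{W}\|_F^2, \tag{4}$$
where $H(\mathbf{W}, b) = \frac{1}{2} \mathrm{tr}(\mathbf{W}^T \mathbf{W}) + C \sum_{i=1}^{N} \{ 1 - y_i [ \mathrm{tr}(\mathbf{W}^T \mathbf{X}_i) + b ] \}_+$ and $G(\mathbf{S}) = \tau \|\mathbf{S}\|_*$. According to the augmented Lagrangian function in Eq. (4), $\mathbf{W}$, $b$ and $\mathbf{S}$ can be iteratively computed in two steps:
$$(\mathbf{W}^{(k)}, b^{(k)}) = \arg\min_{\mathbf{W}, b} L(\mathbf{W}, b, \mathbf{S}^{(k-1)}, \boldsymbol{\Lambda}^{(k-1)}), \qquad \mathbf{S}^{(k)} = \arg\min_{\mathbf{S}} L(\mathbf{W}^{(k)}, b^{(k)}, \mathbf{S}, \boldsymbol{\Lambda}^{(k-1)}),$$
where $k$ denotes the iteration index, $\rho > 0$ is a hyperparameter, and $\boldsymbol{\Lambda}$ is a Lagrangian multiplier.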

Knowledge-Leverage-Based SMM
Generally, the current SMM and its variants are data-driven methods that achieve impressive classification performance only when sufficient training data are available. In practice, this means that sufficient EEG data must be collected for each subject to establish a subject-specific classifier. However, long-term EEG recording may exhaust the subject. Therefore, to model the target domain using insufficient EEG data, we propose a novel algorithm that enhances the generalization capability of the SMM on the target domain by leveraging the useful knowledge underlying the source domain.
The framework of the KL-SMM for an EEG-based MI BCI is illustrated in Figure 1. To model the target domain, two main types of information, the model knowledge of the source domain and few labeled target EEG data, are used simultaneously.

KL-SMM Model
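Let $D_s = \{ (\mathbf{X}_{s,1}, y_{s,1}), (\mathbf{X}_{s,2}, y_{s,2}), \cdots, (\mathbf{X}_{s,N_s}, y_{s,N_s}) \}$ denote a dataset in the source domain consisting of $N_s$ labeled EEG trials, and let $\mathbf{W}_s$ denote the regression matrix of the source model. The few labeled EEG trials available in the target domain are denoted by $\{ (\mathbf{X}_{t,i}, y_{t,i}) \}_{i=1}^{N}$. For modeling the target domain, we propose integrating the labeled target EEG data and the source model as follows:
$$\min_{\mathbf{W}_t, b_t}\ \sum_{i=1}^{N} L\left( y_{t,i}, f(\mathbf{X}_{t,i}) \right) + \lambda \|\mathbf{W}_t - \mathbf{W}_s\|_F^2. \tag{7}$$
Here, $L(\cdot, \cdot)$ denotes the loss function, and $f(\mathbf{X}_{t,i}) = \mathrm{tr}(\mathbf{W}_t^T \mathbf{X}_{t,i}) + b_t$ denotes the matrix classifier to be learned. Eq. (7) includes two terms: the first term is used to learn from the labeled target EEG data, and the second term is designed to leverage the model knowledge (i.e., $\mathbf{W}_s$) underlying the source domain. The goal is to obtain the desired KL-SMM by keeping its model close to the source model. The parameter $\lambda$ balances the influence of the two terms.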
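As in the SMM, we introduced the spectral elastic net penalty to capture the correlation information within the matrix-form EEG data. Furthermore, the hinge loss function was adopted, owing to its inherent sparseness and robustness. Putting these together, the objective function of the proposed KL-SMM can be formulated as follows:
$$\arg\min_{\mathbf{W}_t, b_t}\ \frac{1}{2} \mathrm{tr}(\mathbf{W}_t^T \mathbf{W}_t) + \tau \|\mathbf{W}_t\|_* + C \sum_{i=1}^{N} \left\{ 1 - y_{t,i} \left[ \mathrm{tr}(\mathbf{W}_t^T \mathbf{X}_{t,i}) + b_t \right] \right\}_+ + \lambda \|\mathbf{W}_t - \mathbf{W}_s\|_F^2, \tag{8}$$
where the parameter $C > 0$ is used to maintain a balance between fitting the labeled target EEG data and minimizing the complexity of the solution.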
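The following sketch evaluates a KL-SMM-style objective for a candidate classifier. It assumes the objective is the spectral elastic net plus the C-weighted hinge loss plus a transfer term λ‖W_t − W_s‖²_F; the exact weighting of the transfer term is an assumption inferred from the factor 2λ + ρ + 1 that appears in Theorem 2. The function name `kl_smm_objective` and the toy data are hypothetical.

```python
import numpy as np

def kl_smm_objective(W_t, b_t, Xs, ys, W_s, tau, C, lam):
    """Assumed KL-SMM objective: elastic net + C * hinge + lam * ||W_t - W_s||_F^2."""
    scores = np.einsum("ij,nij->n", W_t, Xs) + b_t          # tr(W_t^T X_i) + b_t
    hinge = np.maximum(1.0 - ys * scores, 0.0).sum()
    frobenius = 0.5 * np.sum(W_t * W_t)                     # 1/2 tr(W_t^T W_t)
    nuclear = np.linalg.svd(W_t, compute_uv=False).sum()    # ||W_t||_*
    transfer = lam * np.sum((W_t - W_s) ** 2)               # distance to source model
    return frobenius + tau * nuclear + C * hinge + transfer

# Two toy 2x2 "trials" with labels +1 and -1, and a source model equal to the identity.
Xs = np.stack([np.eye(2), -np.eye(2)])
ys = np.array([1.0, -1.0])
W_s = np.eye(2)

at_source = kl_smm_objective(np.eye(2), 0.0, Xs, ys, W_s, tau=0.5, C=1.0, lam=10.0)
at_zero = kl_smm_objective(np.zeros((2, 2)), 0.0, Xs, ys, W_s, tau=0.5, C=1.0, lam=10.0)
```

With λ = 0 the function reduces to the plain SMM objective, which makes the role of λ explicit: it sets how strongly the target model is pulled toward the source model.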

Parameter Learning for KL-SMM
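Because Eq. (8) is convex in both $\mathbf{W}_t$ and $b_t$, an alternating iterative strategy based on the ADMM method can be used to derive the learning algorithm of the KL-SMM. Specifically, by introducing an auxiliary variable $\mathbf{S}_t$, the objective function of the KL-SMM can be equivalently reformulated as
$$\arg\min_{\mathbf{W}_t, b_t, \mathbf{S}_t}\ H(\mathbf{W}_t, b_t) + G(\mathbf{S}_t) \quad \text{s.t.} \quad \mathbf{S}_t - \mathbf{W}_t = \mathbf{0}, \tag{9}$$
where $H(\mathbf{W}_t, b_t) = \frac{1}{2} \mathrm{tr}(\mathbf{W}_t^T \mathbf{W}_t) + C \sum_{i=1}^{N} \{ 1 - y_{t,i} [ \mathrm{tr}(\mathbf{W}_t^T \mathbf{X}_{t,i}) + b_t ] \}_+ + \lambda \|\mathbf{W}_t - \mathbf{W}_s\|_F^2$ and $G(\mathbf{S}_t) = \tau \|\mathbf{S}_t\|_*$. The parameter optimization of Eq. (9) can be solved using the augmented Lagrangian algorithm
$$L(\mathbf{W}_t, b_t, \mathbf{S}_t, \boldsymbol{\Lambda}) = H(\mathbf{W}_t, b_t) + G(\mathbf{S}_t) + \mathrm{tr}\left[ \boldsymbol{\Lambda}^T (\mathbf{S}_t - \mathbf{W}_t) \right] + \frac{\rho}{2} \|\mathbf{S}_t - \mathbf{W}_t\|_F^2, \tag{10}$$
where $\boldsymbol{\Lambda}$ is the Lagrangian multiplier, and $\rho > 0$ is a hyperparameter. Theorems 1 and 2 provide the calculations of the parameters $\mathbf{S}_t$ and $(\mathbf{W}_t, b_t)$.

Theorem 1. For fixed $\mathbf{W}_t$, using the Lagrangian multiplier $\boldsymbol{\Lambda}$ and any positive scalars $\tau > 0$, $\rho > 0$ in Eq. (10), $\mathbf{S}_t$ can be optimized using the following update rule:
$$\mathbf{S}_t^* = \frac{1}{\rho} \mathcal{D}_\tau(\rho \mathbf{W}_t - \boldsymbol{\Lambda}). \tag{11}$$

Proof of Theorem 1: Supposing $\mathbf{W}_t$ is fixed, the optimization problem in Eq. (10) is equivalent to minimizing the following function:
$$J(\mathbf{S}_t) = \tau \|\mathbf{S}_t\|_* + \mathrm{tr}(\boldsymbol{\Lambda}^T \mathbf{S}_t) + \frac{\rho}{2} \|\mathbf{S}_t - \mathbf{W}_t\|_F^2. \tag{12}$$
Because $J$ is convex, showing that $\mathbf{0} \in \partial J(\mathbf{S}_t^*)$ is enough to conclude that $\mathbf{S}_t^*$ is a solution to Eq. (12). The subdifferential of $J(\mathbf{S}_t)$ with respect to $\mathbf{S}_t$ can be expressed as
$$\partial J(\mathbf{S}_t) = \tau\, \partial \|\mathbf{S}_t\|_* + \boldsymbol{\Lambda} + \rho (\mathbf{S}_t - \mathbf{W}_t). \tag{13}$$
To further simplify this equation, let the SVD of $(\rho \mathbf{W}_t - \boldsymbol{\Lambda})$ be denoted as $\rho \mathbf{W}_t - \boldsymbol{\Lambda} = \mathbf{U}_a \boldsymbol{\Sigma}_a \mathbf{V}_a^T + \mathbf{U}_b \boldsymbol{\Sigma}_b \mathbf{V}_b^T$. Here, $\boldsymbol{\Sigma}_a$ is the diagonal matrix whose diagonal entries are greater than $\tau$, and $\boldsymbol{\Sigma}_b$ is the remaining part of the SVD, whose diagonal entries are less than or equal to $\tau$. $\mathbf{U}_a$ and $\mathbf{V}_a$ ($\mathbf{U}_b$ and $\mathbf{V}_b$) are the matrices of left and right singular vectors that correspond to $\boldsymbol{\Sigma}_a$ ($\boldsymbol{\Sigma}_b$). In terms of Definition 1, $\mathbf{S}_t^*$ can be reformulated as $\frac{1}{\rho} \mathbf{U}_a (\boldsymbol{\Sigma}_a - \tau \mathbf{I}) \mathbf{V}_a^T$. Substituting $\mathbf{S}_t^*$ into Eq. (13), we have
$$\boldsymbol{\Lambda} + \rho (\mathbf{S}_t^* - \mathbf{W}_t) = -\tau \left( \mathbf{U}_a \mathbf{V}_a^T + \mathbf{Z} \right), \qquad \mathbf{Z} = \frac{1}{\tau} \mathbf{U}_b \boldsymbol{\Sigma}_b \mathbf{V}_b^T.$$
Because $\mathbf{U}_a$, $\mathbf{U}_b$, $\mathbf{V}_a$, and $\mathbf{V}_b$ are column orthogonal, we can easily verify that $\mathbf{U}_a^T \mathbf{Z} = \mathbf{0}$, $\mathbf{Z} \mathbf{V}_a = \mathbf{0}$, and $\|\mathbf{Z}\|_2 \leq 1$. Hence $\mathbf{U}_a \mathbf{V}_a^T + \mathbf{Z} \in \partial \|\mathbf{S}_t^*\|_*$, and we have $\mathbf{0} \in \partial J(\mathbf{S}_t^*)$. Theorem 1 is proved.

Theorem 2. For fixed $\mathbf{S}_t$, $(\mathbf{W}_t, b_t)$ can be optimized using the following update rule:
$$\mathbf{W}_t^* = \frac{1}{2\lambda + \rho + 1} \left( \boldsymbol{\Lambda} + \rho \mathbf{S}_t + 2\lambda \mathbf{W}_s + \sum_{i=1}^{N} \alpha_i y_{t,i} \mathbf{X}_{t,i} \right), \qquad b_t^* = \frac{1}{|\Omega|} \sum_{i \in \Omega} \left\{ y_{t,i} - \mathrm{tr}\left[ (\mathbf{W}_t^*)^T \mathbf{X}_{t,i} \right] \right\},$$
where $\Omega = \{ i \mid 0 < \alpha_i < C,\ \forall i = 1, 2, \ldots, N \}$ is the index set determined by the Lagrangian multipliers, and $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \cdots, \alpha_N]^T \in \mathbb{R}^N$ can be obtained using a box-constrained quadratic programming (QP) solver:
$$\arg\max_{\boldsymbol{\alpha}}\ -\frac{1}{2} \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha} + \mathbf{H}^T \boldsymbol{\alpha} \quad \text{s.t.} \quad \mathbf{0} \leq \boldsymbol{\alpha} \leq C \mathbf{1}_N,\ \sum_{i=1}^{N} \alpha_i y_{t,i} = 0, \tag{17}$$
where $\mathbf{K} = [K_{ij}] \in \mathbb{R}^{N \times N}$ and $\mathbf{H} = [h_i] \in \mathbb{R}^N$ with
$$K_{ij} = \frac{1}{2\lambda + \rho + 1}\, y_{t,i} y_{t,j}\, \mathrm{tr}(\mathbf{X}_{t,i}^T \mathbf{X}_{t,j}), \qquad h_i = 1 - \frac{1}{2\lambda + \rho + 1}\, y_{t,i}\, \mathrm{tr}\left[ (\boldsymbol{\Lambda} + \rho \mathbf{S}_t + 2\lambda \mathbf{W}_s)^T \mathbf{X}_{t,i} \right].$$

Proof of Theorem 2: Given the fixed variable $\mathbf{S}_t$, and dropping the terms of Eq. (10) that do not depend on $(\mathbf{W}_t, b_t)$, the optimization problem is equivalent to
$$\min_{\mathbf{W}_t, b_t, \boldsymbol{\xi}}\ \frac{1}{2} \mathrm{tr}(\mathbf{W}_t^T \mathbf{W}_t) + \lambda \|\mathbf{W}_t - \mathbf{W}_s\|_F^2 - \mathrm{tr}(\boldsymbol{\Lambda}^T \mathbf{W}_t) + \frac{\rho}{2} \|\mathbf{W}_t - \mathbf{S}_t\|_F^2 + C \sum_{i=1}^{N} \xi_{t,i}$$
$$\text{s.t.} \quad y_{t,i} \left[ \mathrm{tr}(\mathbf{W}_t^T \mathbf{X}_{t,i}) + b_t \right] \geq 1 - \xi_{t,i},\ \xi_{t,i} \geq 0,\ i = 1, \ldots, N.$$
Introducing the multipliers $\alpha_i \geq 0$ and $\gamma_i \geq 0$, the Lagrangian function is
$$L_a = \frac{1}{2} \mathrm{tr}(\mathbf{W}_t^T \mathbf{W}_t) + \lambda \|\mathbf{W}_t - \mathbf{W}_s\|_F^2 - \mathrm{tr}(\boldsymbol{\Lambda}^T \mathbf{W}_t) + \frac{\rho}{2} \|\mathbf{W}_t - \mathbf{S}_t\|_F^2 + C \sum_{i=1}^{N} \xi_{t,i} - \sum_{i=1}^{N} \alpha_i \left\{ y_{t,i} \left[ \mathrm{tr}(\mathbf{W}_t^T \mathbf{X}_{t,i}) + b_t \right] - 1 + \xi_{t,i} \right\} - \sum_{i=1}^{N} \gamma_i \xi_{t,i}. \tag{21}$$
Setting the derivatives of $L_a$ with respect to $\xi_{t,i}$ and $b_t$ to 0, we obtain
$$C - \alpha_i - \gamma_i = 0, \qquad \sum_{i=1}^{N} \alpha_i y_{t,i} = 0. \tag{22}$$
Substituting Eq. (22) into Eq. (21), and then setting the derivative of $L_a$ with respect to $\mathbf{W}_t$ to 0, we obtain
$$\mathbf{W}_t = \frac{1}{2\lambda + \rho + 1} \left( \boldsymbol{\Lambda} + \rho \mathbf{S}_t + 2\lambda \mathbf{W}_s + \sum_{i=1}^{N} \alpha_i y_{t,i} \mathbf{X}_{t,i} \right). \tag{23}$$
Substituting Eq. (22) and Eq. (23) into Eq. (21),
$$L_a = \sum_{i=1}^{N} h_i \alpha_i - \frac{1}{2(2\lambda + \rho + 1)} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_{t,i} y_{t,j}\, \mathrm{tr}(\mathbf{X}_{t,i}^T \mathbf{X}_{t,j}) + \theta. \tag{24}$$
Here, $\theta$ is a constant, which can be represented as
$$\theta = \lambda \|\mathbf{W}_s\|_F^2 + \frac{\rho}{2} \|\mathbf{S}_t\|_F^2 - \frac{1}{2(2\lambda + \rho + 1)} \|\boldsymbol{\Lambda} + \rho \mathbf{S}_t + 2\lambda \mathbf{W}_s\|_F^2.$$
Thus, maximizing Eq. (24) over $\boldsymbol{\alpha}$, subject to $0 \leq \alpha_i \leq C$ and $\sum_{i} \alpha_i y_{t,i} = 0$ from Eq. (22), gives the dual problem in Eq. (17); that is, the optimization problem of Eq. (21) is finally transformed into a QP problem. Substituting the obtained optimal solution $\boldsymbol{\alpha}$ into Eq. (23) gives the value of $\mathbf{W}_t$. Finally, the optimal $b_t$ can be calculated from any $i \in \Omega$ as
$$b_t^* = y_{t,i} - \mathrm{tr}\left[ (\mathbf{W}_t^*)^T \mathbf{X}_{t,i} \right].$$
In practice, averaging these optimal solutions, we obtain
$$b_t^* = \frac{1}{|\Omega|} \sum_{i \in \Omega} \left\{ y_{t,i} - \mathrm{tr}\left[ (\mathbf{W}_t^*)^T \mathbf{X}_{t,i} \right] \right\},$$
where $\Omega = \{ i \mid 0 < \alpha_i < C,\ \forall i = 1, 2, \ldots, N \}$. Theorem 2 is proved.

For fixed $\mathbf{W}_t$ and $\mathbf{S}_t$, the Lagrangian multiplier $\boldsymbol{\Lambda}$ in Eq. (10) can be updated as follows:
$$\boldsymbol{\Lambda} \leftarrow \boldsymbol{\Lambda} + \rho (\mathbf{S}_t - \mathbf{W}_t).$$
The optimal solution is estimated iteratively. The learning procedure for the KL-SMM is given in Algorithm 1.

Algorithm 1: The learning procedure for KL-SMM
Input: Training dataset $D_T = \{ \mathbf{X}_{t,i}, y_{t,i} \}_{i=1}^{N}$, source model $\mathbf{W}_s$, parameters $\tau$ and $\lambda$;
Output: $\mathbf{W}_t$, $b_t$;
Repeat
Step 1: Compute $\mathbf{W}_t$ and $b_t$ by solving the QP problem in Eq. (17) (Theorem 2);
Step 2: Compute $\mathbf{S}_t$ using Eq. (11) (Theorem 1);
Step 3: Update the Lagrangian multiplier $\boldsymbol{\Lambda}$;
Until convergence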
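The alternating procedure can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the transfer term is λ‖W_t − W_s‖²_F, so that the W-step has the closed form with the factor 2λ + ρ + 1, and it swaps in a simple projected-gradient routine for the box-constrained QP solver. The names `kl_smm_fit`, `w_step`, and `project`, the fixed iteration counts, the zero initialization, the fallback used when no multiplier lies strictly inside (0, C), and the toy data are all hypothetical.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding (shrink singular values by tau)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def project(v, y, C, iters=50):
    """Project v onto {0 <= a <= C, y @ a = 0} by bisection on the multiplier nu."""
    lo, hi = -np.abs(v).max() - C, np.abs(v).max() + C
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if y @ np.clip(v - nu * y, 0.0, C) > 0:   # constraint value decreases with nu
            lo = nu
        else:
            hi = nu
    return np.clip(v - 0.5 * (lo + hi) * y, 0.0, C)

def w_step(Xs, ys, S, Lam, W_s, lam, rho, C, n_iter=100):
    """Update (W_t, b_t) for fixed S_t via the dual QP (projected gradient ascent)."""
    c = 2.0 * lam + rho + 1.0
    P = Lam + rho * S + 2.0 * lam * W_s
    K = np.outer(ys, ys) * np.einsum("nij,mij->nm", Xs, Xs) / c
    h = 1.0 - ys * np.einsum("ij,nij->n", P, Xs) / c
    step = 1.0 / max(np.linalg.eigvalsh(K).max(), 1e-12)
    alpha = np.zeros(len(ys))
    for _ in range(n_iter):
        alpha = project(alpha + step * (h - K @ alpha), ys, C)
    W = (P + np.einsum("n,nij->ij", alpha * ys, Xs)) / c
    free = (alpha > 1e-6 * C) & (alpha < (1.0 - 1e-6) * C)
    idx = free if free.any() else np.ones(len(ys), dtype=bool)  # heuristic fallback
    b = np.mean(ys[idx] - np.einsum("ij,nij->n", W, Xs[idx]))
    return W, b

def kl_smm_fit(Xs, ys, W_s, lam=0.1, tau=0.1, C=1.0, rho=1.0, n_outer=20):
    """Alternate the W-step, the SVT update of S_t, and the multiplier update."""
    S, Lam = np.zeros_like(W_s), np.zeros_like(W_s)
    for _ in range(n_outer):
        W, b = w_step(Xs, ys, S, Lam, W_s, lam, rho, C)
        S = svt(rho * W - Lam, tau) / rho
        Lam = Lam + rho * (S - W)
    return W, b

# Toy data: a rank-one class pattern plus noise; the "source model" is the clean pattern.
rng = np.random.default_rng(0)
M = np.outer([1.0, -1.0, 1.0], [1.0, 1.0, -1.0, -1.0])
ys = np.array([1.0, -1.0] * 10)
Xs = ys[:, None, None] * M + 0.3 * rng.standard_normal((20, 3, 4))
W, b = kl_smm_fit(Xs, ys, W_s=M)
acc = np.mean(np.sign(np.einsum("ij,nij->n", W, Xs) + b) == ys)
```

A production implementation would replace the fixed iteration counts with a convergence check on ‖S_t − W_t‖_F and would use a dedicated QP solver for the W-step.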

Computational Complexity
We further analyzed the computational complexity of the KL-SMM. In Algorithm 1, Step 1 computes the parameters $\mathbf{W}_t$ and $b_t$ by solving the QP problem in Eq. (17), which takes time $O(N^2 pq)$ for $N$ samples of dimension $p \times q$. Step 2 computes the singular value decomposition needed to update $\mathbf{S}_t$ in Eq. (11), which takes time $O(\min(p^2 q, pq^2))$. In practice, the dimensions $p$ and $q$ of the extracted EEG features are not too high. Thus, the computational complexity of the KL-SMM is dominated by the QP, that is, $O(I \cdot N^2 pq)$, where $I$ denotes the number of iterations.

EXPERIMENTS
In this section, we evaluate the proposed KL-SMM on two publicly available MI EEG datasets [i.e., Datasets IIa and IIb of BCI Competition IV (Hang et al., 2020)], which can be found at http://www.bbci.de/competition/iv/. We first describe the EEG datasets. Then, the compared methods and corresponding parameter settings are provided. Finally, we present and discuss the experimental results.

EEG Data Description and Preprocessing
(1) BCI competition IV Dataset IIa (Exp.1): This dataset includes 22-channel EEG signals recorded from nine subjects (denoted as S01-S09). During the experiment, each subject was required to perform four kinds of MI tasks, hand (left and right), foot, and tongue. A total of 576 trials of two sessions on different days were collected for each subject. In our experiment, we used the left-hand and right-hand EEG data. In addition, the EEG trials collected from the second day were adopted. Thus, the training and test datasets each contained 72 EEG trials.
(2) BCI competition IV Dataset IIb (Exp.2): This dataset also contains the EEG signals of nine subjects (denoted as B01-B09), which were recorded using three electrodes, C3, Cz, and C4. During the experiment, each subject was instructed to perform left- and right-hand MI tasks for 4.5 s. For each subject, there were five sessions. Sessions 1, 2, and 3 were collected on the first day, and Sessions 4 and 5 were collected on the second day. Similar to Exp.1, the EEG trials collected on the second day were used. Specifically, Session 4 was used as the training data, and Session 5 was used as the test data.
Following Hang et al. (2020), we used the time interval of [0.5, 3] s after the visual cue in each trial for all the datasets. We bandpass-filtered the EEG signals to 8-30 Hz using a fifth-order Butterworth filter; this band covers the dominant frequency band for MI tasks (Nam et al., 2011). Then, we applied spatial filters to detect the MI-related event-related desynchronization/synchronization (ERD/ERS) patterns. Finally, the widely used band-power estimation method (Vidaurre et al., 2005) was used to extract the matrix-form EEG features for all the subjects. To construct the transfer learning tasks, each subject was in turn considered the target domain, and the training data of the remaining subjects constituted the source domain. To evaluate the performance of the KL-SMM, we used three different numbers of labeled target EEG trials, that is, the first 8, 14, and 20 training trials. The classification performances on the test data of all the subjects were reported.
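The construction of the transfer tasks described above can be sketched as follows: each subject takes a turn as the target, the remaining subjects form the source domain, and the first 8, 14, or 20 training trials of the target are treated as labeled. The function name `build_transfer_tasks` and the dictionary layout are hypothetical choices made for illustration.

```python
def build_transfer_tasks(subjects, n_labeled_options=(8, 14, 20)):
    """Enumerate (target, sources, n_labeled) settings, one target subject at a time."""
    tasks = []
    for target in subjects:
        sources = [s for s in subjects if s != target]   # all remaining subjects
        for n_labeled in n_labeled_options:
            tasks.append({"target": target, "sources": sources, "n_labeled": n_labeled})
    return tasks

def first_labeled_trials(trials, labels, n_labeled):
    """Keep only the first n_labeled training trials of the target subject."""
    return trials[:n_labeled], labels[:n_labeled]

# Subject identifiers as used for Exp.1 (S01-S09).
subjects = [f"S{i:02d}" for i in range(1, 10)]
tasks = build_transfer_tasks(subjects)
```

With nine subjects and three label budgets this yields 27 settings per dataset.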

Implementation Details
It is known that the format of the input data of both the SVM and ASVM should be vectors or scalars. Thus, we first had to reshape the extracted two-dimensional matrix features into vector features. For the BSVM, SMM, and proposed KL-SMM, the matrix features can be inputted directly. To evaluate the effect of the transfer learning mechanism, because the SVM, BSVM, and SMM are no-transfer baselines, we simply used the labeled target EEG trials as the training data to build these classification models. In addition, for the ASVM and KL-SMM, we also leveraged the source model knowledge in constructing the target classifier. However, unlike the ASVM, the traditional transfer learning method, the KL-SMM can directly process EEG matrix features and fully exploit the structural information.
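The difference between vector and matrix inputs can be illustrated with a short NumPy sketch. The linear score itself is unchanged by reshaping, because tr(WᵀX) equals the inner product of the vectorized matrices; what is lost after reshaping is the row and column structure that a rank-based penalty such as the nuclear norm acts on. The matrices below are random toy values, not EEG features.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))   # toy regression matrix
X = rng.standard_normal((3, 5))   # toy matrix-form feature

# The decision score is identical for matrix and vector inputs.
matrix_score = np.trace(W.T @ X)
vector_score = W.reshape(-1) @ X.reshape(-1)

# A rank-one matrix has a single nonzero singular value; its vectorized form
# is just a length-15 vector, so this low-rank structure is no longer visible.
W_low_rank = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0, 0.0, 1.0])
singular_values = np.linalg.svd(W_low_rank, compute_uv=False)
```

This is why vector-based classifiers such as the SVM and ASVM cannot impose the low-rank prior that the SMM and KL-SMM use.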

Experimental Results Analysis
The classification performances of all the comparison methods on 14 labeled target EEG trials on two datasets are given in Tables 1-6. The performance comparison of the KL-SMM with the compared methods in Exp.1 is shown in Tables 1-3. The classification results of all compared methods in Exp. 2 are shown in Tables 4-6. The best classification performance values are highlighted in bold. According to the results, the following conclusions can be drawn.
From the classification performances of all the comparison methods on the 14 labeled target EEG trials, we found that the proposed KL-SMM method achieved the highest average results in terms of the ACC, AUC, and F1. As shown in Tables 1-3, the proposed KL-SMM outperformed the baseline SMM on average by 5.87%, 5.66%, and 8.85% based on the ACC, AUC, and F1, respectively. As can be observed from the classification results in Tables 4-6, the KL-SMM outperformed the SMM on average by 3.68%, 3.78%, and 5.83% based on the ACC, AUC, and F1, respectively. The promising performances prove that the KL-SMM can leverage the model knowledge of the source subject to boost the generalization capability of the SMM when there are limited labeled EEG trials. In addition, the KL-SMM boosted the classification accuracy for six out of nine subjects in Exp.1, and eight out of nine subjects in Exp.2, respectively. These experimental results further demonstrate the effectiveness of the proposed KL-SMM that leveraged the knowledge underlying the source domain.
The BSVM and SMM outperformed the SVM in most cases. This confirms the ability of the BSVM and SMM to exploit the correlations between rows or columns of EEG matrix features and thereby achieve better classification results than the SVM. The KL-SMM is built on the SMM; it can leverage the source model knowledge and exploit the structural information within the EEG feature matrices. The experimental results indicate that structural information can indeed improve classification performance.
We further studied the effects of different numbers of labeled target EEG instances on the classification performance of the KL-SMM. Figure 2 shows the average classification ACCs when 8, 14, and 20 labeled target EEG trials were available from the target subject. Figures 2A,B show the average classification results of all the compared methods for Exp.1 and Exp.2, respectively. It can be observed that the KL-SMM outperformed the other methods in all the cases. Specifically, the improvement was more pronounced when few labeled target EEG trials were available, as shown in Figure 2B. From these results, we can observe that increasing the number of labeled target EEG instances improved the average classification ACCs of all the compared methods. This is mainly because more training data may enhance the generalization performance of the classification model. The average ACC of the KL-SMM was significantly better than those of the methods without knowledge transfer, especially when the labeled target EEG instances were very limited. Overall, the classification performance of the proposed KL-SMM was superior to that of the other methods. The encouraging results were mainly attributed to the matrix learning capability that the KL-SMM inherits from the matrix learning machine, which allows it to directly handle matrix-form features and thus retain the structural information of EEG data. In addition, the KL-SMM achieved better classification performance because of its ability to leverage the useful model knowledge of the source domain.

Statistical Analysis
We further performed a t-test to verify whether there was a significant difference, at a confidence level of 95%, between the KL-SMM and the other methods. The results of the t-test using different numbers of labeled target EEG trials are shown in Table 7. A p-value less than 0.05 indicates a significant difference between the KL-SMM and the compared method. We highlighted the statistically significant differences in boldface. From Table 7, it can be observed that the null hypothesis can be rejected in all cases. This indicates that the KL-SMM significantly outperformed the other methods. It further demonstrates the ability of the KL-SMM to capture the structural information within the EEG data, in addition to a strong transfer learning capability. Therefore, it is suitable for the classification of complex matrix-form EEG data with cross-subject variability.
Figure 3A shows the running time of the KL-SMM and other methods on subject S01 in Exp.1 using 14 labeled target EEG trials. Setting aside the SVM and ASVM, the KL-SMM had a computational cost comparable to that of the traditional matrix learning method SMM. Furthermore, the KL-SMM required less computational time than the BSVM. The running time of the KL-SMM was approximately 1.6 times less than that of the SMM. The KL-SMM achieved better classification results without the increase in computational costs becoming unacceptable. This shows the potential value of the KL-SMM for real-world BCI applications.
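As a worked illustration of such a test, the sketch below computes a paired t statistic over nine subjects and compares it with the two-sided 5% critical value for 8 degrees of freedom (2.306). The paper does not state whether a paired test was used, so the paired design is an assumption here, and the accuracy values are hypothetical numbers chosen for illustration only; they are not results from the experiments.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic of the per-subject differences a_i - b_i (paired design assumed)."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical per-subject accuracies (illustrative only, not taken from the paper).
acc_method_a = [0.80, 0.62, 0.91, 0.70, 0.66, 0.71, 0.75, 0.93, 0.88]
acc_method_b = [0.76, 0.60, 0.88, 0.69, 0.61, 0.66, 0.73, 0.90, 0.85]

t_value = paired_t_statistic(acc_method_a, acc_method_b)
T_CRIT_DF8 = 2.306                      # two-sided critical value, alpha = 0.05, df = 8
significant = abs(t_value) > T_CRIT_DF8
```

When SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with the p-value.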

Parameter Sensitivity
We further examined the effect of the free parameter of the KL-SMM, i.e., the knowledge transfer penalty λ, on its performance. We conducted parameter sensitivity experiments on the transfer tasks S04 in Exp.1 and B04 in Exp.2 using 14 labeled target EEG trials. We varied λ over {1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1e0}. Figure 3B shows the classification accuracy of the KL-SMM in contrast to that of the SMM, which is represented by dashed lines. The accuracy of the KL-SMM initially improves as λ increases, suggesting that taking the model knowledge of the source domain into account benefits EEG classification. As λ is increased further, the classification performance decreases because of the distribution discrepancy between the source domain and the target domain.

CONCLUSION
In this study, we proposed the KL-SMM method for MI-based BCIs. The proposed KL-SMM is a matrix classifier, which can exploit the structural information of EEG data in matrix form. Furthermore, it can leverage the existing source model knowledge in constructing the classifier for the target subject when labeled target training data are limited. Similar to the SMM, the KL-SMM can be easily optimized using the ADMM. Extensive experimental results on two publicly available MI datasets demonstrate the superiority of the KL-SMM to the compared methods in most cases. However, despite its promising performance, there is still room for improvement. For example, adaptively controlling the penalty λ is critical to determining how much knowledge is transferred. In addition, how to extend the KL-SMM to multi-class classification will be investigated in future work.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: http://www.bbci.de/competition/iv/.

AUTHOR CONTRIBUTIONS
YC is responsible for data processing and data analysis. WH and SL are responsible for manuscript writing. XL and QW are responsible for study design. GL is responsible for experimental design. JQ and K-SC are responsible for manuscript editing. All authors contributed to the article and approved the submitted version.