Relevant Feature Integration and Extraction for Single-Trial Motor Imagery Classification

Brain computer interfaces provide a novel channel for the communication between brain and output devices. The effectiveness of the brain computer interface is based on the classification accuracy of single trial brain signals. The common spatial pattern (CSP) algorithm is believed to be an effective algorithm for the classification of single trial brain signals. As the amplitude feature for spatial projection applied by this algorithm is based on a broad frequency bandpass filter (mainly 5–30 Hz) in which the frequency band is often selected by experience, the CSP is sensitive to noise and the influence of other irrelevant information in the selected broad frequency band. In this paper, to improve the CSP, a novel relevant feature integration and extraction algorithm is proposed. Before projecting, we integrated the motor relevant information to suppress the interference of noise and irrelevant information, as well as to improve the spatial difference for projection. The algorithm was evaluated with public datasets. It showed significantly better classification performance with single trial electroencephalography (EEG) data, increasing by 6.8% compared with the CSP.


INTRODUCTION
Brain-computer interface (BCI) is a way of communication that aims to provide a communication path between humans and computers. It directly translates brain activity into a series of control commands. Accordingly, it provides a non-muscular output channel for the brain and communicates with devices directly (Yu et al., 2014). This interface may offer disabled people a great prospect by solely translating their intentions that are reflected in their brain signals into actual instructions (Lemm et al., 2005). In addition, BCI can also be used as a neurorehabilitation tool to improve motor and/or cognitive performance of people after neurological diseases, such as stroke (van Dokkum et al., 2015) and tetraplegia (Vuckovic et al., 2015). In the BCI system, several modalities have been used for brain signal acquisition, such as electrocorticographic (ECoG) (Leuthardt et al., 2004), electroencephalography (EEG) (Bennet et al., 2016), magnetoencephalography (MEG) (Sardouie and Shamsollahi, 2012), functional magnetic resonance imaging (fMRI) (Ruiz et al., 2014), functional near-infrared spectroscopy (fNIRS) Hong, 2013, 2015;Hong et al., 2015;Naseer et al., 2016a,b), and intracortical neuronal spikes (Gupta et al., 2016). Among them, because of the real-time, low-cost, portable and noninvasive properties of EEG, it is one of the most convenient means to measure neurophysiological activity in the practical BCI application (Mihajlovic et al., 2015). Electroencephalography (EEG) modulated by motor imagery (MI) is one of the most studied types of EEG signals of the BCI systems for the similarities of motor-related area involvement with motor execution . MI can be revealed on brain activity patterns of the imagination of a motor action, but without its physical movement. During an MI task, the EEG activity is accompanied by an increase or decrease in the EEG magnitude which is known as an eventrelated synchronization or desynchronization (ERS/ERD). The ERD and ERS are non-phase-locked modulations of the EEG power, usually confined to a specific frequency band. ERD and ERS have been suggested to reflect the cortical activation and cortical deactivation (Hu et al., 2015). In particular, ERD of µrhythm (8-12 Hz) is usually associated with MI (Neuper and Pfurtscheller, 2001;ter Horst et al., 2013). BCI based on MI is an efficient path of rehabilitation, and it achieves excellent findings on complex movement (Qiu et al., 2017).
A big challenge for BCI based on motor imagery is to correctly and efficiently identify and extract subject-specific features from the blurred scalp EEG and translate those features into device commands (Wu et al., 2008). Based on topographic patterns, the Common Spatial Pattern (CSP) has been shown to be very efficient in the establishment of subject-specific discriminative spatial filters . The CSP algorithm decomposes multi-channel EEG from two classes into spatial patterns and enhances the separability between the two classes by diagonalizing the covariance matrix at the same time (Park et al., 2014). However, the conventional CSP algorithm selects multichannel magnitude features on frequency band, which is selected by experience . As a result, it is sensitive to noise and the influence of other irrelevant information in the selected broad frequency band. Therefore, method for the optimization of the characteristics is urgently needed.
A noteworthy attempt, namely the Common Spatio-Spectral Pattern (CSSP) algorithm has been reported in Lemm et al. (2005). In the CSSP algorithm, the filter is constructed by the method of time-delay embedding. However, the CSSP algorithm limits the flexibility of the filters. The Common Sparse Spectral-Spatial Pattern (CSSSP) performs simultaneous optimization of an arbitrary Finite Impulse Response (FIR) filter . The spectral weighted common spatial pattern (SPEC-CSP) (Tomioka et al., 2006) optimizes the filter in the frequency domain and the spatial filter is an iterative procedure. But, this method is computationally expensive. The Filter Bank Common Spatial Patterns (FBCSP) (Ang et al., 2012) uses mutual information to select the optimal frequency band and time range. Xu applies particle swarm method to optimize frequency band and time interval (Xu et al., 2014). Local temporal correlation common spatial patterns employs local temporal information to estimate covariance matrices instead of Euclidean distance method of CSP (Zhang et al., 2013). The Regularizing Common Spatial Patterns (RCSP) adds a regularization algorithm to the CSP algorithm by a priori knowledge (Lotte and Guan, 2011). However, it does not consider the multivariable nature of the EEG signals, and thus it limits the feasibility of this method.
In this paper, an algorithm designated Spectral Component Common Spatial Pattern (SCCSP) is proposed. It provides a new approach to further improve the classification performance of the motor-imagery-based BCIs. To feature optimize, it focuses on the changes of the amplitude spectrum during motor imagery, and utilizes Independent Components Analysis (ICA) to extract the components from multi-channel amplitude spectrum with the aim of separating motor-relevant and irrelevant information from obscure EEG amplitude features applied by CSP. Accordingly, SCCSP could increase the classification accuracy of single-trial motor imagery EEG by improving the spatial difference of projecting.

DATA ACQUISITION AND CONFIGURATION
Two publically available datasets from BCI competitions were collected for the evaluation of the proposed algorithm for motor imagery. For the classification algorithm of CSP is the binaryclass classification algorithm, two classes of motor imagery EEG data are collected from the two public datasets. The first public dataset recording the imagination left and right hands movement is collected from the publically available dataset BCI competition IV, dataset IIa (http://bbci.de/competition/iv/), including all 9 subjects. This dataset records EEG with twenty-two electrodes with a sampling rate of 250 Hz. Each trial (experiment) lasts 7.5 s. The subjects imaged movements from t = 3 s to t = 6 s in trials. Before this period, it is the period for preparation. The second public dataset is the dataset IIIa from the BCI competition III using a 60-channel amplifier with a sampling rate of 250 Hz, including all 3 subjects. The subjects imaged left and right hand movements from t = 3 s to t = 7 s in trials. Before this period, it is the period for preparation. Both of datasets were online filtered by a bandpass filter and a 50 Hz notchfilter to remove artifacts. A summary of the two datasets is presented in Table 1. The electrodes locations of two datasets are shown in Figure 1.

Feature Extraction and Integration
For each part of the human body, there exists a respective region in the primary motor cortex and somatosensory area  of the neocortex (Chainay et al., 2004;Blankertz et al., 2008). The imaged part is surrounded by the other regions which represent other parts of the human body. Previous studies (Pei et al., 2005;Byblow et al., 2007) indicated that there was a parallel functional process between the lateral somatosensory area and the mid-central area during activation, indicating the independence of the hand and feet/leg areas during imagery. The inhibition mechanism was independent of the excitation mechanism on the somatosensory area (Ikeda et al., 2000). Accordingly, it is hypothesized that the area which represents the part of the imaged human body is independent of other areas which represent the parts of the un-imaged human body in the neocortex. However, the effect of volume conduction, EEG modulated by MI should be the combination of several independent components. Thus, it is urgent to source separation. Independent Components Analysis (ICA) is a blind source separation method under the temporal information. It has emerged as a valuable signal processing method for the analysis of multivariate channel data (Woods et al., 2015). Let the timevarying observed signals be X = [x 1 (t), ..., x m (t)] T , and the S = [s 1 (t), s 2 (t)..., s n (t)] T t = t 0 , ..., T, is matrix that contains unknown pure components, m and n indicate the channels of the observed signals and components, respectively. ICA assumes that the signal X is an instantaneous linear mixture of independent sources: where the matrix E of size m×n is the mixing matrix, whose component represents the linear memoryless mixing channels.
To recover all the independent components (ICs) of the observed signals, ICA aims to obtain a de-mixing matrix W with minimal knowledge of E and S. The recovered signals U= (u 1 , u 2 ,..., u n ) T are given by Equation 2 (Monakhova et al., 2015).
Therefore, the ICA problem can be restated as the problem of finding W such that the sources of U are maximally independent. We focus on the improvement of the classification accuracy based on the oscillatory feature (ERD/ERS). Motor imageries are accompanied by the ERD in specific frequency band , indicating an obvious sinking of the amplitude spectrum. To maximize the separability between classes, the feature extraction and integration algorithm is designed by integrating motor-related information. To suppress the influence of imagination irrelevant information and noise, we want to extract relevant information from blurred feature and integrate imagination related information on multichannel dimensions into a single dimension. In this paper, we extended the conventional ICA algorithm to the frequency domain, and named it as Spectral Independent Components Analysis (SICA). It is hypothesized that the independent component, which is relevant to the imagination contains most of motor imagination information under the information theory. Information maximization algorithm of ICA (Hansen et al., 2001) was applied, and two independent components, imagination relevant information and imagination irrelevant information, were extracted with the SICA. Accordingly, in SICA, the Equation 2 is reconstructed as below: suppressed. Namely, the cortex is activated (ERD) . This is a reliable feature of brain activity for BCIs based on motor imagery. For evaluating the effectiveness of SICA algorithm, the 6 channels simulation data of the adult MI EEG without any kinds of mental disease and damage were applied on the same hemisphere. Practically, the ERD often appears on several channels. To imitate this phenomenon, the µ-rhythm on two channels (the 5 and 6th channels) of the simulation data was suppressed. The simulation data of every channel was the sum of the sinusoidal signals with the frequency range from 0 to 20 Hz. The amplitude of the frequencies obtained a greater one on the low frequencies (simulation of real EEG nature), and the sum of maximum and minimum was under 12 uV. The frequency spectrum of simulation data on 6 channels is shown in the Figure 2 after Fast Fourier Transform (FFT). SICA based on information maximization algorithm was used to extract the independent components from the frequency spectrum information of simulation data and the results are illustrated in Figure 2 (component 1 and component 2). The results of the simulation data indicated that the µrhythm suppression or activation should be the criterion for the separation of independent components, and the µ-rhythm suppression information was integrated effectively and clearly. Further, SICA is an effective tool in the amplitude spectrum for feature extraction and integration of MI.

Projecting
The aim of CSP is the maximization of the difference between signals of two classes after feature extraction and integration in this study. Y k = [y 1 (t), y 2 (t),..., y p (t)] T is defined as the kth time domain feature after feature extraction and integration, where p is the number of ICs. The normalization covariance matrices C 1 and C 2 of the two classes are calculated using Equation 4. The covariance space C = C 1 + C 2 consists of the covariance matrices of the two classes. Whiten the matrix C and receive a matrix P as shown in Equation 5.
S 1 and S 2 are defined as S 1 = PC 1 P T and S 2 = PC 2 P T , and then calculate the orthogonal matrix R and the diagonal matrix D by singular value decomposition.
Where, i = 1, 2, as I = S 1 + S 2 , D 2 = I -D 1 . Therefore, when the eigenvalue of S i (i = 1, 2) is closer to I, the eigenvalue of the other S i (i = 2, 1) is closer to 0. The difference of the two classes is maximization. The filter is constructed by Equation 7.
The characteristic for the classifier is calculated by Equation 9.
Where, Z 1 and Z 2 are the projection of Y k by the filters of two classes.
FIGURE 2 | The results of the SICA on simulated data. The frequency spectrum information of six channels are shown from channel 1 to channel 6. Component 1 and component 2 are the ICs extracted.

Data Processing
All trials were extracted from the two datasets with a bandpass filter of 5-30 Hz by a fourth-order Butterworth filter before analysis. A k = [a 1 (t), a 2 (t),..., a g (t)] T t = t 0 , ..., T was the kth EEG record, where g is the number of electrodes. To suppress the mutual interference of the hemispheres, and to extract and integrate the imagination relevant information by SICA; the EEG data were separated by hemisphere and named as A l k and A r k in every trial. After fast Fourier transform as illustrated by Equation 10, H l k and H r k were analyzed by SICA for the feature extraction and integration. Two independent components U l(r) 1 and U l(r) 2 which contained imagination relevant or irrelevant information were extracted over each hemisphere. In other words, the imagination relevant information was separated from irrelevant information and integrated together on each hemisphere. After inverse Fourier transform, four temporal components Y l 1 , Y l 2 , Y r 1 , and Y r 2 were rearranged as feature matrix according to hemisphere, and the component matrix Y k =[Y l 1 , Y l 2 , Y r 1 , Y r 2 ] T was projected. The flow chart of proposed method is illustrated in Figure 3.
Furthermore, whether the proposed method could provide better single trial classification capability than conventional CSP which relied on the bandpass filter was verified by the classification accuracy on twelve subjects of the public datasets. The conventional CSP only applied a bandpass filter from 5 to 30 Hz before projecting. Additionally, the results of the competing feature extraction and integration method, temporal ICA was also reported for comparison. The method named as ICA-CSP which extracted imagination relevant and irrelevant information by conventional temporal ICA before the components were projected. Similarly, ICA-CSP extracted four temporal components and rearranged them according to the hemisphere as SCCSP. The Analytic Common Spatial Patterns (ACSP), CSSSP and the Bilinear Common Spatial Pattern (BCSP) (Yu et al., 2013) and FBCSP were also studied for comparison. The parameters of the FBCSP were the same as the previously reported (Ang et al., 2012). After projecting, a classifier was adopted by LIBSVM (Chang and Lin, 2011) with Radial Basis Function (RBF) by the algorithms. The training and test trials did not overlap on every subject. The numbers of the training and testing trials were half of the whole trials for every subject. The classification performance was evaluated by classification accuracy which is the ratio between the correct number after the classifier and the sum of trials. K-fold cross-validation was applied as cross-validation. The number of K was half of trials in every subject to make sure that every data could be used as the training data and testing data once. K was higher than 10 in all subject. The Lilliefors test was used to evaluate results if they obeyed normal distribution. One-way Analysis of Variance (ANOVA) with repeated measures was applied for statistical analysis of results, and pair t-test and least significant difference were used as a post-hoc test methods. All calculations were performed in MATLAB. Figure 4 shows the ERD/ERS maps at 5-15 Hz of the fifth subject from the dataset IIa during the left hand MI. It indicated that the µ-suppression appeared on several contralateral electrodes. The classification accuracies of the six methods are presented after cross validation in Table 2. They showed that the SCCSP outperformed CSP, ICA-CSP, CSSSP, BCSP and ACSP, achieving 6.8, 3.5, 11.5, 26, and 15.5% higher average classification accuracy than these algorithms, respectively. Among the 12 subjects, SCCSP showed better performance than CSP in 10 subjects.

RESULTS
The Lilliefors test showed that the classification accuracy from six algorithms obeyed the normal distribution. The probabilities were 0.1852, 0.5, 0.3136, 0.2141, 0.5, and 0.3909 for the CSP, SCCSP, ICA-CSP, CSSSP, BCSP, and ACSP, respectively. ANOVA indicated that there was significant difference among the six algorithms [F (1, 72) = 8.53, p < 0.001]. Moreover, the paired ttest showed that the better performances of SCCSP over CSP (p < 0.05), ICA-CSP (p < 0.05), CSSSP (p < 0.05), BCSP (p < 0.001) and ACSP (p < 0.001) were significant. Least significant difference, used as post-hoc test, showed that the better performances of SCCSP over CSSSP, BCSP, and ACSP were significant at 0.05 level. Additionally, ICA-CSP achieved 3.3% higher average classification accuracy than CSP. The kappa value was also applied to evaluate the consistency of classification performance.
FIGURE 3 | The flow chart for data processing.
Frontiers in Neuroscience | www.frontiersin.org  Where, p o is the classification accuracy; p e denotes the probability of expected agreement. The results of the kappa values are listed in Table 3. The SCCSP outperformed CSP, ICA-CSP and FBCSP, achieving 0.247, 0.094, and 0.109 higher average kappa value than these algorithms, respectively. The Lilliefors test showed that the kappa value from these algorithms followed the normal distribution. The probabilities were 0.5, 0.3573, 0.5, and 0.076 for the CSP, SCCSP, ICA-CSP, and FBCSP, respectively. The ANOVA indicated that there was significant difference among these algorithms [F (1, 36) = 5.99, p < 0.05] in the kappa value. The paired t-test showed that the better performances of the SCCSP over CSP (p < 0.001) and ICA-CSP (p < 0.05) were significant. The better performance of the ICA-CSP over CSP (p < 0.001) was significant. Moreover, the probability of SCCSP performance over FBCSP was 0.08. Least significant difference, used as posthoc test, showed that the better performance of SCCSP over CSP was significant at 0.05 level.
The feature extraction and integration result of subject 5 is presented in Figure 5. The result presented in Figure 5A shows the topographical view of the average time-frequency representation of ERD/ERS values in µ-rhythm during hand imagery. The result in Figure 5B shows the filtered result by the bandpass filter in CSP. Figure 5C reveals the result obtained by feature extraction and integration algorithm proposed where the components were converted by matrix W. The result in Figure 5A was consistent with the fact that the EEG suppressions were contralateral to the imagined hand movement (Pfurtscheller and da Silva, 1999).
To study the stability of the SCCSP, the number of the trials for training the classifier was varied from 2 to 50 with about 10 steps. The results of classification accuracy with error bar for every step are presented in Figure 6 after cross validation. The average classification accuracy and standard deviation of accuracy of the SCCSP and CSP were calculated and are shown in Table 4. The SCCSP achieved 12.1% higher average accuracy than CSP. The Lilliefors test shows that the average classification accuracy from these algorithms obeyed the normal distribution. The probabilities were 0.5 and 0.5 for CSP and SCCSP, respectively. The ANOVA results indicated that there was significant difference between these algorithms [F (1, 24) = 35.97, p < 0.001] for classification accuracy. The paired t-test showed that the better performance of SCCSP over CSP was significant (p < 0.001). Furthermore, SCCSP had a smaller average standard deviation of classification accuracy than CSP. A classification of the f in Equation 9 of subject 5 is shown in Figure 7 for visualization. The statistical results of the f under the SCCSP and CSP are shown in Figure 8. A paired t-test analysis showed that the SCCSP achieved a higher difference between the two classes than CSP (p < 0.05). For quantitative analysis, the within-class distance B and between-class distance D were applied.
Where, || d k || denotes the Euclidean distance between the f k and the gravity in C 1 or C 2 . || d ij || is the Euclidean distance between the f i in C 1 and the f j in C 2 . To evaluate the difference between the two classes, the ratio of within-class distance and   between-class distance λ is derived using Equation 14. A lower λ indicated a greater separability between classes. As a result, the SCCSP achieved nearly twenty times reduction of λ compared to the conventional CSP algorithm, on average.

DISCUSSION AND CONCLUSION
Before the onset of motor imagery and execution, somatosensory area which is a part of the posterior parietal lobe needed some information, such as location, which comes from proprioception and visual area, etc. The prefrontal lobe and posterior parietal lobe determine and control movements. The axons of the prefrontal lobe and posterior parietal lobe concentrate on the Brodmann area 6 of which including the Supplementary Motor Area (SMA) and the Premotor Area (PMA). Most of the corticospinal tracts connect with the efferent fibers of the Brodmann area 6 which encodes the movement and primary motor cortex. The independence discussion of the inhibition mechanism and excitation mechanism on different motorfunction area of somatosensory area (Ikeda et al., 2000;Pei et al., 2005) provided a great possibility of activation independence on function areas which represent different parts of body in the primary motor cortex. This reveals that EEG of one-task mental motor imagery should be the combination of time and spatial independent sources on motor-related areas.
In this paper, we extended the temporal ICA to amplitude spectrum analysis. A novel SCCSP algorithm for motor imagery classification based on SICA was proposed. This SCCSP method provided greater classification accuracy than CSP, ICA-CSP, CSSSP, BCSP, and ACSP. The kappa results also exhibited a better performance than CSP, ICS-CSP, and FBCSP. SICA is the extension of blind source separation. Therefore, the  better classification performances of SCCSP and ICA-CSP may indicate that the time-frequency independence nature of motorrelated sources in this experiment. Moreover, the greater average classification accuracy of SCCSP than ICA-CSP may show a possibility of greater separability or independence on frequency domain. In practice, the channels which reveal µ-suppression varied with trials. For the volume conduction, the suppression appeared in a wide region. This was a challenge to improve the spatial separability of the features. However, the algorithms for projecting were sensitive to the arrangement of feature, spatial distribution. Under SCCSP, a feature extraction and integration method based on SICA was applied. This method can extract the relevant imagination information into one component. That is, the integration of the feature algorithm could separate the motor relevant information from blurred data on multi-channel, concentrate relevant feature, suppress the influence of other region which represent other un-imaged parts of the body, and noise, and enlarge the spatial distribution separability of the features. The pure bandpass filter applied by CSP only suppressed the interference of other frequencies, while the information of other irrelevant function areas and noise remained in the frequency band selected. The results presented in Figures 5B,C illustrate that the proposed algorithm obtained a greater spatial separability, while the information extracted by bandpass filter was obscure. The greater spatial separability extracted by the feature extraction and integration algorithm was favorable for improving the classification accuracy. Therefore, this SCCSP can reduce the interferences both in the other frequency bands and in the frequency band selected to improve classification accuracies. Moreover, the results of SCCSP and spatio-spectral filter selection method by cognitive fuzzy inference system (SCIF) (Das et al., 2016) indicated that there was 3.3% accuracy increase of SCCSP over SCIF on dataset IIa. Though there were different datasets, it was comparability. SCCSP achieved 5.18% higher average accuracy than the dynamic frequency feature selection method mentioned in Luo et al. (2016). Therefore, SCCSP achieve a better performance on MI classification. SCCSP provided a lower average standard deviation of classification accuracy than almost all other methods. The statistical results of classification accuracy and kappa values indicated that the feature extraction and integration of SCCSP should be individual variability and adaptability. That is, SCCSP can decrease the individual difference. In Table 3, ICA-CSP and SCCSP both achieved significantly higher performances regarding the kappa value. Therefore, ICA is an efficient feature extraction algorithm to improve the spatial separability of features. The results presented in Figure 6 and Table 4 illustrated that SCCSP achieves greater average classification accuracy and a smaller standard deviation compared with CSP, simultaneously. The curve of the classification accuracy on SCCSP was steadier than CSP, and it obtained greater classification accuracy under the small training dataset. Therefore, SCCSP was less affected by the number of training datasets. This is very important for BCI application. Figure 7, 8 illustrated that the SCCSP algorithm obtained a greater separability between classes after classifier. The statistical results of λ indicated that SCCSP can improve the classification accuracy by improving the separability of classes. In BCI applications, there existed multi-class classification problem.
The algorithms by spatial projection applied multiple binaryclass classification to achieve multi-class classification, such as CSP. Thus, SCCSP can obey this way to classify multi-classes. In this way, one class can be seperated from other classes. Moreover, SCCSP applied SICA and spatial projection to obtain the spatial filter, and furter, the method may also be extended to other higher time resolution signal modalities analysis, such as fNIRS.
In conclusion, in this study, SCCSP has been introduced to the CSP family. This algorithm naturally integrates the relevant information and suppresses the influence of irrelevant information. The accuracy merits of SCCSP as supplemental to the broadband CSP filtering have been attentively validated on the public datasets of motor imagery EEG signals. The quantitative comparisons suggest superior discrimination and stable capability of the proposed method over the conventional CSP. Moreover, the test with varied training datasets shows excellent performance in small training datasets, and this is important in practical application. However, SCCSP spends longer time than CSP. This algorithm needed to be optimized. The SCCSP is affected by the µ-rhythm oscillation on the homolateral hemisphere. For improving the classification accuracy, the suppressed method of the homolateral hemisphere influence should be studied in further studies. In the future, we plan to study the SCCSP for multi class classification.