Noise Robustness Low-Rank Learning Algorithm for Electroencephalogram Signal Classification

Electroencephalogram (EEG) is often used in clinical epilepsy treatment to monitor electrical signal changes in the brains of patients with epilepsy. With the development of signal processing and artificial intelligence technology, artificial intelligence classification methods play an important role in the automatic recognition of epileptic EEG signals. However, traditional classifiers are easily affected by impurities and noise in epileptic EEG signals. To solve this problem, this paper develops a noise robustness low-rank learning (NRLRL) algorithm for EEG signal classification. NRLRL establishes a low-rank subspace to connect the original data space and the label space. Making full use of supervision information, it preserves the local information of samples so that the low-rank representation achieves within-class compactness and between-classes dispersion. The asymmetric least squares support vector machine (aLS-SVM) is embedded into the objective function of NRLRL. The aLS-SVM finds the maximum quantile distance between the two classes of samples based on the pinball loss function, which further improves the noise robustness of the model. Several classification experiments with different noise intensities are designed on the Bonn data set, and the experimental results verify the effectiveness of the NRLRL algorithm.


INTRODUCTION
Brain computer interface (BCI) is a system that collects signals from the brain to communicate with computers or other devices (Gummadavelli et al., 2018; Jiang et al., 2020). As an efficient way for the human brain to communicate directly with peripheral devices, a BCI does not need to rely on the peripheral nervous system and muscles. Electroencephalogram (EEG) signals, as a biomarker, play an important role in BCI. EEG is often used in clinical diagnosis to determine the presence and type of epilepsy (Fahimi et al., 2019; Jiang et al., 2019). The epileptic seizure process has several different periods: interictal, pre-seizure, and seizure. The waveform, frequency, and signal characteristics of these stages differ in EEG. Based on the analysis of the characteristics of epileptic EEG, studies can generally be divided into two directions: epilepsy detection and epilepsy prediction (Gu et al., 2021). Epilepsy detection algorithms use signal processing, machine learning, and deep learning to extract signal features and distinguish EEG signals of the interictal period from those of the seizure period. Epilepsy prediction algorithms distinguish EEG signals of the pre-seizure period from those of the seizure period.
The prediction task is more difficult than the detection task. First, there is no uniform definition of epilepsy prediction or accepted standard in the field. Second, compared with seizure-period EEG, the signal patterns of pre-seizure EEG and interictal EEG are more similar to each other, so the algorithm is required to be more robust.
Both epilepsy detection and epilepsy prediction are essentially classification tasks in machine learning. Several studies focus on the classification of EEG information, involving both epilepsy detection tasks and epilepsy prediction tasks. Zhou et al. (2018) represented the epilepsy EEG signal as a two-dimensional image and then constructed a convolutional neural network to automatically learn from the transformed image. This method borrows the idea of image processing to analyze EEG signals and broadens the methods of signal processing. Birjandtalab et al. (2017) performed nonlinear dimensionality reduction on EEG signals after extracting time-frequency features. This feature processing method can reflect the non-linear relationships in the data during the low-dimensional mapping process. Ramakrishnan and Muthanantha (2018) computed the approximate entropy value, the maximum Lyapunov exponent, and the correlation dimension on different sub-bands of epilepsy EEG signals, and introduced fuzzy rules to fuzzify the features. The authors believe that fuzzy rules are a natural choice for using human professional knowledge to build machine learning systems, which is closely related to people's way of thinking. Wang et al. (2017) explored multiple bands of the EEG signal by considering the maximum and standard deviation characteristics of each band. They then constructed the feature vector of the EEG and used a one-to-one self-organization strategy to create a high-precision epilepsy detection system. Sun et al. (2019) intercepted and analyzed pre-seizure data, and they used recurrent autoencoders on multivariate signals to extract EEG features. Liu et al. (2019) transformed the EEG signal into spectral data through a combination of dimensionality reduction and the short-time Fourier transform. Then, the authors constructed a shallow convolutional neural network (CNN) to automatically learn data features. Yu et al. (2020) used the local mean decomposition (LMD) method to obtain the feature matrix of the EEG signals, used a CNN model to implement feature extraction, and combined Bayesian linear discriminant analysis to obtain the prediction result.
In supervised learning, the least squares regression (LSR) based classifier, closely related to the support vector machine (SVM), is a simple and effective method. The core idea of LSR is to learn a projection from the original data to the feature space, and the obtained projection of the original data is also used as the data representation in the label space. For example, discriminative LSR methods include discriminative LSR for multiclass classification (Xiang et al., 2012), the groupwise retargeted LSR method (Ling and Geng, 2019), regularized label relaxation linear regression (Fang et al., 2018), double relaxed regression for classification (Han et al., 2019), and so on. For epilepsy data, scalp EEG recordings contain many impurity and noise signals. Moreover, dimensional explosion and information redundancy problems are common in EEG signals. Learning a discriminative and compact data representation is therefore a critical problem in pattern recognition. At present, many methods combine subspace learning with a least squares classifier to learn good classifiers. For example, to combine projection learning with the task of exploring label information, Meng et al. (2020) proposed constrained discriminative projection learning for the joint optimization of subspace learning and classification, which uses low-rank constraints to learn a robust subspace connecting the original visual features and the target output. Subspace learning essentially tries to find a suitable low-dimensional space in which the discriminative information of the original features is preserved as much as possible. In recent years, low-rank learning has achieved good results in matrix analysis, data recovery, and data denoising. At the same time, low-rank representation is an effective means of describing the structure of high-dimensional data, and it is a generalized form of sparsity in matrix space.
That is, low-rank representation can describe the low-dimensional subspace structure of high-dimensional data, so its components in the subspace become the most important factors in characterizing the data. In addition, low-rank representation effectively introduces low-rank constraints into the data matrix, which helps to construct discriminative feature subspaces and eliminate outliers. Inspired by this idea, the noise robustness low-rank learning (NRLRL) algorithm is proposed for EEG signal classification. NRLRL learns a low-rank subspace that connects the original data space and the label space. It fully considers the correlation information and local structure of the samples, and it guarantees the minimum rank of the coefficient matrix constructed from the data under self-expression. By integrating a multi-class asymmetric least squares SVM classifier with low-rank representation, NRLRL is insensitive to noise and outliers. Experiments performed on noisy EEG signals show that our algorithm is noise robust. NRLRL has the following advantages: (1) since the low-rank representation follows the minimum-rank criterion, NRLRL is robust when reconstructing original data contaminated by noise and outliers. (2) By making full use of supervision information and the pinball loss function, an asymmetric least squares SVM is jointly learned within our objective function, so that NRLRL explores a robust classifier in the framework of low-rank learning. (3) Local constraints on the low-rank representation are built from supervision information: the criteria of minimum within-class and maximum between-classes scatter for the low-rank representations are adopted to capture the discriminative structure of the data.

Low-Rank Representation
Given a set of data samples X = [x_1, ..., x_n], each sample x_i ∈ R^d in X can be represented as a linear combination of atoms from a dictionary A ∈ R^{d×m}:

X = AC, (1)

where C = [c_1, ..., c_n] ∈ R^{m×n} is the coefficient matrix of the low-rank representation. As a common practice in low-rank learning, the dictionary A is set to X, i.e.,

X = XC. (2)

Eq. (2) uses the data set itself to represent the data, which is called the self-expression of the data. Each data sample in data set X can then be represented by:

x_i = Xc_i. (3)

By minimizing the rank of the coefficient matrix C, Eq. (1) can be written as:

min_C rank(C), s.t. X = XC, (4)

where rank(C) is the rank function of C.
Considering the existence of noise and outliers in the data samples, the original data X is decomposed into two parts: a linear combination of the dictionary X with a low-rank coefficient matrix C, and a noise (error) matrix E, i.e.,

X = XC + E. (5)

Then the low-rank representation can be defined as follows:

min_{C,E} rank(C) + µ||E||_{2,0}, s.t. X = XC + E, (6)

where ||·||_{2,0} denotes the ℓ_{2,0}-norm operator and µ is the trade-off parameter. In Eq. (6), the term ||E||_{2,0} encourages the sparseness of the error components. The low-rank optimization problem of Eq. (6) is a non-convex NP-hard problem. To obtain a tractable solution, it is necessary to perform a convex relaxation of Eq. (6). The nuclear norm is the tightest convex approximation of the rank function over the unit ball of the matrix spectral norm (Candès and Recht, 2009). Therefore, the convex nuclear norm can be used to approximate the non-convex rank function, and the ℓ_{2,0}-norm can be relaxed to the ℓ_{2,1}-norm (Raghunandan et al., 2010). Then Eq. (6) can be written as the following convex optimization problem:

min_{C,E} ||C||_* + µ||E||_{2,1}, s.t. X = XC + E, (7)

where ||·||_* denotes the nuclear norm operator and ||·||_{2,1} denotes the ℓ_{2,1}-norm operator.
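As an illustration of the nuclear-norm relaxation, the proximal operator of the nuclear norm is singular value thresholding (SVT), a basic building block of solvers for problems like Eq. (7). The sketch below (NumPy assumed; a generic illustration, not the authors' code) shows how soft-thresholding the singular values recovers an underlying low-rank structure from a noisy matrix:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold the singular values
    return U @ np.diag(s_shrunk) @ Vt

# A rank-2 matrix plus small noise: SVT suppresses the noise directions.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
noisy = low_rank + 0.01 * rng.standard_normal((20, 20))
denoised = svt(noisy, tau=0.5)
print(np.linalg.matrix_rank(denoised, tol=1e-6))  # rank collapses from 20 to ~2
```

Because the small singular values contributed by the noise fall below the threshold, they are zeroed out, while the two dominant directions of the rank-2 component survive almost unchanged.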

Asymmetric Least Squares Support Vector Machine
The loss function in the least squares SVM (LS-SVM) pays attention to both correctly classified and incorrectly classified samples. It minimizes the squared error of the classifier as follows:

min_{w,b} (1/2)||w||^2 + (α/2) Σ_{i=1}^n (1 − y_i(w^T x_i + b))^2. (8)

In fact, this loss function is noise sensitive, especially to noise around the separating hyperplane. Many extensions of the least squares loss function have been proposed to solve this problem, such as the iteratively reweighted least squares and asymmetric square functions (Leski, 2015) and the asymmetric squared loss (Huang et al., 2014). Using the statistical property of the lower quantile value, the asymmetric squared loss can be written as:

L_p(u) = p·u^2 if u ≤ 0, and (1 − p)·u^2 if u > 0, with u = y_i(w^T x_i + b) − 1, (9)

where w and b are the hyperplane parameter and bias parameter of the SVM classifier, respectively, and p is the lower quantile value parameter. The aLS-SVM uses the expectile distance and maximizes the expectile distance between different classes. The aLS-SVM has the following optimization problem:

min_{w,b} (1/2)||w||^2 + α Σ_{i=1}^n L_p(y_i(w^T x_i + b) − 1), (10)

where α is the regularization parameter. This optimization problem can be solved by quadratic programming.
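The asymmetry of the loss in Eq. (9) can be made concrete with a short sketch (NumPy assumed; the sign convention u = y(wᵀx + b) − 1 follows the later NRLRL formulation, so u < 0 corresponds to margin violations):

```python
import numpy as np

def asymmetric_squared_loss(u, p):
    """Pinball-squared loss L_p(u): weight p on margin violations (u < 0)
    and weight (1 - p) on the correct side (u >= 0)."""
    return np.where(u < 0, p * u**2, (1.0 - p) * u**2)

u = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
print(asymmetric_squared_loss(u, p=0.95))
# With p = 0.95, a margin violation of a given size is penalized 19x more
# heavily than the same deviation on the correct side of the margin.
```

Note that p = 0.5 recovers a symmetric (scaled) least squares loss, so the LS-SVM of Eq. (8) is the special case of Eq. (10) with p = 0.5.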

NOISE ROBUSTNESS LOW-RANK LEARNING ALGORITHM
The Noise Robustness Low-Rank Learning Model

Given a set of data points X = [x_1, ..., x_n] and their labels Y = [y_1, ..., y_n] distributed in K classes, y_k = [y_{1,k}, y_{2,k}, ..., y_{n,k}] is the class label vector of the n training samples associated with the k-th class. Considering the influence of noise and outliers, the main goal of our algorithm is to find the lowest-rank representation C and the best classifier based on C. First, to increase the discrimination capability, local preservation with label embedding is incorporated into the learning process. Different from the traditional local preservation term in low-rank learning, the label information is embedded into the k-nearest neighborhood relationships. For sample x_i, its low-rank representation is c_i. Without considering the label information, if x_j is in the k-nearest neighborhood of x_i, their corresponding low-rank representations c_j and c_i should be close to each other. Obviously, this strategy alone is not suitable for classification tasks. Based on the basic classification principles of within-class compactness and between-classes separation, the label information is introduced into the k-nearest neighborhood relationships. The within-class matrix B_within and the between-classes matrix B_between are accordingly defined, with elements:

B_within,ij = 1, if x_i ∈ N(x_j) or x_j ∈ N(x_i) and x_i, x_j belong to the same class; 0, otherwise, (11)

B_between,ij = 1, if x_i ∈ N(x_j) or x_j ∈ N(x_i) and x_i, x_j belong to different classes; 0, otherwise, (12)

where N(x_j) returns the k-nearest neighbors of x_j. In NRLRL, the original data is projected into a low-dimensional subspace by low-rank representation. NRLRL preserves the similarity within each class and the difference between classes. To achieve this goal, the label-embedded local preservation term is defined as:

(1/2) Σ_{i,j} ||c_i − c_j||^2 B_within,ij − (1/2) Σ_{i,j} ||c_i − c_j||^2 B_between,ij = Tr(CLC^T), (13)

where L = L_within − L_between, L_within = D_within − B_within, and L_between = D_between − B_between. D_within and D_between are diagonal matrices whose elements are D_within,ii = Σ_{j=1}^n B_within,ij and D_between,ii = Σ_{j=1}^n B_between,ij, respectively. Tr(·) is the trace operator.
The label-embedded local preservation term ensures that if the nearest neighbors x_i and x_j are from the same class, their low-rank codes c_i and c_j are also close to each other. At the same time, if the nearest neighbors x_i and x_j are from different classes, their low-rank codes c_i and c_j are separated as much as possible. In this low-rank learning stage, the non-linear local structure of the EEG data is preserved.
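The construction of the label-embedded neighborhood matrices and the resulting Laplacian can be sketched as follows (NumPy assumed; a brute-force kNN illustration of Eqs. (11)-(13), not the authors' implementation):

```python
import numpy as np

def label_embedded_laplacian(X, y, k=3):
    """Build B_within / B_between from k-nearest neighbors split by label,
    and return L = L_within - L_between (graph Laplacians via degree matrices)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-neighbors
    B_w = np.zeros((n, n)); B_b = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:                   # k nearest neighbors of x_i
            if y[i] == y[j]:
                B_w[i, j] = B_w[j, i] = 1.0               # same-class neighbor pair
            else:
                B_b[i, j] = B_b[j, i] = 1.0               # different-class neighbor pair
    L_w = np.diag(B_w.sum(1)) - B_w                       # L_within = D_within - B_within
    L_b = np.diag(B_b.sum(1)) - B_b                       # L_between = D_between - B_between
    return L_w - L_b

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (10, 5)), rng.normal(4, 1, (10, 5))])
y = np.array([0] * 10 + [1] * 10)
L = label_embedded_laplacian(X, y, k=3)
# Tr(C L C^T) is then the local preservation penalty for a code matrix C.
```

Minimizing Tr(CLCᵀ) with this sign convention pulls same-class neighbor codes together while pushing different-class neighbor codes apart.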
To promote the discriminative ability of the low-rank representation vectors, a multi-class aLS-SVM classification term is embedded into the NRLRL algorithm. The multi-class aLS-SVM classification term L(C, W, b) includes a regularization part and a loss part:

L(C, W, b) = Σ_{k=1}^K [ (1/2)||w_k||^2 + α Σ_{i=1}^n l(c_i, y_{k,i}, w_k, b_k) ], (14)

where l(c_i, y_{k,i}, w_k, b_k) is the loss function associated with the k-th aLS-SVM. The squared pinball loss in NRLRL can be written as:

l(c_i, y_{k,i}, w_k, b_k) = p·µ_{k,i}^2 if µ_{k,i} ≤ 0, and (1 − p)·µ_{k,i}^2 if µ_{k,i} > 0, (15)

where µ_{k,i} = y_{k,i}(w_k^T c_i + b_k) − 1. Embedding the local preservation with label embedding term and the multi-class aLS-SVM classification term into Eq. (7), the objective function of NRLRL can be written as:

min_{C,E,W,b} ||C||_* + λ||E||_{2,1} + γL(C, W, b) + ηTr(CLC^T), s.t. X = XC + E, (16)

where λ, γ, and η are regularization parameters. The loss function term is decomposed into the sum of the losses of the individual samples, so the contribution of each data sample to the objective function is linearly cumulative. L(C, W, b) includes a class-by-class loss term l(c_i, y_{k,i}, w_k, b_k) on the low-rank representation, so that the obtained lowest-rank representations are highly correlated within each class. EEG signals belonging to the same class usually contain common discriminative features. The lowest-rank representation obtained by Eq. (16) therefore exhibits strong within-class correlation and between-classes difference for classification tasks.
To reduce the time cost, the Frobenius norm is used to replace the nuclear norm, and the objective function of NRLRL can be rewritten as:

min_{C,E,W,b} ||C||_F^2 + λ||E||_{2,1} + γL(C, W, b) + ηTr(CLC^T), s.t. X = XC + E. (17)

For simplicity of expression, combining the two terms ||C||_F^2 and ηTr(CLC^T), Eq. (17) can be written as:

min_{C,E,W,b} Tr(C(ηL + I)C^T) + λ||E||_{2,1} + γL(C, W, b), s.t. X = XC + E. (18)

From Eq. (18), we can see that the NRLRL algorithm consists of three sub-problems, namely those of C, E, and the aLS-SVM classifier parameters. These three sub-problems can be solved alternately until the NRLRL algorithm converges. We use the alternating direction method of multipliers (ADMM) (Luo et al., 2017) to solve Eq. (18). The augmented Lagrangian function corresponding to Eq. (18) can be written as:

min_{C,E,W,b} Tr(C(ηL + I)C^T) + λ||E||_{2,1} + γL(C, W, b) + ⟨θ, X − XC − E⟩ + (µ/2)||X − XC − E||_F^2, (19)

where θ is the Lagrange multiplier and µ is the penalty parameter.

Optimization of the Objective Function
According to the ADMM algorithm, the parameters in Eq. (19) can be updated alternately; that is, when one parameter block is updated, the other parameters are fixed, and the updates repeat until the NRLRL algorithm converges.
(1) Update C by fixing E, w_k, and b_k. Eq. (19) is then converted to a quadratic problem in C. Setting the derivative of Eq. (19) with respect to c_i to zero yields a closed-form solution for c_i.
(2) Update E by fixing C, w_k, and b_k. Using the same calculation and reduction strategy as Liu et al. (2013), Eq. (19) is converted to the following problem:

min_E λ||E||_{2,1} + (µ/2)||E − G||_F^2, with G = X − XC + θ/µ.

Then the solution of E can be obtained column by column:

E(:, i) = ((||g_i|| − λ/µ)/||g_i||)·g_i, if ||g_i|| > λ/µ; E(:, i) = 0, otherwise,

where g_i is the ith column vector of the matrix G.
(3) Update w_k and b_k by fixing E and C. Eq. (19) is then converted to a multi-class aLS-SVM problem on the low-rank representations c_i, and the optimal parameters w_k and b_k can be obtained by the aLS-SVM algorithm.
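The column-wise ℓ_{2,1} shrinkage used in update (2) can be sketched as follows (NumPy assumed; a generic illustration of the Liu et al. (2013) proximal operator, not the authors' code):

```python
import numpy as np

def prox_l21(G, tau):
    """Proximal operator of tau * ||E||_{2,1}: each column of G is shrunk
    toward zero by tau in Euclidean norm; small columns are zeroed entirely."""
    norms = np.linalg.norm(G, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.where(norms > 0, norms, 1.0)
    return G * scale

G = np.array([[3.0, 0.1],
              [4.0, 0.1]])
print(prox_l21(G, tau=1.0))
# First column (norm 5) is shrunk to [2.4, 3.2]; second column (norm ~0.14
# < tau) is zeroed, which is how whole noisy columns of E are suppressed.
```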
The training procedure of NRLRL is summarized in Algorithm 1.
ALGORITHM 1 | The training procedure of NRLRL.
Algorithm NRLRL: noise robustness low-rank learning algorithm
Input: Training samples X, label matrix Y, and parameters λ, γ, η, α, k, p.
Initialize: C = 0, E = 0, multiplier θ = 0, and penalty parameter µ.
Repeat:
 (1) Update C by fixing E, w_k, and b_k;
 (2) Update E by fixing C, w_k, and b_k;
 (3) Update w_k and b_k (k = 1, ..., K) by fixing C and E;
 (4) Update the multiplier θ and the penalty parameter µ;
Until convergence.
Output: The lowest-rank representation C and the classifier parameters w_k, b_k (k = 1, ..., K).
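As a minimal executable sketch of the alternation above, the following code solves only the simplified problem min ||C||_F² + λ||E||_{2,1} s.t. X = XC + E, with the classifier and Laplacian terms omitted for brevity. It illustrates the ADMM loop structure (primal updates, dual ascent, penalty growth) under those stated simplifications; it is not the authors' implementation:

```python
import numpy as np

def simplified_nrlrl(X, lam=0.1, mu=1.0, rho=1.05, n_iter=200):
    """ADMM-style alternation for min ||C||_F^2 + lam*||E||_{2,1}
    s.t. X = XC + E (classifier and Laplacian terms omitted)."""
    d, n = X.shape
    C = np.zeros((n, n)); E = np.zeros((d, n)); theta = np.zeros((d, n))
    XtX = X.T @ X
    for _ in range(n_iter):
        # C-step: zero the gradient of the smooth terms in the Lagrangian
        C = np.linalg.solve(2.0 * np.eye(n) + mu * XtX,
                            X.T @ theta + mu * X.T @ (X - E))
        # E-step: column-wise l21 shrinkage of G = X - XC + theta/mu
        G = X - X @ C + theta / mu
        norms = np.linalg.norm(G, axis=0)
        scale = np.maximum(norms - lam / mu, 0.0) / np.where(norms > 0, norms, 1.0)
        E = G * scale
        # dual update and penalty growth
        theta = theta + mu * (X - X @ C - E)
        mu = min(mu * rho, 1e6)
    return C, E

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 12))
C, E = simplified_nrlrl(X)
print(np.linalg.norm(X - X @ C - E))  # small constraint residual
```

In the full algorithm, the C-step additionally involves the (ηL + I) term and the classifier loss, and step (3) calls the aLS-SVM solver on the current codes c_i.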

EXPERIMENT

Datasets and Experimental Settings
The EEG signals used in this study are from Bonn University. The data set consists of five subsets (groups A to E), each of which consists of 100 single-channel EEG segments with a duration of 23.6 s and 4,097 sampling points. The segments in groups {A, B} were taken from five healthy subjects, and the segments in groups {C, D, E} were taken from patients with epilepsy. Groups C and D recorded signals during the interictal (seizure-free) period, and group E recorded signals during seizures. The signals of the five groups of EEG data are shown in Figure 1. In the experiment, the 4,097 data points of each segment were divided into three data blocks to obtain the research samples; that is, each data block is one sample, representing about 8 s of EEG information. Therefore, the sample size is 3 × 100 = 300 in each group, and each sample has 1,365 sampling-point features. We design two types of classification tasks on the Bonn dataset. One is the binary classification task: non-epileptic condition (sets {A, B, C, D}) and epileptic condition (set E). The other is the three-class classification task: normal (sets {A, B}), interictal (sets {C, D}), and ictal (set E).
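The segmentation described above can be sketched as follows (NumPy assumed; one plausible reading of the preprocessing, in which the last two of the 4,097 points are discarded so the blocks are equal):

```python
import numpy as np

# One Bonn segment has 4,097 sampling points. Splitting it into three equal
# blocks of 1,365 points (3 * 1365 = 4095; the final 2 points are dropped)
# yields three samples per segment, hence 3 * 100 = 300 samples per group.
segment = np.random.randn(4097)
blocks = segment[:3 * 1365].reshape(3, 1365)
print(blocks.shape)  # (3, 1365)
```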
Following the method of references (Huang et al., 2014; Gu et al., 2019, 2020), 20 and 50% of the samples are randomly selected and common Gaussian white noise is added. To test the sensitivity of the classifiers to noise intensity, the intensity of the Gaussian white noise is divided into three levels: the mean value is 0, and the variance is set to 5, 10, and 15% of the sample features, respectively. For example, the noise setting (20%, 10%) indicates that 20% of the samples in the Bonn dataset contain Gaussian white noise, and the variance of the noise is 10% of the sample features.
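A corresponding noise-injection step can be sketched as follows (NumPy assumed; scaling the noise standard deviation by a fraction of each feature's standard deviation is one plausible reading of "variance is 10% of the sample features"):

```python
import numpy as np

def add_gaussian_noise(X, sample_frac=0.2, var_frac=0.10, seed=0):
    """Add zero-mean Gaussian white noise to a random fraction of samples.
    The per-feature noise scale is var_frac times that feature's std
    (an assumed interpretation of the paper's noise setting)."""
    rng = np.random.default_rng(seed)
    Xn = X.copy()
    idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=False)
    sigma = var_frac * X.std(axis=0)
    Xn[idx] += rng.normal(0.0, 1.0, (len(idx), X.shape[1])) * sigma
    return Xn

X = np.random.randn(300, 1365)          # 300 samples, 1,365 features per sample
X_noisy = add_gaussian_noise(X, sample_frac=0.2, var_frac=0.10)  # the (20%, 10%) setting
```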

Classification Result Comparison
First, we perform experiments on the binary classification task. We compare all algorithms in terms of specificity, sensitivity, and accuracy on the noisy Bonn dataset. The experimental results of the binary classification task are shown in Tables 1-3. From the experimental results, it can be seen that: (1) as the noise intensity increases, the specificity, sensitivity, and accuracy of all algorithms decline to varying degrees. Feature noise in the samples thus seriously affects the classification performance. In particular, the DLSR and LC-KSVD algorithms do not consider the impact of noisy-sample interference on the classification surface, so their classification results decrease rapidly as the noise intensity increases.
(2) The SRRS, LRSD, aLS-SVM, LRDLSR, and NRLRL algorithms are all noise-insensitive classification algorithms, so their classification results are significantly better than those of the conventional classification algorithms. The proposed NRLRL algorithm achieves the best classification performance. The NRLRL algorithm removes the influence of noise on the samples in the lowest-rank representation, and it uses the pinball loss function to obtain a noise-insensitive classifier by maximizing the quantile distance between the two classes. In addition, the NRLRL algorithm can mine the geometric structure of the samples in a low-dimensional space by low-rank learning, and it fully considers the correlation information and subspace structure between samples. Therefore, the within-class similarity and between-classes difference of the data are more prominent, which enables the NRLRL algorithm to obtain good classification performance in the presence of noise.
Then, we perform experiments on the three-class classification task, i.e., classification of EEG data from the normal, interictal, and ictal periods. Similar to the above experimental procedure, we compare all algorithms in terms of specificity, sensitivity, and accuracy on the noisy Bonn dataset. The experimental results of the three-class classification task are shown in Tables 4-6. From these results, it can be observed that, since the complexity of three-class classification is higher than that of binary classification, the three-class results are lower than the binary results. The proposed NRLRL algorithm achieves the best specificity, sensitivity, and accuracy. In Tables 1-6, the bold values indicate the best results in the comparison experiments.

Parameter Analysis
Here we discuss the key parameters of the NRLRL algorithm on the binary classification task with noise (20%, 10%) and the three-class classification task with noise (50%, 10%). The k-nearest neighbor parameter k is an important parameter in NRLRL. It determines the neighbor relationships between samples. The parameter k is set from {3, . . . , 11}. The classification accuracies of NRLRL with different k are shown in Figure 2, where the accuracy with k = 7 is the highest. An appropriate value of k reflects the local structural information of the samples to the greatest extent. From the results in Figure 2, we set k = 7 in the NRLRL algorithm for the noisy Bonn dataset. Another important parameter is m, the size of the matrix C. The classification accuracies of NRLRL with different m are shown in Figure 3. The parameter m controls the data structure of the low-rank space. When m is too small, the low-rank representation is not rich enough to model the structure of the data in the low-rank space. When m is too large, redundant information produces errors in the low-rank representation. From the results in Figure 3, we set m = 240.
The pinball loss parameter p is an important parameter of the aLS-SVM classifier in NRLRL. The value range of the pinball loss parameter is {0.5, 0.83, 0.95, 0.99}. The classification accuracies of NRLRL with different p are shown in Figure 4. With different values of p, the NRLRL algorithm achieves consistently high classification accuracy, which shows that the NRLRL algorithm is not sensitive to the parameter p; therefore, the value of p is fixed to 0.95 in the experiments.

CONCLUSION
In this study, the NRLRL algorithm is proposed for EEG signal classification. Different from noise-insensitive SVMs, which learn the classification hyperplane in the original space or a kernel space, NRLRL learns a low-rank subspace as the transformation from the original data space to the label space to improve the overall classification performance. By introducing the criteria of minimum within-class and maximum between-classes scatter for the low-rank representations, the discriminative ability of the model is greatly improved. The pinball loss function also helps to improve the noise insensitivity of the model. The effectiveness of the proposed algorithm is verified on the noisy Bonn EEG dataset. Since our algorithm directly uses the EEG sampling points as input features, we will consider applying various feature extraction methods to the NRLRL algorithm in the next stage. In addition, seizure data are often insufficient in epilepsy detection and prediction tasks. To obtain an effective model, a down-sampling strategy is often performed to balance the classes, but this strategy causes loss of signal data. Therefore, research on appropriate imbalanced data classification methods is also a focus of future work.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/benfulcher/hctsaTutorial_BonnEEG.