An Intelligent EEG Signal Recognition Method via a Noise-Insensitive TSK Fuzzy System Based on Interclass Competitive Learning

Epilepsy is a neurological disorder affecting movement, consciousness, and nerve function, caused by the abnormal discharge of neurons in the brain. EEG is currently a very important tool in epilepsy research. In this paper, a novel noise-insensitive Takagi–Sugeno–Kang (TSK) fuzzy system based on interclass competitive learning is proposed for EEG signal recognition. First, a possibilistic clustering method in a Bayesian framework with interclass competitive learning, called PCB-ICL, is presented to determine the antecedent parameters of the fuzzy rules. Inheriting from possibilistic c-means clustering, PCB-ICL is noise insensitive. PCB-ICL learns the cluster centers of different classes in a competitive relationship: the obtained cluster centers are attracted by samples of the same class and repelled by samples of other classes, which pushes them away from heterogeneous data. PCB-ICL uses the Metropolis–Hastings method in an alternating iterative strategy to obtain the optimal clustering results. Thus, the learned antecedent parameters have high interpretability. To further promote the noise insensitivity of the rules, an asymmetric expectile term and the Ho–Kashyap procedure are adopted to learn the consequent parameters. Based on these ideas, a TSK fuzzy system called PCB-ICL-TSK is proposed. Comprehensive experiments on real-world EEG data show that the proposed fuzzy system achieves robust and effective performance for EEG signal recognition.


INTRODUCTION
Epilepsy occurs randomly and may strike multiple times in a day. During an epileptic seizure, the patient suffers sudden physical convulsions and loss of consciousness, which bring great physical and psychological pain (Ahmadlou and Adeli, 2011; Gummadavelli et al., 2018; Cury et al., 2019). Seizures can lead to brain cell death, affect brain function, and, in serious cases, even threaten the patient's life. The incidence of epilepsy is high, and the age range affected is very wide, including children, adolescents, and the elderly, with the highest incidence among children and adolescents. Both men and women can develop the disease, and men are more likely to do so than women. As an important clinical means of monitoring and diagnosing epilepsy, EEG provides a rapid, stable, low-cost, and non-invasive technology for monitoring the activity of the cerebral cortex, and it provides information that other physiological methods cannot. Specific waveforms such as spikes, sharp waves, and complex waves are reflected in the EEG. Research on the prevention and treatment of epilepsy is therefore of great significance for patients. In the diagnosis and treatment of epilepsy, EEG plays an irreplaceable role: doctors usually judge a patient's condition by observing the EEG.
Traditional visual inspection of EEG signals is inefficient and affected by differences in experts' subjective experience; hence, the automatic detection of EEG signals remains one of the hot issues in biomedical research (Jiang et al., 2017a; Martinez-Vargas et al., 2017; Li et al., 2019). An automatic epilepsy detection method can help doctors improve the accuracy of epilepsy diagnosis and also greatly save time. Research on automatic epilepsy detection is of great value to the prevention, diagnosis, and treatment of epilepsy. At present, epilepsy can be detected by machine learning and data mining. First, effective feature information is extracted from the EEG and preprocessed for data analysis; second, the preprocessed EEG data are sent to a classifier to discriminate epileptic from non-epileptic EEG data. In this pipeline, the key task is to design an effective prediction and discrimination method that can distinguish normal EEG signals from epileptic EEG signals. Many effective methods have been successfully applied to automatic epilepsy detection systems, including the extreme learning machine (ELM), artificial neural networks, Bayesian linear discriminant analysis, the support vector machine (SVM), and fuzzy systems (Kabir and Zhang, 2016; Qi et al., 2017; Akhavan and Moradi, 2018; Truong et al., 2018; Hossain et al., 2019; Liu et al., 2019; Sreej and Samanta, 2019; Xia et al., 2020). The fuzzy system is a model constructed to deal with the thinking, analysis, reasoning, and decision-making processes found in production and practice; it can directly translate natural language into computer language. Owing to its ability to process uncertain and ambiguous information, it has a high degree of interpretability and strong learning ability (Juang et al., 2007; Gu et al., 2017a; Jiang et al., 2017b,c; Gu and Wang, 2018).
However, traditional fuzzy systems have poor robustness and anti-interference ability, and their classification accuracy is low in noisy data scenarios. Yet, in real life, the classification of noisy data is common. For example, in practical application scenarios, the quality of medical images may vary greatly because of differences in acquisition devices or scanning technology, such as different rotation angles and noise (Siuly and Li, 2015; Hussein et al., 2019; Razzak et al., 2019).
Based on the key techniques of fuzzy system modeling, this paper proposes a novel noise-insensitive Takagi-Sugeno-Kang (TSK) fuzzy system. How to determine the antecedent and consequent parameters is the key to modeling a noise-insensitive fuzzy system (Takagi and Sugeno, 1985; Jiang et al., 2015). For the antecedent part of the fuzzy rules, clustering is a commonly used strategy, with methods such as fuzzy c-means (FCM) clustering (Bezdek et al., 1984), fuzzy (c + p) clustering (Leski, 2015), Bayesian fuzzy clustering (BFC) (Glenn et al., 2015), and possibilistic c-means (PCM) clustering (Krishnapuram and Keller, 1993). However, FCM, fuzzy (c + p), and BFC are sensitive to noise and lead to unsatisfactory partitions in noisy scenarios. PCM inherits the practicability and flexibility of fuzzy clustering and greatly enhances clustering performance on data with noise or outliers. However, the unsupervised nature of PCM means it cannot use the class label information of the samples, which easily causes an insufficient partition of the fuzzy space and thus affects the learning of the antecedent parameters of the fuzzy rules. The principle of antecedent parameter learning using PCM clustering is shown in Figure 1A: PCM clustering is applied directly to the whole dataset or to the samples of each class, and the antecedent parameters are then learned from the obtained clustering results. The data samples are thus simply divided into several clusters, without fully exploiting the geometry of the data and the label information of the samples. In this case, in the regions where the data overlap, the distance between cluster centers may be too small, or the centers may even overlap.
In this paper, we first propose a noise-insensitive possibilistic clustering method in a Bayesian framework with interclass competitive learning, called PCB-ICL. Because it inherits the possibilistic partition mechanism of PCM, PCB-ICL is noise insensitive; meanwhile, the cluster centers of different classes enter a competitive relationship during learning. That is, in the regions where samples overlap, the cluster centers are attracted by the samples of the same class, repelled by the samples of other classes, and pushed away from the heterogeneous data. The principle of antecedent parameter learning using PCB-ICL clustering is shown in Figure 1B. PCB-ICL integrates a competitive learning mechanism among the cluster centers of different classes into the Bayesian framework; it considers the structure information of the samples in the clustering procedure and realizes the competition between the cluster centers of different classes. We obtain the antecedent part of the fuzzy rules by performing PCB-ICL alternately on the samples of each class. Then, a Ho-Kashyap procedure (Leski, 2003) with an asymmetric expectile term (Huang et al., 2014a,b) is adopted to estimate the consequent parameters of the fuzzy rules. Owing to its statistical characteristics, the asymmetric expectile term is insensitive to noise, so it is used to measure the misclassification error. Based on these ideas, a TSK fuzzy system called PCB-ICL-TSK is developed, which learns the antecedent parameters by PCB-ICL clustering and the consequent parameters by the Ho-Kashyap procedure with an asymmetric expectile term. We apply the proposed algorithm to the Bonn EEG dataset, and the experimental results on several noisy classification tasks demonstrate that PCB-ICL-TSK achieves satisfactory performance in EEG signal classification. The novelty of our study is as follows.
(1) Both PCB-ICL and the Ho-Kashyap procedure with an asymmetric expectile term are insensitive to noise; thus, the obtained antecedent and consequent parameters are noise insensitive.
(2) Under the Bayesian framework, the clustering results of PCB-ICL are globally optimal. In addition, the competitive relationship between cluster centers enhances the interpretability of the antecedents of the fuzzy rules.
(3) The experiments on real-world EEG datasets confirm the effectiveness of PCB-ICL-TSK.
The remainder of this paper is organized as follows. Section Backgrounds introduces the TSK fuzzy system and PCM clustering. Section Possibilistic Clustering in Bayesian With Interclass Competitive Learning presents PCB-ICL clustering. Section Noise-Insensitive TSK Fuzzy System via Interclass Competitive Learning presents the noise-insensitive TSK fuzzy system PCB-ICL-TSK. Section Experiment reports the experiments on noisy EEG data. Section Conclusion concludes the paper.

BACKGROUNDS

Dataset
The epileptic EEG data used in the experiments are the Bonn dataset from the University of Bonn, Germany (Tzallas et al., 2009). The Bonn EEG dataset consists of five groups of data, namely, A to E, as shown in Figure 2. Each group contains 100 EEG signal segments of 23.6 s, selected from continuous single-channel EEG recordings. The EEG signals were recorded under different conditions from five patients and five healthy volunteers. The basic information of groups A-E is shown in Table 1.

TSK Fuzzy System
The most commonly used rule in the zero-order TSK fuzzy system can be represented by

Rule R_k: IF x_1 is A_{k,1} and x_2 is A_{k,2} and ... and x_d is A_{k,d}, THEN y^k = p_0^k, k = 1, 2, ..., K,

where x_1, x_2, ..., x_d are the input variables, A_{k,i} is a fuzzy subset, and K is the number of fuzzy rules. For an input vector x, the output of the corresponding TSK fuzzy system is

y(x) = sum_{k=1}^{K} mu~_k(x) p_0^k,

where the fuzzy membership mu_k(x) and the normalized fuzzy membership mu~_k(x) are

mu_k(x) = prod_{i=1}^{d} exp(-(x_i - c_{k,i})^2 / (2 delta_{k,i})), mu~_k(x) = mu_k(x) / sum_{k'=1}^{K} mu_{k'}(x).

For a sample x_i, the output can be rewritten in vector form as y(x_i) = p^T d(x_i), where d(x_i) = [mu~_1(x_i), ..., mu~_K(x_i)]^T and p collects the consequent parameters.

Generally, the antecedent and consequent parameters of the rules are determined separately. A popular way to estimate the antecedent parameters is to use a fuzzy clustering method (Takagi and Sugeno, 1985; Gu et al., 2017b; Salgado et al., 2017). The center and width parameters are then obtained by

c_{k,i} = sum_j u_{k,j} x_{j,i} / sum_j u_{k,j}, delta_{k,i} = h sum_j u_{k,j} (x_{j,i} - c_{k,i})^2 / sum_j u_{k,j},

where h is a scale parameter and u_{k,j} is the fuzzy membership of the jth input sample x_j belonging to the kth cluster. The learning of the consequent parameters can then be cast as minimizing the squared loss ||Dp - y||^2 over the training set, and, using the Ho-Kashyap iterative method (Leski, 2003), p can be computed by a regularized least squares solution of the form

p = (D^T D + tau I)^{-1} D^T y,

where I is the identity matrix and tau is a regularization constant.
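As an illustration, the zero-order inference above can be sketched in a few lines; the rule centers, widths, and consequents below are toy values, not learned parameters:

```python
import numpy as np

def tsk_predict(x, centers, widths, p0):
    """Zero-order TSK inference: Gaussian rule firing strengths,
    normalized across rules, then a weighted sum of constant consequents."""
    # mu_k(x) = prod_i exp(-(x_i - c_{k,i})^2 / (2 * delta_{k,i}))
    mu = np.exp(-((x - centers) ** 2) / (2.0 * widths)).prod(axis=1)
    mu_norm = mu / mu.sum()        # normalized firing strengths mu~_k(x)
    return float(mu_norm @ p0)     # y = sum_k mu~_k(x) * p0_k

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # K=2 rules, d=2 inputs
widths = np.ones((2, 2))                      # delta_{k,i}
p0 = np.array([-1.0, 1.0])                    # constant consequents
y = tsk_predict(np.array([0.0, 0.0]), centers, widths, p0)
```

An input near a rule center fires that rule most strongly, so the output is pulled toward that rule's consequent.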

PCM Clustering
PCM clustering is a possibilistic clustering method developed from FCM. Based on the framework of possibility theory, PCM not only takes into account the general clustering criteria of minimum within-class distance and maximum between-class distance but also relaxes the probabilistic membership constraint to avoid trivial solutions. The objective function of PCM is

J_PCM(U, Y) = sum_{c=1}^{C} sum_{n=1}^{N} u_{nc}^m ||x_n - y_c||^2 + sum_{c=1}^{C} eta_c sum_{n=1}^{N} (1 - u_{nc})^m,

where m > 1 is the fuzzifier and eta_c > 0 is the bandwidth of the cth cluster. The closed-form solutions of U and Y can be obtained by minimizing the objective function with respect to u_nc and y_c:

u_nc = 1 / (1 + (||x_n - y_c||^2 / eta_c)^{1/(m-1)}), y_c = sum_{n=1}^{N} u_nc^m x_n / sum_{n=1}^{N} u_nc^m.
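A minimal sketch of these alternating PCM updates follows; the center seeding and the fixed bandwidths eta_c are simplifying assumptions, whereas a full implementation would initialize both from an FCM run:

```python
import numpy as np

def pcm(X, C, m=2.0, n_iter=50):
    """Possibilistic c-means (Krishnapuram & Keller, 1993) sketch.
    Typicalities u_nc are per-cluster and need not sum to 1 over clusters,
    so an outlier can receive low typicality in every cluster."""
    Y = X[:C].copy()                               # crude initial centers
    d2 = ((X[:, None, :] - Y[None]) ** 2).sum(-1)  # N x C squared distances
    eta = d2.mean(axis=0)                          # fixed bandwidths eta_c
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - Y[None]) ** 2).sum(-1)
        U = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))  # u_nc closed form
        W = U ** m
        Y = (W.T @ X) / W.sum(axis=0)[:, None]     # y_c closed form
    return U, Y
```

On data with a gross outlier, the outlier's typicality stays low for all clusters, so it barely moves the centers; this is the noise insensitivity the paper builds on.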

POSSIBILISTIC CLUSTERING IN BAYESIAN WITH INTERCLASS COMPETITIVE LEARNING

Objective Function
A clustering method partitions data according to some degree of similarity. In the clustering process, the samples of one class exert a repulsive effect on the cluster centers of other classes, especially in the overlapping regions of different classes; the greater the overlap density, the greater the repulsive force. In these overlapping regions, the cluster centers of different classes form a competitive learning relationship: on the one hand, a cluster center is attracted by the samples of its own class; on the other hand, it is repelled by the samples of the other classes and pushed away from the overlapping region. In this paper, this idea is embedded into PCM clustering. Based on the Bayesian framework, we propose possibilistic clustering in a Bayesian framework with interclass competitive learning.
Suppose a given binary classification dataset X = {(x_n, l_n)}_{n=1}^{N}, in which X_1 = {(x_n, l_n)}_{n=1}^{N_1} and X_2 = {(x_n, l_n)}_{n=N_1+1}^{N} represent the two classes of samples and l_n ∈ {+1, −1} is the class label of the nth sample. Let the number of clusters of one class be C_1, and let the cluster centers of the other class, Z = [z_1, z_2, ..., z_{C_2}]^T with C_2 clusters, be known a priori. We suppose that the data X follow a normal distribution and that each sample has an independent probability distribution. The maximum a posteriori estimation of the data and parameters in X_1 is expressed by Equation (14), where Y = [y_1, y_2, ..., y_{C_1}]^T is the unknown cluster center matrix of the class. Taking the logarithm of Equation (14) yields the objective function of the PCB-ICL method, Equation (15). From Equations (14) and (15), we can see the following. (1) The PCB-ICL method models the competition between the cluster centers of different classes. Different from traditional PCM clustering, PCB-ICL considers not only the label information of the samples but also the competition between cluster centers, as shown in the first two terms. Given that the cluster centers of the other class are known a priori, the cluster centers of the current class inevitably compete with these known centers in the overlapping region.
(2) Because PCB-ICL simultaneously utilizes the global distribution structure and the discriminative information of the samples, the antecedent part of the fuzzy rules obtained by PCB-ICL yields a clear partition of the fuzzy space and enhances the interpretability of the fuzzy rules.

Parameter Learning
To obtain the optimal fuzzy partition matrix U, the PCB-ICL method uses the Metropolis-Hastings method (Chib and Greenberg, 1995; Elvira et al., 2017) to construct a Markov chain whose stationary distribution is p(U|X_1, Y). When the samples and cluster centers are known, the conditional distribution p(U|X_1, Y) is proportional to the joint distribution p(X_1, U, Y). Therefore, we only need to compute p(x_n, u_n|Y) for each sample x_n:

p(x_n, u_n|Y) = p(x_n|u_n, Y) p(u_n|Y).

The ith iteration of the Markov chain proceeds as follows.

1) Generate a new state u_n^+ of u_n from a uniform distribution, u_n^+ ~ Uniform(0, 1), ∀n.

2) Accept the newly generated membership u_n^+ with probability

a_u = min{1, p(x_n, u_n^+|Y) / p(x_n, u_n|Y)},

that is, u_n^+ becomes the current state if mu ≤ a_u, where mu is a random number in [0, 1]. The distribution of the new state u_n^+ obtained by sampling is independent of the current state, so a_u does not need a Hastings correction.
3) Compare p(x_n, u_n^+|Y*) and p(x_n, u_n^*|Y*), where Y* and u_n^* are the current optimal values of Y and u_n. If p(x_n, u_n^+|Y*) > p(x_n, u_n^*|Y*), u_n^* is replaced by u_n^+.

When the matrix U is fixed, we use Metropolis-Hastings to sample the conditional distribution p(Y|X, U). In this case, p(Y|X, U) is proportional to the joint distribution p(X, U, Y). We generate a candidate center y_c^+ from a Gaussian distribution centered on the current value y_c, where σ is a positive number that controls the compactness of the cluster centers. In the experiments, we empirically set σ to 10.
The newly generated y_c^+ is independent of the other cluster centers, so the conditional distribution p(X, y_c|U) factorizes over the centers. Similarly, the newly generated center y_c^+ is accepted with probability

a_y = min{1, p(X, y_c^+|U) / p(X, y_c|U)}.

Since the Gaussian proposal is symmetric, a_y does not need a Hastings correction. Finally, we compute p(X, U*, Y*) using Equation (15) and compare it with the current p(X, U, Y). If p(X, U, Y) > p(X, U*, Y*), {U*, Y*} is replaced by {U, Y}.
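The membership-sampling step above can be sketched as a generic Metropolis routine; the `log_p` argument below is a hypothetical stand-in for the log of p(x_n, u_n|Y), not the paper's exact density:

```python
import numpy as np

def mh_update_memberships(U, log_p, rng, n_sweeps=100):
    """One-at-a-time Metropolis step for a membership matrix U.
    New states are drawn from Uniform(0, 1), an independence proposal
    with constant density, so the acceptance ratio reduces to the ratio
    of target densities (no extra Hastings correction term).
    `log_p(u, n)` returns the log-density of membership row u for sample n."""
    N = U.shape[0]
    for _ in range(n_sweeps):
        for n in range(N):
            u_new = rng.uniform(0.0, 1.0, size=U.shape[1])  # proposal
            log_a = log_p(u_new, n) - log_p(U[n], n)
            if np.log(rng.uniform()) < min(0.0, log_a):     # accept w.p. a_u
                U[n] = u_new
    return U
```

With a peaked target density, the chain concentrates the memberships around the target's mode after a modest number of sweeps.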
Based on the above analysis, we give the procedure of the PCB-ICL method in Algorithm 1.

Algorithm 1 | PCB-ICL method.
Input: Dataset X_1 of one class, the number of clusters C, and the a priori known cluster center matrix Z of the other class;
Output: Fuzzy partition matrix U* and cluster center matrix Y*.

NOISE-INSENSITIVE TSK FUZZY SYSTEM VIA INTERCLASS COMPETITIVE LEARNING

Antecedent Parameter Learning in PCB-ICL-TSK
In this section, we compute the antecedent parameters in PCB-ICL-TSK. The premise of PCB-ICL clustering in Algorithm 1 is that the cluster centers of the other class are known a priori, which is obviously not available in practical applications. To perform the fuzzy partition on the whole dataset, we adopt an alternating strategy that performs Algorithm 1 on the different classes in turn. In this case, the clustering results of one class influence those of the other class. Taking binary classification as an example, we perform Algorithm 1 on the positive class X_1 and the negative class X_2 alternately. The detailed fuzzy partition of the whole data is shown in Algorithm 2.
The numbers of clusters in the two classes are C_1 and C_2, and the cluster center matrices of the two classes are Y_1 and Y_2, respectively. After applying Algorithm 2 to the whole dataset, the overall center matrix can be described by Y* = [Y^(1)*; Y^(2)*].
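The alternating cycle can be sketched as follows; `pcb_icl` is a hypothetical stand-in for one run of Algorithm 1, and the initial centers are a crude placeholder:

```python
import numpy as np

def alternate_partition(X1, X2, C1, C2, pcb_icl, n_rounds=5):
    """Alternating fuzzy partition over two classes (Algorithm 2 sketch).
    `pcb_icl(Xc, C, Z)` stands for one run of Algorithm 1 on class samples
    Xc with C clusters, given the other class's current centers Z; it is
    assumed to return the pair (U, Y)."""
    Y1, Y2 = X1[:C1].copy(), X2[:C2].copy()   # crude initial centers
    U1 = U2 = None
    for _ in range(n_rounds):
        U1, Y1 = pcb_icl(X1, C1, Y2)          # class 1 competes against Y2
        U2, Y2 = pcb_icl(X2, C2, Y1)          # class 2 competes against Y1
    return (U1, Y1), (U2, Y2)
```

Each round passes the freshly updated centers of one class as the fixed competitors Z for the other, which is how the clustering results of one class influence those of the other.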

Consequent Parameter Learning in PCB-ICL-TSK
In this section, we compute the noise-insensitive consequent parameters in PCB-ICL-TSK. The fuzzy partition of the whole dataset described above is summarized in Algorithm 2.

Algorithm 2 | Fuzzy partition on the whole data.
Step 1 Initialize the memberships u_n ~ Uniform(0, 1) in the two classes;
Step 2 Initialize the cluster centers in the two classes;
Step 3 Set the initial values as the current optimal values;
Step 4 Perform Algorithm 1 on X_1;
Step 5 Perform Algorithm 1 on X_2.

As discussed before, using the obtained antecedent parameters, the decision function on the dataset is determined by the consequent vector p* = [p_0^1, p_0^2, ..., p_0^{(C_1+C_2)}, w]^T in Equation (23), where p_0 = [p_0^1, p_0^2, ..., p_0^{(C_1+C_2)}]^T and w is the decision threshold. Multiplying Equation (23) by the class label, Equation (23) can be represented as l_i (p*)^T d(x_i) ≥ eps_0 (i = 1, ..., N). The vector p* can then be computed from Equation (24). In particular, eps_0 = 1 leads to the classical SVM formulation. For simplicity, we set eps_0 = 1, so that the constraint becomes l_i (p*)^T d(x_i) ≥ 1 and Equation (24) can be written as Equation (25). Denote the matrix D = [l_1 d(x_1)^T; l_2 d(x_2)^T; ...; l_N d(x_N)^T] and the error vector e = Dp* − 1. Equation (25) can be rewritten as Equation (26), where the matrix H = (λ/N) diag(h_1, h_2, ..., h_N), with h_i = 0 for error e_i ≥ 0 and h_i = 1 otherwise.

However, the misclassification error in Equation (24) is noise sensitive. To further improve the robustness of the TSK fuzzy system, we use the asymmetric expectile term, which is insensitive to noise, especially to noise around the decision boundary. The weight h_i of the ith sample is then given by Equation (27), where q is the (lower) expectile parameter. Obviously, when q = 0, the loss term obtained from Equation (27) is equal to the hinge loss, and when q = 0.5, it is equal to the l2 loss in Huang et al. (2014a,b). Adding the regularization term, Equation (26) can be rewritten as Equation (28), where τ is the regularization parameter and (p*)^(k), H^(k), and e^(k) denote the kth iterates of p*, H, and e, respectively. The condition for optimality of Equation (28) in the kth iteration is obtained by setting dJ/dp* = 0, which yields Equation (29), where I~ is the identity matrix with the last element on the main diagonal set to 0. The consequent parameter learning of PCB-ICL-TSK on dataset X is summarized in Algorithm 3.

Algorithm 3 | Learning algorithm for consequent parameters.
Input: The dataset X; the number of clusters (C_1 + C_2); the cluster centers Y_1 and Y_2 and the membership matrices U_1 and U_2; the expectile parameter q; and the regularization parameter τ;
Output: Consequent parameters p_0.
Step 1 Run Algorithm 2 to obtain the antecedent parameters;
Step 2 Compute the membership vectors d(x_i) using Equations (5)-(7); set k = 0;
Do
Step 3 Obtain the parameters (p*)^(k) using Equation (29);
Step 4 Compute e^(k) = D(p*)^(k) − 1;
Step 5 Compute H^(k+1) using Equation (27);
Until convergence.
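Under assumed forms of Equations (27)-(29) consistent with the description above (weights h_i = q for e_i ≥ 0 and 1 − q otherwise, and a weighted regularized normal-equation solve), the iteratively reweighted procedure can be sketched as:

```python
import numpy as np

def solve_consequents(D, q=0.25, tau=0.1, n_iter=30):
    """Iteratively reweighted ridge solve for the consequent vector p
    (Algorithm 3 sketch; the exact weight and update forms are assumptions).
    Rows of D are l_i * d(x_i)^T, and e = D p - 1 is the margin error.
    q = 0 recovers a hinge-like loss, q = 0.5 the symmetric l2 loss."""
    N, M = D.shape
    I_tilde = np.eye(M)
    I_tilde[-1, -1] = 0.0                  # do not regularize the bias w
    p = np.zeros(M)
    for _ in range(n_iter):
        e = D @ p - 1.0
        h = np.where(e >= 0.0, q, 1.0 - q)  # asymmetric expectile weights
        H = np.diag(h / N)
        # (D^T H D + tau * I~) p = D^T H 1  <-  dJ/dp = 0
        p = np.linalg.solve(D.T @ H @ D + tau * I_tilde, D.T @ H @ np.ones(N))
    return p
```

Samples already beyond the margin (e_i ≥ 0) receive the small weight q, so outliers far on the correct side cannot dominate the fit; samples violating the margin receive 1 − q and drive the update.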

EXPERIMENT

Experimental Settings
Real-world EEG signals are high dimensional and nonstationary, so feature extraction is a necessary stage before classification in EEG signal recognition. In general, feature extraction methods fall into two types: time-domain and frequency-domain methods (Wen and Zhang, 2017). In our experiments, we extract EEG features using kernel principal component analysis (KPCA) and the short-time Fourier transform (STFT) (Blanco et al., 1997). The former is a time-domain feature extraction method, and the latter is a frequency-domain one. We design eight classification tasks, namely, four binary classification tasks and four three-class classification tasks, as shown in Table 2. We corrupt the original datasets with random noise at the 5, 10, and 15% noise levels.
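The two feature extraction routes can be sketched with standard library calls; the sampling rate, window length, kernel, and component counts below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import KernelPCA

def stft_features(segment, fs=173.61, nperseg=256):
    """Frequency-domain features for one EEG segment: mean spectral
    magnitude per frequency bin of the short-time Fourier transform."""
    f, t, Z = stft(segment, fs=fs, nperseg=nperseg)
    return np.abs(Z).mean(axis=1)

def kpca_features(segments, n_components=10):
    """Time-domain features: nonlinear (RBF-kernel) projection of the
    raw segments, one row per segment."""
    return KernelPCA(n_components=n_components, kernel="rbf").fit_transform(segments)
```

Either feature matrix (one row per segment) can then be fed to the classifiers compared in this section.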
The experimental environment in this study is a computer with an Intel Core i3-3317U 3.40-GHz CPU and 8-GB RAM. To validate the performance of PCB-ICL-TSK, we compare it with three fuzzy systems, FS-FCSVM (Juang et al., 2007), ε-margin-TSK-FS (Leski, 2005), and IB-TSK-FC (Gu et al., 2017b), and two robust classification methods, CS-SVM (Iranmehr et al., 2019) and FRSVM-ANCH (Gu et al., 2019). The Gaussian kernel is used for the two SVM methods. The parameter settings for all methods are listed in Table 3. All parameters are selected by a 5-fold cross-validation strategy.

Classification Performance Comparison
In this section, eight EEG classification tasks are used to verify the classification performance of PCB-ICL-TSK. Tables 4, 5 show the experimental results of the six classification methods using the STFT and KPCA feature extraction methods at the 5% noise level; Tables 6, 7 show the corresponding results at the 10% noise level; and Tables 8, 9 show the corresponding results at the 15% noise level. In each table, the bold values indicate the best classification performance in the tasks.

FIGURE 3 | The rules obtained by four fuzzy systems on the 5% noise level using KPCA features.
FIGURE 4 | The rules obtained by four fuzzy systems on the 15% noise level using KPCA features.
FIGURE 5 | The rules obtained by four fuzzy systems on the 5% noise level using STFT features.
FIGURE 6 | The rules obtained by four fuzzy systems on the 15% noise level using STFT features.

From the experimental results, it can be seen that noisy data seriously affect classification performance and that accounting for noise during learning helps to promote classification performance. The performances of FS-FCSVM, ε-margin-TSK-FS, and IB-TSK-FC are therefore poor. CS-SVM, FRSVM-ANCH, and PCB-ICL-TSK are not sensitive to noise and achieve good classification results. In particular, PCB-ICL-TSK shows excellent classification performance at the different noise levels and exhibits strong robustness. Because PCB-ICL-TSK uses PCB-ICL and the Ho-Kashyap procedure with an asymmetric expectile term to compute the antecedent and consequent parameters of the fuzzy rules, it is noise insensitive. In addition, under the Bayesian framework, PCB-ICL obtains globally optimal clustering results, and the competitive relationship between cluster centers enhances the interpretability of the antecedents of the fuzzy rules.

Interpretability Comparison
In this section, we compare the number of fuzzy rules of the four fuzzy systems on Task 8. Figures 3, 4 show the number of fuzzy rules at the 5 and 15% noise levels for the four fuzzy systems using KPCA features, and Figures 5, 6 show the corresponding numbers using STFT features. From the results in Figures 3-6, compared with the other three fuzzy systems, PCB-ICL-TSK obtains the fewest fuzzy rules in all EEG classification tasks. For fuzzy systems, the interpretability of the fuzzy rules is related to the number of fuzzy rules and the definition of the fuzzy subsets. The fuzzy membership functions obtained by PCB-ICL on Task 1 at the 5% noise level using KPCA features are shown in Figure 7.
Because PCB-ICL clustering considers the influence of the cluster centers of the different classes during clustering, that is, the competition between the cluster centers of different classes, it can obtain cluster centers with large intervals. This guarantees the clarity of the feature-space partition, the classification accuracy of the obtained fuzzy system, and the interpretability of the rules.

CONCLUSION
In this paper, the noise-insensitive PCB-ICL-TSK fuzzy system is proposed. For learning the rule antecedent parameters, the proposed noise-insensitive PCB-ICL clustering, based on a Bayesian probability model, is used. PCB-ICL clustering considers the repulsion between the cluster centers of different classes, which ensures the interpretability of the rule antecedents, and it can learn the globally optimal clustering solution by Markov chain sampling. PCB-ICL-TSK learns the consequent parameters using the Ho-Kashyap procedure with an asymmetric expectile term. Thus, it not only has strong noise resistance but also high classification performance. The experimental results on a real EEG dataset show that PCB-ICL-TSK achieves satisfactory classification performance and high interpretability. Our future work is to further improve its practicability when the sample dimension is large.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. The dataset can be found at the Department of Epileptology, University of Bonn: http://epileptologie-bonn.de/cms/upload/workgroup/lehnertz/eegdata.html.