Identification of Epileptic EEG Signals Through TSK Transfer Learning Fuzzy System

We propose a new model to identify epileptic EEG signals. Existing intelligent recognition techniques for EEG signals have several shortcomings. Some require the training set and test set to share the same distribution. Some only reduce the marginal distribution distance of the data while ignoring its intra-class information. Others lack interpretability. To address these deficiencies, we construct a TSK transfer learning fuzzy system (TSK-TL) based on the easy-to-interpret TSK fuzzy system and a transfer learning method. The proposed model is interpretable. It uses the information in the source and target domains more effectively, which further relaxes the requirements on the data distribution and enables epileptic EEG signals to be identified in data drift scenarios. The experimental results show that, compared with existing algorithms, TSK-TL achieves better performance in epileptic EEG recognition.


INTRODUCTION
Epilepsy is a disease caused by sudden abnormal discharges of cerebral neurons. EEG technology (Suk et al., 2018) can monitor changes in brain electrical signals, so EEG intelligent recognition technology is often used to detect epilepsy (Litt et al., 2001; Iasemidis et al., 2003; Dorai and Ponnambalam, 2010). Many machine learning algorithms (Zhang et al., 2021) have been used to recognize epileptic signals. Examples include decision trees, k-nearest neighbors (KNN; Iscan et al., 2011), naive Bayes (NB; Iscan et al., 2011), support vector machines (SVM; Yang et al., 2014), and fuzzy systems (Aarabi et al., 2009; Rabbi and Fazel-Rezai, 2012; Deng et al., 2014a,b, 2018; Jiang et al., 2015). It has been shown that these algorithms can detect epilepsy faster and more accurately than doctors. However, as shown in Figure 1, they classify well only when the training set and test set follow similar distributions. In most cases, as shown in Figure 2, the distribution characteristics of EEG data are not exactly the same. Some researchers have therefore proposed transfer learning algorithms to make full use of the information the datasets share, such as LMPROJ (Yang et al., 2014) and STL. These algorithms apply previously acquired knowledge to new domains. Traditional machine learning learns a model from data and applies it to test data that is assumed to follow the same distribution. Transfer learning, in contrast, focuses on carrying learned knowledge over to a new but related problem. These transfer learning algorithms alleviate the problem of differing data distributions to a certain extent. However, they reduce the discrepancy in only the marginal distribution or only the conditional distribution (Deng et al., 2018) of the data, without balancing the two, and they lack interpretability.
To solve these problems, we propose a new EEG recognition method based on transfer learning and a fuzzy system. Traditional methods have a single model structure, so they cannot achieve good results in complex scenarios. In contrast, we focus on making full use of previously labeled data while ensuring the accuracy of the model on the new task. Rather than minimizing only the marginal or only the conditional distribution distance, we combine the two to minimize the joint probability distribution distance (Deng et al., 2018), seeking the best balance between the marginal and conditional distributions. To ensure interpretability, we use the TSK fuzzy system, whose IF-THEN rules make the model's decision process easier to understand. TSK fuzzy systems have been widely used in data stream modeling, data mining tasks, metacognitive learning, and multi-task learning. Building on these ideas, we develop a TSK fuzzy system construction method based on transfer learning (TSK-TL). The experimental results show that, compared with existing algorithms, TSK-TL achieves better performance in epileptic EEG recognition.
Our main contributions are as follows. (1) Transfer learning (Wang and Mahadevan, 2011; Quanz et al., 2012; Xiao and Guo, 2012) is introduced into the TSK fuzzy system, so the proposed model offers higher interpretability while maintaining recognition accuracy. (2) The model is more robust and can handle more complex data scenarios. (3) It identifies epileptic EEG signals more accurately in data drift scenarios.
The rest of the manuscript is organized as follows. The section "Backgrounds" briefly introduces the EEG dataset, the classical TSK model, and related concepts of transfer learning. The section "Identification of Epileptic EEG Signals Through TSK Transfer Learning Fuzzy System" first introduces the transfer learning framework and then presents the objective function of TSK-TL. The section "Experimental Process and Results Analysis" describes the experiments conducted to evaluate TSK-TL. The conclusion is given in the last section.

BACKGROUNDS
This section introduces the datasets used in this study and their processing methods, the classical TSK fuzzy system, and related concepts of transfer learning.

Epilepsy EEG Signal Dataset
The original epileptic EEG dataset used in this study is divided into five groups, Group A to Group E. Each group contains 100 single-channel signal segments, and all samples have a sampling rate of 173.6 Hz. Groups A and B contain data from healthy volunteers. The subjects in group A had their eyes open, and those in group B had their eyes closed. Groups C, D, and E contain data from epilepsy patients in different states (Li, 2021). Figure 3 shows the five groups of original epileptic EEG signals, and Table 1 describes them in detail.
EEG signals are non-stationary, so their distribution changes over time. Their amplitude is very small, and they are easily contaminated by other bioelectric signals of the body such as ECG, EOG, and EMG. They are also highly random, and the noise in the signal is complex. Therefore, using the raw EEG signals directly yields unsatisfactory results. Following previous work (Jiang et al., 2017; Tsujikawal et al., 2018), we process the epileptic EEG signals with three typical feature extraction methods (Blanco et al., 1997; Zhang et al., 2000; Srinivasan et al., 2005; Vivaldi and Bassi, 2006; Tzallas et al., 2009; Tang and Durand, 2012; Teng et al., 2017):

- wavelet packet decomposition (WPD)
- short-time Fourier transform (STFT)
- kernel principal component analysis (KPCA)

Figure 4 shows samples from group A processed by the three feature extraction methods.
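The following minimal NumPy sketch makes the STFT view concrete with one possible band-energy feature extractor. It is not the paper's pipeline, which was implemented in MATLAB. The function name `stft_band_energy` is hypothetical. The Hann window, the 256-sample frames with 50% overlap, and the split of 0 to fs/2 into six equal-width bands are also assumptions made for illustration.

```python
import numpy as np

def stft_band_energy(x, fs, n_bands=6, nperseg=256, hop=128):
    """Average short-time power in n_bands equal-width bands over [0, fs/2].

    Hypothetical feature extractor: window, frame length, hop size, and
    band layout are illustrative assumptions, not the paper's settings.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    # Slice the signal into overlapping, Hann-windowed frames.
    frames = np.stack([x[i * hop:i * hop + nperseg] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, nperseg // 2 + 1)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    # Assign every frequency bin to a band; the Nyquist bin joins the last band.
    band = np.minimum((freqs / (fs / 2) * n_bands).astype(int), n_bands - 1)
    # Sum power inside each band per frame, then average over frames.
    return np.array([power[:, band == b].sum(axis=1).mean() for b in range(n_bands)])

# Usage on a synthetic 10 Hz sinusoid sampled at 173.6 Hz.
fs = 173.6
t = np.arange(4096) / fs
features = stft_band_energy(np.sin(2 * np.pi * 10.0 * t), fs)
```

With these settings a 10 Hz tone puts most of its energy in the first band (0 to about 14.5 Hz). Each of the six values could then serve as one input dimension of the fuzzy system.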

Classical TSK Fuzzy System
Because of their unique interpretability, fuzzy systems have been widely used in modeling and intelligent control. In addition, the output of a TSK fuzzy system takes a concise form, namely a weighted sum of linear functions of the input. Its training can therefore be transformed into a linear regression problem or a quadratic programming problem, which makes training more efficient (Deng et al., 2012; Jiang et al., 2017).
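The inference rules of a TSK fuzzy system are usually defined as:

$$\text{IF } x_1 \text{ is } A_1^k \wedge x_2 \text{ is } A_2^k \wedge \cdots \wedge x_d \text{ is } A_d^k, \quad \text{THEN } f^k(\mathbf{x}) = p_0^k + p_1^k x_1 + \cdots + p_d^k x_d, \quad k = 1, \ldots, K \tag{1}$$

where K is the number of fuzzy rules. Each rule takes the input vector $\mathbf{x} = [x_1, x_2, \ldots, x_d]^T$ as its premise. It maps the fuzzy set $A^k \subset \mathbb{R}^d$ in the input space to a varying singleton represented by $f^k(\mathbf{x})$. $A_i^k$ is the fuzzy subset of the ith dimension of the input vector $\mathbf{x}$ under the kth rule, and $\wedge$ is a fuzzy conjunction operator. Following previous work (Jiang et al., 2017), the output of the TSK fuzzy model can be expressed as

$$y^o = \sum_{k=1}^{K} \tilde{\mu}^k(\mathbf{x}) f^k(\mathbf{x}) \tag{2a}$$

$$\mu^k(\mathbf{x}) = \prod_{i=1}^{d} \mu_{A_i^k}(x_i) \tag{2b}$$

$$\tilde{\mu}^k(\mathbf{x}) = \frac{\mu^k(\mathbf{x})}{\sum_{k'=1}^{K} \mu^{k'}(\mathbf{x})} \tag{2c}$$

We use the Gaussian membership function

$$\mu_{A_i^k}(x_i) = \exp\left(-\frac{(x_i - c_i^k)^2}{2\delta_i^k}\right) \tag{3}$$

as the fuzzy membership function. In this paper, we use the FCM algorithm to obtain $c_i^k$ and $\delta_i^k$, which can be estimated as

$$c_i^k = \frac{\sum_{j=1}^{N} u_{jk} x_{ji}}{\sum_{j=1}^{N} u_{jk}} \tag{4a}$$

$$\delta_i^k = h \cdot \frac{\sum_{j=1}^{N} u_{jk} \left(x_{ji} - c_i^k\right)^2}{\sum_{j=1}^{N} u_{jk}} \tag{4b}$$

where $u_{jk}$ is the fuzzy membership of the jth training sample $\mathbf{x}_j$ in the kth cluster, N is the number of training samples, and h is a manually adjustable scale parameter. After these antecedent parameters are determined, let

$$\mathbf{x}_e = [1, \mathbf{x}^T]^T \tag{5a}$$

$$\tilde{\mathbf{x}}^k = \tilde{\mu}^k(\mathbf{x})\, \mathbf{x}_e \tag{5b}$$

$$\mathbf{x}_g = \left[(\tilde{\mathbf{x}}^1)^T, (\tilde{\mathbf{x}}^2)^T, \ldots, (\tilde{\mathbf{x}}^K)^T\right]^T \tag{5c}$$

$$\mathbf{p}^k = [p_0^k, p_1^k, \ldots, p_d^k]^T, \quad \mathbf{P}_g = \left[(\mathbf{p}^1)^T, (\mathbf{p}^2)^T, \ldots, (\mathbf{p}^K)^T\right]^T \tag{5d}$$

With this transformation, Eq. 2a can be converted into the following linear regression problem (Jiang et al., 2017):

$$y^o = \mathbf{P}_g^T \mathbf{x}_g$$

FIGURE 2 | Actual data distribution scenario: training data and test data follow different distributions.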
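The antecedent construction and the mapping (5a)-(5c) can be sketched in Python/NumPy as shown below. This is an illustrative sketch under stated assumptions, not the authors' MATLAB code. The helper names `fcm_memberships`, `antecedent_params`, and `map_to_xg` are hypothetical, and so are the fuzzifier m = 2, the fixed iteration count, and the random FCM initialization. The normalized firing strengths (2c) are computed in log space. This is mathematically identical to (2b)-(2c) but avoids underflow when d is large.

```python
import numpy as np

def fcm_memberships(X, K, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means. X: (N, d). Returns U with U[j, k] = u_jk, rows summing to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], K))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]                  # (K, d)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U

def antecedent_params(X, U, h=1.0):
    """Centers c_i^k as in (4a) and widths delta_i^k as in (4b), each of shape (K, d)."""
    s = U.sum(axis=0)                                                 # sum_j u_jk
    C = (U.T @ X) / s[:, None]
    sq = (X[:, None, :] - C[None, :, :]) ** 2                         # (N, K, d)
    Delta = h * np.einsum("jk,jki->ki", U, sq) / s[:, None]
    return C, Delta

def map_to_xg(X, C, Delta):
    """Matrix whose columns are x_g from (5c); shape (K * (d + 1), N)."""
    N = X.shape[0]
    # log of the product of Gaussian MFs, Eqs. (2b) and (3)
    log_mu = -(((X[:, None, :] - C[None]) ** 2) / (2.0 * Delta[None])).sum(axis=2)
    # normalisation (2c) done in log space (softmax over rules)
    log_mu -= log_mu.max(axis=1, keepdims=True)
    mu_tilde = np.exp(log_mu)
    mu_tilde /= mu_tilde.sum(axis=1, keepdims=True)                   # (N, K)
    Xe = np.hstack([np.ones((N, 1)), X])                              # x_e = [1, x^T]^T
    Xg = (mu_tilde[:, :, None] * Xe[:, None, :]).reshape(N, -1)       # rule blocks concatenated
    return Xg.T
```

In each column, the entries at positions 0, d+1, 2(d+1), and so on are the normalized firing strengths. They therefore sum to one, which is a quick sanity check on the mapping.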
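Jiang et al. (2017) proposed a well-performing algorithm for training the classical TSK-FS. Its objective function is

$$\min_{\mathbf{P}_g} J_{\text{TSK-FS}}(\mathbf{P}_g) = \frac{1}{2}\mathbf{P}_g^T\mathbf{P}_g + \frac{\lambda}{2}\left\|\mathbf{P}_g^T\mathbf{X} - \mathbf{Y}\right\|^2 \tag{6}$$

where:

- $\frac{1}{2}\mathbf{P}_g^T\mathbf{P}_g$ is the regularization term, which effectively improves the generalization ability of the TSK-FS;
- $\mathbf{P}_g$ is the consequent parameter vector;
- $\mathbf{X}$ is the matrix whose columns are the vectors $\mathbf{x}_g$ obtained by (5c) for the training samples;
- $\lambda$ is a regularization parameter that balances model complexity against error tolerance;
- $\mathbf{Y} = [y_1, y_2, \ldots, y_n]$ is the label matrix.

To obtain the optimal $\mathbf{P}_g$, we set the derivative of $J_{\text{TSK-FS}}(\mathbf{P}_g)$ with respect to $\mathbf{P}_g$ to zero, $\mathbf{P}_g + \lambda\mathbf{X}(\mathbf{X}^T\mathbf{P}_g - \mathbf{Y}^T) = \mathbf{0}$, which gives the optimal solution

$$\mathbf{P}_g = \left(\frac{1}{\lambda}\mathbf{I} + \mathbf{X}\mathbf{X}^T\right)^{-1}\mathbf{X}\mathbf{Y}^T \tag{7}$$

With the optimal antecedent and consequent parameters, we can establish a classical TSK fuzzy system.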
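Below is a minimal NumPy sketch of the closed-form training step (7). It assumes that the columns of `X` are the mapped vectors x_g (for example, produced by a mapping like (5c)) and that labels are in {-1, 1}. The function names `train_tsk_fs` and `predict` are hypothetical. `np.linalg.solve` is used instead of forming the inverse explicitly, which is the usual numerically safer choice. Thresholding the output at zero is also an assumption, based on the ±1 label coding used later for TSK-TL.

```python
import numpy as np

def train_tsk_fs(X, Y, lam):
    """Closed-form consequent vector P_g of the classical TSK-FS, Eq. (7).

    X   : (D, N) matrix whose columns are mapped samples x_g, D = K * (d + 1)
    Y   : (N,) labels
    lam : regularisation parameter lambda > 0
    """
    D = X.shape[0]
    # Normal equations of J(P) = 0.5 * P^T P + 0.5 * lam * ||P^T X - Y||^2
    A = np.eye(D) / lam + X @ X.T
    return np.linalg.solve(A, X @ Y)

def predict(P, X):
    """Decision values P_g^T x_g mapped to labels in {-1, 1}."""
    return np.where(P @ X >= 0, 1, -1)
```

One way to check the solver is to verify that the gradient P + λX(XᵀP − Y) vanishes at the returned P.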

Preparatory Knowledge of Transfer Learning
Transfer learning methods fall into three categories: data distribution adaptation, feature selection, and subspace learning (Shi et al., 2013; Zheng, 2021).

- Data distribution adaptation applies transformations that make the probability distributions of the source domain ($D_s$) and target domain ($D_t$) data the same or similar.
- Feature selection methods assume that the source and target domains share some common features on which their data distributions are similar. These common features are extracted with machine learning methods, and the model is built on them.
- Subspace learning methods usually assume that the source and target domain data have similar distributions in a transformed subspace. They then learn through statistical feature transformation or manifold transformation.

The joint distribution adaptation adopted in this paper is a data distribution adaptation method. Its core idea is to simultaneously minimize the distance between the marginal probability distributions of the two domains and the distance between their conditional probability distributions. Distance in machine learning takes various forms. Here we use the maximum mean discrepancy (MMD; Long et al., 2012) as the distance measure, calculated as

$$\text{MMD}(D_s, D_t) = \left\| \frac{1}{n}\sum_{i=1}^{n}\phi(\mathbf{x}_{si}) - \frac{1}{m}\sum_{j=1}^{m}\phi(\mathbf{x}_{tj}) \right\|_{\mathcal{H}}^2 \tag{8}$$

where $\phi$ is a feature mapping, n is the number of samples in the source domain, and m is the number of samples in the target domain.
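With the identity feature map φ(x) = x (a linear kernel), the empirical MMD (8) reduces to the squared distance between the two domain means. The sketch below computes it directly and through the kernel trick, showing that the two forms agree. The helper names are hypothetical, and the linear kernel is an assumption made for illustration.

```python
import numpy as np

def mmd_linear(Xs, Xt):
    """Empirical MMD (8) with phi(x) = x. Xs: (n, d) source samples, Xt: (m, d) target samples."""
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(diff @ diff)

def linear_kernel(A, B):
    """k(a, b) = a^T b for all pairs of rows."""
    return A @ B.T

def mmd_kernel(Xs, Xt, kernel):
    """Same quantity via the kernel trick: mean k(s, s') + mean k(t, t') - 2 mean k(s, t)."""
    return float(kernel(Xs, Xs).mean() + kernel(Xt, Xt).mean()
                 - 2.0 * kernel(Xs, Xt).mean())
```

Replacing `linear_kernel` with a nonlinear kernel would measure the discrepancy in a richer feature space. TSK-TL instead measures it after projection by P_g, as in the next subsection.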

IDENTIFICATION OF EPILEPTIC EEG SIGNALS THROUGH TSK TRANSFER LEARNING FUZZY SYSTEM
In this section, we describe in detail the transfer learning techniques we use. Building on an analysis of the rules and parameter learning strategies of the classical TSK fuzzy system, we then propose the TSK-TL method for detecting epileptic signals.

Framework Based on Transfer Learning
The transfer learning strategy used in this study consists of two parts: joint distribution adaptation (Zheng, 2021) and a historical knowledge learning mechanism. Figure 5 shows the framework of epileptic EEG signal recognition based on transfer learning theory. To make full use of the information in the source and target domains, our work mainly includes three steps: (1) minimizing the marginal distribution distance, (2) minimizing the conditional distribution distance, and (3) further learning with historical knowledge.
(1) Minimizing the marginal distribution distance. The marginal distributions of the source and target domains are denoted by $P_s$ and $P_t$, respectively. Because the new model is built on TSK-FS, $\mathbf{P}_g$ can be regarded as a projection vector and $\mathbf{x}_g$ as the mapped sample that it projects. To make MMD a proper regularizer for the classifier, we adopt the projected MMD (Long et al., 2013). The marginal distribution distance is then

$$D(P_s, P_t) = \left\| \frac{1}{n}\sum_{i=1}^{n}\mathbf{P}_g^T\mathbf{x}_{si} - \frac{1}{m}\sum_{j=1}^{m}\mathbf{P}_g^T\mathbf{x}_{tj} \right\|^2 \tag{9}$$

where $\mathbf{x}_{si}$ is the ith sample of the source domain, $\mathbf{x}_{tj}$ is the jth sample of the target domain, and n and m are the numbers of samples in the source and target domains, respectively.
(2) Minimizing the conditional distribution distance. Reducing the conditional distribution distance amounts to intra-class transfer, but the labels of the target domain are unknown. In this paper, we use a traditional classification algorithm (such as SVM) to obtain pseudo-labels for the target domain. We assume that the resulting pseudo-class centroids lie close to the true class centroids (Long et al., 2012). We can therefore compute the conditional distribution distance from the true source labels and the target pseudo-labels. Let $Q_s$ and $Q_t$ denote the conditional distributions of the source and target domains. The conditional distribution distance is then

$$D(Q_s, Q_t) = \sum_{c=1}^{C} \left\| \frac{1}{n^{(c)}}\sum_{\mathbf{x}_{si} \in D_s^{(c)}}\mathbf{P}_g^T\mathbf{x}_{si} - \frac{1}{m^{(c)}}\sum_{\mathbf{x}_{tj} \in D_t^{(c)}}\mathbf{P}_g^T\mathbf{x}_{tj} \right\|^2 \tag{10}$$

where c is the class label. $D_s^{(c)}$ is the set of source-domain samples belonging to class c, with $n^{(c)} = |D_s^{(c)}|$. $D_t^{(c)}$ is the set of target-domain samples whose pseudo-label is c, with $m^{(c)} = |D_t^{(c)}|$. Combining Equations 9 and 10, the joint distribution distance is the sum $D(P_s, P_t) + D(Q_s, Q_t)$.

(3) Further learning with historical knowledge. The parameter $\mathbf{P}_g^0$ obtained from the classical TSK-FS is used to further guide learning, so the complete transfer learning term is $D(P_s, P_t) + D(Q_s, Q_t) + D(\mathbf{P}_g^0, \mathbf{P}_g)$, where

$$D(\mathbf{P}_g^0, \mathbf{P}_g) = \left\|\mathbf{P}_g^0 - \mathbf{P}_g\right\|^2 \tag{11}$$
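The joint distance D(P_s, P_t) + D(Q_s, Q_t) from (9) and (10) can be computed directly from projected means. In this sketch, `joint_distance` is a hypothetical helper, and the pseudo-labels are simply passed in (the paper obtains them with a classifier such as SVM). The text does not say how to handle a class that is absent from one domain, so skipping that class is our assumption.

```python
import numpy as np

def joint_distance(P, Xs, ys, Xt, yt_pseudo):
    """Marginal (9) plus conditional (10) distance in the space projected by P_g.

    P         : (D,) projection vector P_g
    Xs, Xt    : (D, n) and (D, m); columns are mapped samples x_g
    ys        : (n,) true source labels
    yt_pseudo : (m,) pseudo-labels of the target domain
    """
    zs, zt = P @ Xs, P @ Xt                   # projected samples P_g^T x
    dist = (zs.mean() - zt.mean()) ** 2       # marginal term
    for c in np.unique(ys):
        s, t = zs[ys == c], zt[yt_pseudo == c]
        if s.size and t.size:                 # assumption: skip classes missing from a domain
            dist += (s.mean() - t.mean()) ** 2
    return float(dist)
```

For example, if every target sample is a source sample shifted by one unit along a one-dimensional projection, the marginal term and each class term each contribute 1.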

The Objective Function of TSK-TL
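We design the objective function of TSK-TL as

$$\min_{\mathbf{P}_g} J_{\text{TSK-TL}}(\mathbf{P}_g) = \underbrace{\frac{1}{2}\mathbf{P}_g^T\mathbf{P}_g + \frac{\lambda_1}{2}\left\|\mathbf{P}_g^T\mathbf{X}_s - \mathbf{Y}\right\|^2}_{(12b)} + \underbrace{\lambda_2\left[D(P_s, P_t) + D(Q_s, Q_t)\right]}_{(12c)} + \underbrace{\lambda_3 D(\mathbf{P}_g^0, \mathbf{P}_g)}_{(12d)} \tag{12a}$$

where $\mathbf{P}_g$ is the projection (consequent) vector of TSK-TL to be learned, $\mathbf{P}_g^0$ is the consequent parameter vector of the classical TSK model, and $\mathbf{X}_s$ is the data matrix of the source domain. $\mathbf{Y} = [y_1, y_2, \ldots, y_n]$ is the label matrix: $y_i = 1$ if the ith sample belongs to a healthy volunteer, and $y_i = -1$ otherwise. $\lambda_1$, $\lambda_2$, and $\lambda_3$ are regularization parameters. The terms of the objective function are explained as follows.

(1) Equation 12b is the training objective of the classical TSK fuzzy system, so TSK-TL inherits all of its advantages.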
(2) The classical TSK fuzzy system performs poorly when the data distributions differ. Equation 12c minimizes the joint distribution distance to improve the results.
(3) In Equation 12d, $\mathbf{P}_g$ is further optimized by penalizing its distance from $\mathbf{P}_g^0$. As $\lambda_3 \to \infty$, minimizing the objective forces $\mathbf{P}_g = \mathbf{P}_g^0$.

(4) The regularization parameters $\lambda_1 \ge 0$, $\lambda_2 \ge 0$, and $\lambda_3 \ge 0$ control the balance between the terms. We determine their values by grid search.

Solution of TSK-TL
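In Equation 12c, the first term can be converted as follows:

$$D(P_s, P_t) = \left\| \frac{1}{n}\sum_{i=1}^{n}\mathbf{P}_g^T\mathbf{x}_{si} - \frac{1}{m}\sum_{j=1}^{m}\mathbf{P}_g^T\mathbf{x}_{tj} \right\|^2 = \operatorname{tr}\left(\mathbf{P}_g^T\mathbf{X}\mathbf{M}_0\mathbf{X}^T\mathbf{P}_g\right)$$

Similarly, the second term can be converted as

$$D(Q_s, Q_t) = \sum_{c=1}^{C}\operatorname{tr}\left(\mathbf{P}_g^T\mathbf{X}\mathbf{M}_c\mathbf{X}^T\mathbf{P}_g\right)$$

where $\mathbf{X} = [\mathbf{X}_s, \mathbf{X}_t]$ is the matrix composed of the source and target domain data, and $\mathbf{M}_c$ is the MMD matrix computed as

$$(\mathbf{M}_c)_{ij} = \begin{cases} \dfrac{1}{n^{(c)} n^{(c)}}, & \mathbf{x}_i, \mathbf{x}_j \in D_s^{(c)} \\[4pt] \dfrac{1}{m^{(c)} m^{(c)}}, & \mathbf{x}_i, \mathbf{x}_j \in D_t^{(c)} \\[4pt] -\dfrac{1}{n^{(c)} m^{(c)}}, & \mathbf{x}_i \in D_s^{(c)}, \mathbf{x}_j \in D_t^{(c)} \text{ or } \mathbf{x}_j \in D_s^{(c)}, \mathbf{x}_i \in D_t^{(c)} \\[4pt] 0, & \text{otherwise} \end{cases} \tag{13}$$

Here c = 0 denotes the marginal term, i.e., $D_s^{(0)} = D_s$, $D_t^{(0)} = D_t$, $n^{(0)} = n$, and $m^{(0)} = m$. Letting $\mathbf{M} = \sum_{c=0}^{C}\mathbf{M}_c$, we have

$$D(P_s, P_t) + D(Q_s, Q_t) = \operatorname{tr}\left(\mathbf{P}_g^T\mathbf{X}\mathbf{M}\mathbf{X}^T\mathbf{P}_g\right)$$

The objective function therefore becomes

$$\min_{\mathbf{P}_g} J_{\text{TSK-TL}}(\mathbf{P}_g) = \frac{1}{2}\mathbf{P}_g^T\mathbf{P}_g + \frac{\lambda_1}{2}\left\|\mathbf{P}_g^T\mathbf{X}_s - \mathbf{Y}\right\|^2 + \lambda_2\operatorname{tr}\left(\mathbf{P}_g^T\mathbf{X}\mathbf{M}\mathbf{X}^T\mathbf{P}_g\right) + \lambda_3\left\|\mathbf{P}_g^0 - \mathbf{P}_g\right\|^2$$

Setting the derivative of $J_{\text{TSK-TL}}$ with respect to $\mathbf{P}_g$ to zero, i.e., $\partial J_{\text{TSK-TL}}/\partial\mathbf{P}_g = \mathbf{0}$, gives the optimal solution

$$\mathbf{P}_g = \left[(1 + 2\lambda_3)\mathbf{I} + \lambda_1\mathbf{X}_s\mathbf{X}_s^T + 2\lambda_2\mathbf{X}\mathbf{M}\mathbf{X}^T\right]^{-1}\left(\lambda_1\mathbf{X}_s\mathbf{Y}^T + 2\lambda_3\mathbf{P}_g^0\right) \tag{14}$$

Using (7), (13), and (14), we obtain the optimal consequent parameter $\mathbf{P}_g$. Based on $\mathbf{P}_g$, the final decision function is

$$y(\mathbf{x}) = \operatorname{sign}\left(\mathbf{P}_g^T\mathbf{x}_g\right) \tag{15}$$

The details of the proposed TSK-TL algorithm are as follows.

Algorithm of TSK-TL.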

Stage 1: Construction of datasets for linear regression
Step 1: Use classical FCM or another partition technique to divide the input space of the training data and determine the antecedent parameters of TSK-FS.

Stage 2: Computation of the historical knowledge parameter $\mathbf{P}_g^0$
Step 3: Obtain $\mathbf{P}_g^0$ by (7).

Stage 3: Computation of the TSK-TL parameter $\mathbf{P}_g$
Step 4: Compute the MMD matrix by (13).
Step 5: Compute the consequent parameter $\mathbf{P}_g$ of TSK-TL by (14).
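Steps 4 and 5 can be sketched in NumPy as follows. The sketch assumes that these inputs are already available:

- the mapped source and target matrices `Xs` and `Xt`, whose columns are x_g;
- source labels in {-1, 1};
- target pseudo-labels;
- P_g^0 from (7).

`mmd_matrix` builds M = Σ_c M_c as a sum of outer products of signed indicator vectors, which reproduces the entries of (13). `train_tsk_tl` then applies (14). Both names are hypothetical. This is a sketch, not the authors' MATLAB implementation.

```python
import numpy as np

def mmd_matrix(ys, yt_pseudo):
    """M = M_0 + sum_c M_c from (13), built as outer products e e^T."""
    n, m = len(ys), len(yt_pseudo)
    e = np.concatenate([np.full(n, 1.0 / n), np.full(m, -1.0 / m)])
    M = np.outer(e, e)                                  # M_0: marginal term
    for c in np.unique(ys):
        src = np.flatnonzero(ys == c)
        tgt = np.flatnonzero(yt_pseudo == c)
        if src.size == 0 or tgt.size == 0:
            continue                                    # class absent in one domain
        e = np.zeros(n + m)
        e[src] = 1.0 / src.size
        e[n + tgt] = -1.0 / tgt.size
        M += np.outer(e, e)                             # M_c: conditional term for class c
    return M

def train_tsk_tl(Xs, ys, Xt, yt_pseudo, P0, lam1, lam2, lam3):
    """Closed-form P_g of TSK-TL, Eq. (14). Xs: (D, n), Xt: (D, m), P0: (D,)."""
    X = np.hstack([Xs, Xt])
    M = mmd_matrix(ys, yt_pseudo)
    D = X.shape[0]
    A = (1.0 + 2.0 * lam3) * np.eye(D) + lam1 * Xs @ Xs.T + 2.0 * lam2 * X @ M @ X.T
    b = lam1 * Xs @ ys + 2.0 * lam3 * P0
    return np.linalg.solve(A, b)
```

Because M is a sum of outer products eeᵀ, the penalty tr(P_gᵀ X M Xᵀ P_g) equals the sum of squared differences between projected domain means and projected class means. A very large λ3 pulls the solution toward P_g^0, consistent with remark (3) on Equation 12d.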

EXPERIMENTAL PROCESS AND RESULTS ANALYSIS
In this section, the proposed TSK-TL method is evaluated by classifying EEG signals of epilepsy patients and healthy people. In addition, a comparative study of five traditional machine learning algorithms and two transfer learning algorithms is carried out. Details of the experiments are as follows.

Experimental Setup
In this paper, we use three classical feature extraction methods, namely WPD, STFT, and KPCA, to obtain EEG datasets from three different views, and we perform experiments on each view separately. For each view, we construct 10 experimental datasets. Each dataset is composed of part of the data from two or three different groups in that view. The details of these datasets are shown in Table 2. This construction ensures that the source domain and the target domain share no data, or only part of their data, from the same group. In other words, the source and target domains follow different, or only partially identical, distributions.

For the proposed TSK-TL, the number of fuzzy rules is selected from {5, 10, 15, 20, 25, 30}. The regularization parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ are selected from $\{10^{-5}, 10^{-4}, \ldots, 10^{4}, 10^{5}\}$. The optimal hyperparameters of every model are determined by grid search. The experiment on each dataset is repeated 10 times, and the average result is used to evaluate performance. All algorithms are implemented in MATLAB on a computer with an Intel i5-4590 3.3 GHz CPU and 12 GB of RAM. The details of the experimental setup are shown in Table 3.
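The grid described above is easy to enumerate. The sketch below assumes a user-supplied scoring callback `evaluate(K, lam1, lam2, lam3)` (hypothetical, for example accuracy on held-out data). It also shows why tuning is costly: 6 rule counts × 11³ regularization settings give 7,986 model fits per dataset.

```python
import math
from itertools import product

RULES = [5, 10, 15, 20, 25, 30]
LAMBDAS = [10.0 ** p for p in range(-5, 6)]   # 10^-5, ..., 10^5

def grid_search(evaluate):
    """Exhaustive search over (K, lam1, lam2, lam3); higher score is better."""
    best_cfg, best_score, n_evals = None, -math.inf, 0
    for cfg in product(RULES, LAMBDAS, LAMBDAS, LAMBDAS):
        score = evaluate(*cfg)
        n_evals += 1
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score, n_evals
```

In practice each call to `evaluate` would train TSK-TL with the given setting and score it. The number of evaluations is the main reason parameter optimization is time-consuming.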

Recognition Performance
In our experiment, five traditional machine learning algorithms and two transfer learning algorithms (LMPROJ and STL) are used for comparison. The results are shown in Tables 4-6, from which three conclusions can be drawn.
(1) The proposed TSK-TL method not only reduces the difference in data distributions but also uses the prior knowledge parameters obtained from the classical TSK-FS model to guide the learning of new knowledge parameters. The results show that our method improves the accuracy of EEG signal recognition.

(2) Among the two transfer learning classifiers, TSK-TL outperforms LMPROJ in recognizing epileptic EEG signals, and it performs better than or at least comparably to STL. STL exploits intra-class affinity to transfer knowledge within each class, whereas LMPROJ only reduces the marginal distribution discrepancy of the data on top of an SVM.
(3) The proposed TSK-TL method recognizes signals well in all three views and reaches its highest average accuracy in the WPD view, which offers guidance on choosing a feature extraction method in practical applications. Moreover, on datasets D5 and D6 its recognition accuracy is much higher than that of the other methods. This further verifies that the proposed method is more effective on data with large distribution differences.

Statistical Analysis
We evaluated the experimental results statistically using the Friedman test (Demšar, 2006; Garcia and Herrera, 2008) and Holm's post hoc test (Demšar, 2006; Garcia and Herrera, 2008). The Friedman test computes the average ranking of the compared methods and determines whether the observed differences are statistically significant. We set the significance level to 0.05. If the p-value is less than 0.05, the null hypothesis H0 is rejected, indicating that the differences are significant. Holm's post hoc test is then used to compare TSK-TL with each of the other methods. The results of the Friedman test are shown in Table 7, and the results of Holm's post hoc test are shown in Table 8. The Friedman test results in Table 7 reveal that TSK-TL outperforms the other seven classification methods in classification accuracy. The Holm's post hoc results in Table 8 also show that TSK-TL performs better than the other methods. This again demonstrates that the proposed TSK-TL achieves better results in detecting epileptic EEG signals.
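Readers who want to reproduce this kind of analysis can use the NumPy sketch below. It computes the Friedman statistic from average ranks, where tied methods share the mean rank, and applies Holm's step-down procedure, following the standard formulation described by Demšar (2006). The sketch does not compute the p-values themselves. Those would come from the chi-square or F distribution for the Friedman test and from z-statistics for the pairwise comparisons. The `p_values` passed to `holm` are therefore assumed inputs, and all function names are hypothetical.

```python
import numpy as np

def average_ranks(acc):
    """acc: (N datasets, k methods). Rank 1 = highest accuracy; ties share the average rank."""
    acc = np.asarray(acc, dtype=float)
    ranks = np.empty_like(acc)
    for r, row in enumerate(acc):
        order = np.argsort(-row, kind="mergesort")
        vals = row[order]
        i = 0
        while i < len(row):
            j = i
            while j + 1 < len(row) and vals[j + 1] == vals[i]:
                j += 1
            ranks[r, order[i:j + 1]] = (i + j) / 2.0 + 1.0   # mean of 1-based positions i..j
            i = j + 1
    return ranks

def friedman_statistic(acc):
    """Friedman chi-square statistic and the mean rank of each method."""
    R = average_ranks(acc).mean(axis=0)
    N, k = np.shape(acc)
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    return float(chi2), R

def holm(p_values, alpha=0.05):
    """Holm step-down: which comparisons with the control method are rejected."""
    p = np.asarray(p_values, dtype=float)
    reject = np.zeros(len(p), dtype=bool)
    for step, idx in enumerate(np.argsort(p)):
        if p[idx] > alpha / (len(p) - step):
            break                                   # stop at the first non-rejection
        reject[idx] = True
    return reject
```

Holm's procedure compares the smallest p-value with α/(k−1), the next smallest with α/(k−2), and so on, stopping at the first comparison that is not rejected. This keeps the family-wise error rate at α.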

Model Analysis
In this section, TSK-TL is analyzed using the model trained on dataset D2 in the KPCA view. Table 9 (columns: number of rules, antecedent parameters, and consequent parameters) gives an example of the model. These model parameters show that TSK-TL inherits the interpretability of the classical TSK fuzzy system. In this example, we set five fuzzy rules. Figure 6 shows the MF of each fuzzy set. Each MF is given a linguistic description, such as "the energy of a band of the EEG signal is Low" (or A little low, Medium, A little high, or High). Medical experts in different fields may interpret fuzzy rules differently, so the linguistic descriptions given here are only one possible example. Ordered by center value from low to high, the five MFs are labeled "Low," "A little low," "Medium," "A little high," and "High." Combining the linguistic expression of the IF-part of each fuzzy rule with the linear function of its THEN-part gives the five fuzzy rules of the KPCA view:

The first fuzzy rule: IF the energy of the EEG signal from band 1 to band 6 is Low, A little high, Medium, A little high, Medium, and Medium, respectively, THEN the decision value under this rule is given by $f^1(\mathbf{x}) = p_0^1 + p_1^1 x_1 + \cdots + p_6^1 x_6$.

The second fuzzy rule: IF the energy of the EEG signal from band 1 to band 6 is A little high, Medium, A little low, High, A little low, and A little low, respectively, THEN the decision value under this rule is given by

$$f^2(\mathbf{x}) = 10.9842 - 8.5028x_1 - 0.9881x_2 + 4.7443x_3 - 1.8692x_4 + 3.2596x_5 - 3.0533x_6$$

The third fuzzy rule: IF the energy of the EEG signal from band 1 to band 6 is Medium, High, A little high, Low, Low, and A little high, respectively, THEN the decision value under this rule is given by $f^3(\mathbf{x}) = p_0^3 + p_1^3 x_1 + \cdots + p_6^3 x_6$.

The fourth fuzzy rule: IF the energy of the EEG signal from band 1 to band 6 is High, A little low, High, Medium, High, and Low, respectively, THEN the decision value under this rule is given by

$$f^4(\mathbf{x}) = [-3.8877 - 2.2364x_1 - 0.7276x_2 - 1.1165x_3 + 0.1062x_4 - 1.3669x_5 \quad 0.9594x_6]$$

The fifth fuzzy rule: IF the energy of the EEG signal from band 1 to band 6 is A little low, Low, Low, A little low, A little high, and High, respectively, THEN the decision value under this rule is given by

$$f^5(\mathbf{x}) = 2.0489 + 4.0366x_1 - 4.2784x_2 - 1.4602x_3 - 1.0065x_4 + 0.0086x_5 + 0.9774x_6$$

The final output value, y = -1 or y = 1, indicates whether the patient has epilepsy.

CONCLUSION
This study combines the classical TSK fuzzy system with transfer learning and proposes TSK-TL, an interpretable TSK fuzzy system that adapts better to scenarios with differing data distributions. It broadens the range of scenarios in which the model can be applied and enables the recognition of epileptic EEG signals under the large distribution differences found in practice.
Although we have demonstrated the effectiveness of TSK-TL, it can be improved further. For example, the TSK-TL algorithm has several predefined parameters, and optimizing them is time-consuming.
In the future, we will study this problem and develop more effective algorithms.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: https://github.com/benfulcher/hctsaTutorial_BonnEEG.

AUTHOR CONTRIBUTIONS
ZZ developed the theoretical framework and model in this work and drafted the manuscript. XD gave support for medical knowledge. ZZ, XD, JY, and AC implemented the algorithm and performed experiments and result analysis. All authors contributed to the article and approved the submitted version.