Deep extreme learning machine with knowledge augmentation for EEG seizure signal recognition

Introduction: Intelligent recognition of electroencephalogram (EEG) signals can remarkably improve the accuracy of epileptic seizure prediction, which is essential for epileptic diagnosis. The extreme learning machine (ELM) has been applied to EEG signal recognition; however, the artifacts and noise in EEG signals seriously degrade recognition performance. Deep learning is robust to noise and can help remove it from raw EEG signals, but traditional deep networks suffer from time-consuming training and slow convergence. Methods: Therefore, a novel deep-learning-based ELM (denoted DELM), motivated by the stacking generalization principle, is proposed in this paper. DELM is a hierarchical network composed of several independent ELM modules. Augmented EEG knowledge is taken as a complementary component, which is then mapped into the next module. This learning process is simple and fast, and it excavates the implicit knowledge in the raw data to a greater extent. Additionally, the proposed method operates in a single-direction manner, so there is no need to fine-tune parameters, which saves time. Results: Extensive experiments are conducted on the public Bonn EEG dataset. The experimental results demonstrate that, compared with commonly used seizure prediction methods, the proposed DELM achieves the best average accuracy on 13 of the 22 datasets and the best average F-measure score on 10 of the 22 datasets, while its running time is less than half that of the compared deep learning methods. Discussion: DELM is therefore superior to traditional and some state-of-the-art machine learning methods, and the proposed architecture demonstrates its feasibility and superiority in epileptic EEG signal recognition.
The proposed, less computationally intensive deep classifier enables faster seizure onset detection, showing great potential for real-time EEG signal classification.


1. Introduction
Epilepsy is a common chronic neurological disease caused by sudden abnormal discharge of neurons in the human brain (Sanei and Chambers, 2013). Most epileptic patients are indistinguishable from healthy people when no seizure is occurring, but epilepsy seriously affects quality of life and can even cause fatal harm (Iasemidis et al., 2003). Rapid and accurate diagnosis of epilepsy is essential for treating patients and reducing the risk of potential seizures, and the relevant techniques are urgently needed. The electroencephalogram (EEG) shows the electrical activity of the human brain, recorded by amplifying voltage differences between electrodes placed on the scalp or cerebral cortex. In traditional epilepsy detection by doctors, visual marking of long EEG recordings is a tedious and high-cost task with a high misjudgment rate, especially taking into account the subjectiveness of experts (Wang et al., 2018).
EEG signal recognition plays an important role in the assessment and auxiliary diagnosis of epilepsy (Ghosh-Dastidar et al., 2007; Ahmadlou and Adeli, 2011; Ayman et al., 2023). Careful analysis of EEG records can provide valuable insight into and improved understanding of the mechanisms causing epileptic disorders. Machine learning methods, such as neural networks (Subasi and Ercelebi, 2005; Kumar et al., 2010), fuzzy systems (Güler and Übeyli, 2005), support vector machines (Panda et al., 2010; Nicolaou and Georgiou, 2012; Kumar et al., 2014), and extreme learning machines (Liang et al., 2006b; Yuan et al., 2011; Song and Zhang, 2013), have been extensively used in EEG signal recognition. However, some existing intelligent methods perform poorly in terms of classification accuracy, real-time prediction, and so on. As a novel learning paradigm, ELM can not only learn rapidly with good generalization performance, but also effectively overcome the inherent drawbacks of some intelligent technologies. In recent years, ELM and its variants (Huang et al., 2004, 2006, 2011a,b; Liang et al., 2006a; Betthauser et al., 2017) have received increasing attention. However, its shallow structure is deficient in extracting significant implicit information from the original data, which has become the main bottleneck restricting its development. As a popular trend in machine learning, deep learning has confirmed that pattern recognition can remarkably benefit from knowledge learned via hierarchical feature representation. Typical deep networks include the deep belief network (Hinton and Salakhutdinov, 2006; Hinton et al., 2006; Plis et al., 2014), the convolutional neural network (Khan et al., 2017; Acharya et al., 2018; Choi et al., 2019), the stacked autoencoder (Bengio et al., 2007; Vincent et al., 2010; Xu et al., 2015), etc. There are many artifacts and much noise in EEG signals, which can seriously decrease recognition efficiency (Bengio, 2009; Zhou and Chan, 2016; Bhattacharyya and Pachori, 2017). Deep learning is able to resist noise in the recognition process and can remove noise from EEG data (Huang et al., 2013; Deng et al., 2016). However, conventional deep learning algorithms are time-consuming, have complicated structures, and can easily overfit when available samples are limited. To tackle the aforementioned problems, ELM has gradually been combined with deep learning to generate high-performance models (Tang et al., 2014, 2015; Yu et al., 2015; Zhu et al., 2015; Duan et al., 2016; McIntosh et al., 2020). However, most existing hierarchical ELM models can hardly make effective use of the knowledge learned in previous layers.
ELM is popular for its high-speed response, real-time prediction ability, network conciseness, and excellent generalization performance. The idea of deep learning can help excavate the invisible value of the input to the greatest extent. To address the lack of representational learning, the deep extreme learning machine (DELM) is proposed to recognize epileptic EEG signals. The efficient deep classifier is based on a stacked structure that consists of several modules whose hidden layer parameters are initialized randomly. The proposed method forms a hierarchical structure that stepwise aggregates discrete, valuable information into knowledge for hierarchical representation. The previous valuable information is fed into the new input as available knowledge and then transmitted to the current sub-model, which serves to better accomplish the subsequent recognition task. According to stacking generalization theory, combining the output of the next sub-model with the knowledge of the previous sub-model in DELM can indeed open up the manifold structure of the input space, resulting in improved performance. DELM accomplishes fast epileptic recognition and shows greater performance in EEG signal classification than traditional ELM and some state-of-the-art methods, making accurate, real-time, high-precision epilepsy diagnosis possible. The main contributions of this work are as follows: (1) DELM is a novel deep learning structure resulting from the fusion of ELM and deep learning. DELM is composed of original ELMs; accordingly, the new structure is inherently brief, flexible to implement, and demonstrates superior learning performance. Additionally, the introduction of deep representation ensures that valuable knowledge is refined rather than wasted. Learning rich representations efficiently is crucial for better generalization performance, and informative features promote accuracy. In our paper, the new framework achieves classification accuracy comparable to that of existing deep network schemes in EEG recognition tasks, while DELM takes the leading position in training speed. (2) Motivated by deep learning, the proposed DELM is used to capture useful information in multi-dimensional EEG variables. DELM is a hierarchical framework that incorporates a stepwise knowledge augmentation strategy into the original ELM. It learns knowledge incrementally and expands it through forward calculation. The current sub-model can exploit knowledge from all previous sub-models, and the recognition results are obtained in the last layer. (3) DELM uses the classic ELM as the basic building block, and each module has the same structure as the original ELM. Supervised learning is performed throughout the whole learning process, and under this supervision each sample tends to approach its own class.
The main differences between the proposed DELM and traditional and deep learning methods are summarized in Table 1. The rest of this paper is organized as follows. Section 2 presents the details of the proposed deep extreme learning machine and describes its learning process. Section 3 introduces the experiments conducted and compares the recognition performance of the proposed method with that of existing conventional methods on real EEG datasets. Finally, Section 4 concludes the study.

2. Deep extreme learning machine

Given $N$ training samples $(\mathbf{x}_j, \mathbf{t}_j)$, an ELM with $L$ hidden neural units and activation function $g(\cdot)$ can approximate these $N$ samples with zero error, which is modeled as (Huang et al., 2004):

$$\sum_{i=1}^{L} \beta_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \quad j = 1, \dots, N, \qquad (1)$$

where $\beta_i$ is the weight vector connecting the $i$th hidden node and the output nodes, $\mathbf{w}_i$ is the weight vector connecting the $i$th hidden node and the input nodes, and $b_i$ represents the bias of the $i$th hidden node. For the sake of convenience, the equation can be written in the compact form

$$H\beta = T, \qquad (2)$$

where $H$ is the hidden layer output matrix of the neural network; the $i$th column of $H$ is the output of the $i$th hidden layer unit with respect to the inputs.
The solution of Equation 2 is equivalent to the following optimization problem (Liang et al., 2006a):

$$\min_{\beta} \|H\beta - T\|. \qquad (3)$$

In most practical applications, the number of hidden layer neurons is far smaller than the number of training samples, $L \ll N$. The output matrix of the hidden layer is then not square, and the minimum-norm least-squares solution of the above linear system can be calculated by Equation 4 (Huang et al., 2011a):

$$\hat{\beta} = H^{+} T, \qquad (4)$$

where $H^{+}$ denotes the Moore–Penrose generalized inverse of the output matrix $H$. The theory of ELM aims to reach not only the smallest training error but also the smallest norm of output weights. ELM is a shallow network composed of three layers (input, hidden, and output layers), whose representation capability is limited. In deep learning, an adequate representation of the input is routinely desired to achieve excellent performance. On account of the flexibility and efficiency of ELMs, ELM is extended here to deep neural network (DNN) learning to shorten the learning time dramatically and reduce the computational complexity without deserting its original excellence. The proposed architecture, constructed from ELM building blocks, is a new ELM-based stacked structure that processes information layer by layer in order to utilize the learned knowledge. Figure 1 depicts the architecture of the proposed hierarchical method.
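The closed-form training of Equations 1–4 can be sketched in a few lines of NumPy. The following is our own illustrative code, not the authors' MATLAB implementation; function names are ours, and the pseudoinverse solution $\hat{\beta} = H^{+}T$ is computed with `numpy.linalg.pinv`:

```python
import numpy as np

def elm_train(X, T, L, rng=np.random.default_rng(0)):
    """Train a single ELM: X is (N, n) inputs, T is (N, m) one-hot targets."""
    n = X.shape[1]
    W = rng.uniform(-1, 1, size=(n, L))      # random input weights (Eq. 1)
    b = rng.uniform(0, 1, size=L)            # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden output matrix H
    beta = np.linalg.pinv(H) @ T             # beta = H^+ T (Eq. 4)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                          # network output; class = argmax
```

Only `beta` is learned; the random hidden layer is fixed, which is what makes training a single least-squares solve.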
The proposed structure inherits the simplicity of the original ELM, and digestion and absorption of knowledge are then performed in multiple sub-models. In DELM, the initial epileptic EEG signal is learned step by step in a forward manner. The representation learned from the previous layer is regarded as new knowledge and is then taught to the next layer. Upon the arrival of a given input, the

FIGURE 1 | The proposed hierarchical architecture.

FIGURE 2 | Stepwise knowledge learning in DELM.
corresponding linear system can be solved immediately in the first ELM.
In a singleton ELM module, the knowledge generation process is as follows. If $H^{T}H$ is nonsingular, the orthogonal projection method can be used to calculate the generalized inverse of the matrix (Huang et al., 2011b):

$$H^{+} = (H^{T}H)^{-1}H^{T}. \qquad (5)$$

According to Equation 4 (Betthauser et al., 2017), we can get

$$\hat{\beta} = (H^{T}H)^{-1}H^{T}T. \qquad (6)$$

For binary EEG classification applications, the decision function is

$$f(\mathbf{x}) = \operatorname{sign}\big(h(\mathbf{x})\beta\big), \qquad (7)$$

where $h(\mathbf{x}) = [g(\mathbf{w}_1 \cdot \mathbf{x} + b_1), \dots, g(\mathbf{w}_L \cdot \mathbf{x} + b_L)]$ maps the data from the input space into the $L$-dimensional hidden-layer feature space (ELM feature space). By inserting Equation 6 into Equation 7, we obtain

$$f(\mathbf{x}) = \operatorname{sign}\big(h(\mathbf{x})(H^{T}H)^{-1}H^{T}T\big). \qquad (8)$$

For multi-class EEG classification tasks, the predicted label of a sample is the index of the output node with the highest output value for the given instance. Let $f_p$ denote the output function of the $p$th node; the predicted class label of sample $\mathbf{x}$ is then

$$\operatorname{label}(\mathbf{x}) = \arg\max_{p \in \{1, \dots, m\}} f_p(\mathbf{x}). \qquad (9)$$

Each sub-model in a higher layer takes information transformed from the decision output of the previous lower layers and appends it as supplementary knowledge, enabling a more relevant representation to be handed over to the next generation. Deeper representations are captured to build a hierarchical network until the next additive ELM has no remarkable effect. With deep representation in DELM, useful information is well explored and transmitted from the initial layer to the last layer, bringing a more complete and precise expression of the original input, greatly improving the knowledge utilization rate, and strengthening the learning capability of ELM. Several ELMs are combined by means of serial links, and the response can be reused in the next higher sub-model. On the premise that the extended ELM is meaningful, the purpose of each sub-model is to convey the knowledge learned by the previous layer. By updating the knowledge community, the original manifold can finally be separated apart.
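The projection solution of Equation 6 and the argmax decision rule of Equation 9 can be illustrated as follows. This is a hedged sketch (our code, our function names), valid only when $H^{T}H$ is nonsingular, i.e., when $H$ has full column rank:

```python
import numpy as np

def elm_solve_projection(H, T):
    """beta = (H^T H)^{-1} H^T T (Eq. 6); requires H^T H nonsingular."""
    return np.linalg.solve(H.T @ H, H.T @ T)

def elm_label(h_x, beta):
    """Predicted class = index of the output node with the highest value (Eq. 9)."""
    return int(np.argmax(h_x @ beta))
```

For full-column-rank $H$, this solution coincides with the Moore–Penrose pseudoinverse solution of Equation 4, as the test against `np.linalg.pinv` confirms.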

2.1. Knowledge augmentation based on DELM
A detailed introduction to knowledge transfer between multiple modules is provided in Figure 2. The input of $n$-dimensional attributes provides data for the first level to construct a traditional ELM classifier. For the $N$ samples in a given dataset, $\mathbf{x}_i \in \mathbb{R}^{1 \times N}$ is the data of the $i$th attribute across the different samples, and each sample has an expected label, where $m$ represents the number of sample categories. The expected labels are collected in $T \in \mathbb{R}^{m \times N}$, while the actual output calculated by the $d$th level model is expressed in matrix form as

$$Y_d = [\mathbf{y}_1^{(d)}, \mathbf{y}_2^{(d)}, \dots, \mathbf{y}_N^{(d)}] \in \mathbb{R}^{m \times N}.$$

After the first ELM finishes its task, the output produced by sub-model 1 is $Y_1 \in \mathbb{R}^{m \times N}$. As in the classic ELM, the output matrix then undergoes a transformation: the information acquired by the current sub-model is integrated, and the fused knowledge community is stored for the next knowledge transmission. For the $i$th instance, the index of the maximum value in its column of $Y_1$ is taken as its class label; the class labels are stored as $\mathbf{x}_{n+1} \in \mathbb{R}^{1 \times N}$ and merged with the original input. The updated input for the second level, sub-model 2, is then

$$X_2 = [X_1 \,|\, \mathbf{x}_{n+1}] \in \mathbb{R}^{(n+1) \times N},$$

where "$|$" denotes appending $\mathbf{x}_{n+1}$ to the attributes, the original input matrix is expressed as $X_1$, the activation function is $g(\cdot)$, and the total number of iterations is $r$.

Algorithm 1. DELM.

Output: the output label $Y$.

Step 1:
(a) Randomly initialize the input weights $\mathbf{w}_i$ and the biases of the hidden layer neurons $b_i$;
(b) Calculate the output matrix of the hidden layer, $H_1$;
(c) Determine the output weights analytically according to Equation 6, convert the actual output to the label matrix $A_1$, and store it into a new representation matrix, so that the updated input dataset is $X_2 = [X_1 \,|\, A_1]$.

Step 2: Initialize the depth $d = 2$.

Step 3: Repeat the sub-model update for $d = 2, 3, \dots$ until the testing error threshold between two adjacent sub-models is satisfied.

In each repetition, the significant information is stored in the new input. The third sub-model, for example, leverages the knowledge extracted from the outputs of sub-model 1 and sub-model 2 to complete its classification. Three or more modules can be established on both the training and testing sets, which can yield favorable results. The input for these modules comprises the original features and the features appended from all previous recognition predictions, so the augmented input for each module can be formed as

$$X_{d+1} = [X_1 \,|\, A_1 \,|\, A_2 \,|\, \cdots \,|\, A_d].$$

At each level, the predicted output of the current sub-model is integrated into the input as learning experience. In the next learning step, the new input after incorporation is mapped into a new ELM feature space through random mapping in the current sub-model to solve the least-squares problem. The new features, including $A_1$, $A_2$, and so on, contain discriminative information derived from the lower modules, which helps force the manifold structure apart in the original EEG input. Through this course of knowledge augmentation, DELM aims to learn a more reasonable decision basis from the raw data in classification tasks.

2.2. Specialty of the DELM pattern classifier
We are motivated by the ideas of deep learning and stacking generalization theory, and establish a hierarchical ELM-based stacked architecture. Each sub-model has the same supervised learning process as the classic ELM, and several ELMs are integrated into a deep network. The ELM at each level is an elegant original model composed of an input layer, a hidden layer, and an output layer. Under the guidance of the corresponding expected labels, DELM can better pull each sample toward its own class cluster; hence, samples tend to approach their own field gradually after knowledge augmentation. In other words, the DELM pattern classifier makes it easier for the samples of a class to be identified as belonging to their true class. Accordingly, the output generated in the previous sub-model first undergoes knowledge transformation and is then regarded as a supplement to the input. DELM targets a richer representation of the raw data, which enables the sequential forward propagation of knowledge and provides a method to automatically discover valuable implied patterns. With the valuable information extracted from the instances, the whole model is directed to study their internal information and to constantly approach the ideal output through stepwise learning.
Noise caused by electrode movement or other sources often appears in practical EEG signals, resulting in poor recognition results. In contrast to the traditional ELM algorithm, the proposed framework has the anti-noise capability of a deep network in practice, which can withstand noise to a certain extent. With the stepwise transformation of the input epileptic EEG information, the dimension of the input expands continuously, and the pollution in the original data is gradually reduced or eliminated. Stepwise knowledge is continuously strengthened, more reasonable features are generated, and the final classification accuracy for epileptic EEG signals is improved.
The entire network consists of several stacked independent ELM modules. The stacked approach is one of the most effective ensemble learning strategies. Our model trains several sub-models serially, and each sub-model preserves the output of the previous sub-model for deep representation learning, which shares the same philosophy as stacked generalization (Wolpert, 1992; Wang et al., 2017; Hang et al., 2020).
Our model is aimed at reducing the loss of effective information in the original data and greatly economizing the time required for classification under the premise of ensuring certain accuracy.The information is extracted, grows in refinement and richness, and is accepted to be vital members of the knowledge community ultimately.The sub-model that organizes the higher layer has additional input features involving the classification output from all previous sub-models.DELM learns reasonable and effective features from a large number of complex raw data, and the newly generated features are absorbed by our deep network into its own knowledge, which can achieve satisfactory results in most cases when faced with practical application problems.
In the earlier phase, multiple sub-models are adopted for knowledge augmentation, and knowledge is automatically captured through feature expansion. In the later phase, the original input and the knowledge generated in the previous modules are used to accomplish the modeling and classification tasks. The deep learning algorithm of the proposed DELM is summarized in Algorithm 1.

2.3. Time complexity analysis
To exhibit the time complexity of the proposed deep learning algorithm, we start with the classic ELM algorithm. Its time complexity mainly lies in the solution of the Moore–Penrose generalized inverse of the hidden output matrix $H \in \mathbb{R}^{N \times L}$. Computing the hidden layer outputs requires $O(NnL)$ operations. In terms of Equation 5, $O(NL^2)$ operations are required to compute $H^{T}H$, and $O(L^3)$ to invert this $L \times L$ matrix. The time complexity of ELM therefore becomes $O(NnL + NL^2 + L^3)$. The proposed DELM introduces the concept of deep learning and is composed of several such building units, so the time complexity of the entire DELM can be indicated as $O\big(D(NnL + NL^2 + L^3)\big)$, where $D$ is the depth, $L$ is the number of hidden layer units, and $N$ is the number of instances.
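As a back-of-envelope check of this analysis, the per-module and total operation counts can be tabulated. This is a sketch under the assumption that $\beta$ is obtained from the $L \times L$ system of Equation 5; the input width grows by one appended label column per level, which the sum below accounts for:

```python
def elm_module_flops(N, L, n):
    """Rough operation count for one ELM module with N samples,
    L hidden units, and n input attributes."""
    return N * n * L + N * L * L + L ** 3   # hidden map + H^T H + L x L solve

def delm_flops(N, L, n, depth):
    """Total rough count for a stacked DELM; each level sees one extra
    appended label attribute (binary-label case)."""
    return sum(elm_module_flops(N, L, n + d) for d in range(depth))
```

Since the appended attributes grow only linearly with depth, the total stays $O(D(NnL + NL^2 + L^3))$ for modest $D$.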

3. Experiment studies
In this section, we demonstrate the effectiveness of the proposed hierarchical model DELM by reporting experimental results on the Bonn dataset. In our experimental study, DELM is compared with several machine learning algorithms and popular deep networks such as the DBN. The final performance evaluation is performed according to these results. All adopted methods were implemented in MATLAB 2019a on a personal computer with an Intel Core i5-9400 2.90 GHz CPU and 8 GB of RAM.

3.1. Epileptic EEG dataset
The EEG signals used in this paper were provided by the Department of Epileptology, Bonn University, Germany. The dataset has been described in detail by Andrzejak et al. (2001). The EEG signals were collected under various conditions from five healthy volunteers and five epileptic patients. Detailed information on the five groups is summarized in Table 2; after segmentation, each group contains 2,300 samples.
The dataset consists of five groups of data (A, B, C, D, and E), each containing 100 single-channel EEG segments. The EEG data were recorded using the same 128-channel amplifier system with a sampling rate of 173.6 Hz and 12-bit resolution. Each EEG segment contains 4,096 sampling points and lasts 23.6 s. Samples from the five sets are shown in Figure 3.

3.2. Data preparation and normalization
First, the EEG signals are segmented into epochs of 178 sampling points by means of non-overlapping moving windows, so 23 epochs are obtained from each 4,096-point segment; the remaining points in each segment are discarded. Because different features extracted from the original EEG signals have different scales after segmentation, normalization is applied to all attribute features.
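The segmentation and normalization steps can be sketched as follows. The 178-point window and the 23 epochs per segment follow the text; the min-max normalization variant is our assumption, since the paper does not specify which normalization it uses:

```python
import numpy as np

def segment_eeg(segment, win=178):
    """Split one EEG segment into non-overlapping windows of `win` points;
    4096 // 178 = 23 epochs, leftover points are discarded."""
    n_epochs = len(segment) // win
    return segment[:n_epochs * win].reshape(n_epochs, win)

def minmax_normalize(X):
    """Scale each attribute (column) to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)
```

Applied to all 100 segments of a set, this yields the 2,300 samples per group mentioned above.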

3.3. Experiment setup
In our experimental organization, the processed dataset is first randomly divided into two parts: a training set and a testing set. In each scenario, we randomly select 80% of the data for training and use the remaining 20% for testing. The experiment is repeated 20 times in each scenario, and the average results of the other schemes are also collected for comparison. SVM, RBF networks, and ensemble algorithms such as AdaBoost are used; experimental results of well-known deep networks, such as the DBN and SAE, are also adopted for comparison in order to demonstrate the superiority of the proposed DELM.
To reasonably evaluate our method, the performance metrics adopted here are Accuracy and F-measure, defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN},$$

where TP (true positives) is the number of segments correctly detected as seizure, FN (false negatives) is the number of segments incorrectly detected as non-seizure, TN (true negatives) is the number of segments correctly detected as non-seizure, and FP (false positives) is the number of segments incorrectly detected as seizure.
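These two metrics follow directly from the confusion counts; a small reference sketch (standard definitions, our code):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of segments classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the seizure class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```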
In terms of recognition accuracy, our DELM model achieves classification accuracy comparable to that of deep learning schemes. Running time is another key evaluation index, and one on which DELM performs excellently. The classic ELM is qualified for real-time recognition requirements, and so is our hierarchical model: DELM still exhibits extremely fast recognition, while traditional deep networks lag far behind. Among the competing schemes, SVM, RBF, and the ensemble algorithms were implemented with MATLAB toolboxes, and the traditional deep learning algorithms with the MATLAB Deep Learning Toolbox. The parameter settings are summarized in Table 3.
In each sub-model, all input weights and hidden biases are set to pseudo-random values drawn from the uniform distributions on the intervals (−1, 1) and (0, 1), respectively. This scheme is in accordance with the standard ELM methodology and simplifies the learning process. In each ELM, the hidden layer adopts the same number of hidden nodes and the same activation function; the sigmoid function is chosen as the activation function $g(\cdot)$ in each sub-model. The number of hidden units is usually scenario-specific and determined by experience or by repeated trials; we need to find a point that balances the number of hidden units against time as well as possible. As a result, DELM can acquire a relatively mature knowledge system that meets the accuracy requirements of classification. The number of hidden units in all sub-models is uniformly set to 500; considering the difficulty of the five class problem, it is set to 800 there. More ELMs can be cascaded, if desired, to acquire adequate knowledge, so we determine the depth of the network dynamically: the stacking process is aborted when the accuracy difference between the current and previous level is <0.1. Clearly, DELM involves only a few parameters, which greatly reduces the cost of parameter adjustment. To evaluate DELM comprehensively and precisely, classification tasks in various scenarios are designed here.
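The dynamic depth selection described above can be sketched as a simple stopping loop. This is our illustration; `train_level` is a hypothetical callback that trains DELM to depth `d` and returns its test accuracy in percent:

```python
def select_depth(train_level, max_depth=10, tol=0.1):
    """Keep stacking ELMs until the accuracy gain between adjacent
    sub-models drops below `tol` (0.1 percentage points in the paper)."""
    prev = train_level(1)
    for d in range(2, max_depth + 1):
        cur = train_level(d)
        if abs(cur - prev) < tol:
            return d - 1, prev   # deeper stacking no longer pays off
        prev = cur
    return max_depth, prev
```

The loop stops at the last depth whose addition still produced a meaningful gain, matching the paper's observation that excessive accumulation of layers is no longer productive.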

3.4. Epileptic EEG signal recognition

3.4.1. Two class problem
Classification of the four combinations A vs. E, B vs. E, C vs. E, and D vs. E is considered to distinguish normal from seizure signals: the epileptic seizure set E is compared with one of the remaining EEG sets from the dataset. Two or more sets from the database are then selected and the trials are conducted again, with the following combinations: AB vs. E, AC vs. E, AD vs. E, BC vs. E, BD vs. E, CD vs. E, ABC vs. E, ABD vs. E, ACD vs. E, BCD vs. E, and ABCD vs. E.

3.4.2. Three class problem
In the three class problems, the selected combinations are: A/B/E, A/C/E, A/D/E, B/C/E, B/D/E, and C/D/E.

3.4.3. Five class problem
In five class problem, each group is regarded as an independent class for testing.

3.5. Experimental results and statistical analysis
Table 4 shows the accuracy, in terms of both mean and standard deviation, of DELM and the deep networks. Results are also presented for DELM with depth d = 3; in that case there is still a certain gap from the ideal, and more ELMs are required to assure higher accuracy. In terms of accuracy, DELM can compete with conventional intelligent methods: the results show that the proposed method has certain advantages over traditional methods and is generally comparable to traditional deep networks. We attribute this improvement in recognition performance to the embedded knowledge; the accuracy is greatly improved by extending the vertical network layers, and the model gradually acquires a better command of the implications of the knowledge. Table 4 also reports the accuracy of common machine learning algorithms on our datasets.
Since DELM inherits the advantages of ELM, extremely fast learning is one of its remarkable characteristics. In terms of computational efficiency, the slight increase in learning time (mere seconds) of DELM over the original ELM is inappreciable, especially considering the added improvement in classification accuracy. DELM sacrifices a little time, tolerating a cascade of multiple modules, in exchange for final performance, so we only need to compare our ELM-based deep network with traditional deep networks. The experimental results show that, once the accuracy requirements are met, DELM needs much less time than the traditional deep networks; in some designed scenarios, DELM trains and tests approximately a dozen times faster.
Figure 4 reports the time efficiency during the learning process, as the average learning time of the models. As observed from both Table 4 and Figure 4, the accuracy of DELM and the traditional deep methods is almost similar, yet the time consumed by the proposed classifier is the least. Taking accuracy and computational effort into account simultaneously, the proposed DELM demonstrates tremendous potential in EEG classification and may be a competitive choice.
Figure 5 shows the changes in recognition accuracy with the current stacked depth of modules in different EEG classification scenarios. The EEG classification accuracy increases as sub-models are added; the number of sub-models used is the depth of DELM, denoted by d, and the results show the classification accuracy from d = 1 to d = 10. The improvement in accuracy is relatively evident in the first three levels. Modest improvement can still be obtained by further expansion of ELMs, but DELM then gradually loses competitiveness in real-time tasks; without rapid classification, the inherent advantage of our model would be lost, and cascading several serial ELM modules would not be worthwhile. Starting from d = 1, the ordinary extreme learning machine, excellent features are well preserved and the classification effect is gradually improved; this performance augmentation can be seen in the figures.
The depth is the key aspect of knowledge augmentation in DELM. In our experimental organization, different depths are adopted in the binary class problems, while d = 6 is uniformly adopted in the three and five class problems in order to obtain better classification accuracy. A threshold is set for DELM because excessive accumulation of layers is no longer productive. The average depth over the binary class problems is d_AVG = 5.8. The selection of the depth parameters is shown in Figure 6.
Table 5 presents the F-measure scores obtained by DELM and the traditional deep learning methods in different scenarios. From the perspective of F-measure scores, DELM outperforms the several deep networks used for comparison in most scenarios; in the other scenarios, DELM is slightly worse but still performs well and remains comparable to the deep networks.
DELM enjoys the extremely fast speed of ELM while providing a deeper representation of the original signals. The experiments show that our algorithm compares favorably with several existing state-of-the-art schemes in terms of both accuracy and execution time.

4. Conclusion
A novel deep extreme learning machine (DELM) has been proposed for the recognition of epileptic EEG signals. DELM stepwise transmits the response to the next sub-model through fusion of the knowledge derived from previous sub-models. Such a process helps mine the valuable information in the original EEG data, so as to better accomplish the subsequent EEG recognition tasks. The proposed model operates in a forward, incremental way to strive for increasingly efficient performance, and its computation speed is considerably fast. ELM is introduced as the basic building block, making the whole learning process flexible and effective. As available knowledge, the classification results of the previous modules can enhance the classification performance of the subsequent modules. Our experimental results demonstrate that the proposed method is a promising candidate for epileptic EEG-based recognition. The proposed DELM is motivated by deep learning and stacking generalization theory, obtains excellent classification results, and outperforms the traditional methods. According to stacking generalization theory, combining the output of the next sub-model with the knowledge of the previous sub-model in DELM can indeed open up the manifold structure of the input space, resulting in improved performance. Moreover, knowledge augmentation can effectively extract the implied knowledge in each sub-model and yields increasing performance. However, the reason why knowledge augmentation improves performance throughout the training process is still not clear. In future work, we will spare no effort to theoretically demonstrate how the prediction output of each ELM module helps with epileptic EEG signal recognition.
Algorithm 1, repeated step (for the $d$th sub-model, $d \ge 2$):
(a) Randomly initialize the input weights and the biases of the hidden layer neurons;
(b) Calculate the new output matrix of the hidden layer, $H_d$, where $d$ refers to the $d$th sub-model of the current training process;
(c) Calculate the output weight of the corresponding sub-model: $\beta_d = H_d^{+} T$;
(d) Compute the classification output $Y_d = H_d \beta_d$ and the matrix $A_d$ after label transformation of the output, and store it into a new representation matrix $X_{d+1} = [X_d \,|\, A_d]$, giving the updated input dataset.

FIGURE 3 | Samples from the five EEG sets (A–E).

FIGURE 4 | Average time consumed by the deep networks in our experiment.
The samples in Figure 3 come from Sets A, B, C, D, and E, respectively. In our experiment, three kinds of EEG signals are employed, namely normal (A and B), interictal (C and D), and ictal (E), to evaluate the proposed epilepsy detection framework.

FIGURE 5 | Changes in recognition accuracy along with the stacked depth of modules in various scenarios. (A) Another individual group vs. ictal: A/E, B/E, C/E, and D/E. (B) Two other groups vs. ictal: AB/E, AC/E, AD/E, BC/E, BD/E, and CD/E. (C) Three other groups vs. ictal: ABC/E, ABD/E, ACD/E, and BCD/E. (D) All other groups vs. ictal: ABCD/E. (E) A/B/E, A/C/E, A/D/E, B/C/E, B/D/E, and C/D/E. (F) A/B/C/D/E.

FIGURE 6 | Different parameters of depth used in our experiment.

TABLE 2 | A brief introduction to the EEG dataset.
The expected label $T$ remains the same as the original one. Similarly, the actual output $Y_2$ is calculated, and $Y_2$ is transformed into knowledge again. The best results are marked in bold.

TABLE 5 | F-measure scores of the comparative methods.