ORIGINAL RESEARCH article

Front. Neurosci., 26 September 2022

Sec. Brain Imaging Methods

Volume 16 - 2022 | https://doi.org/10.3389/fnins.2022.982541

Epileptic seizure prediction using successive variational mode decomposition and transformers deep learning network

  • 1. School of Mathematics Science, Liaocheng University, Liaocheng, China

  • 2. School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China

Abstract

As one of the most common neurological disorders, epilepsy causes great physical and psychological harm to patients. Its long-term, recurrent, and unprovoked seizures make prediction necessary. In this paper, a novel approach to epileptic seizure prediction based on successive variational mode decomposition (SVMD) and transformers is proposed. SVMD is extended to a multidimensional form for time-frequency analysis of multi-channel signals. By solving a variational optimization problem, it adaptively extracts band-limited intrinsic modes that are common to all channels on different time scales. In the proposed seizure prediction method, the data are first decomposed into multiple modes on different time scales by multivariate SVMD, and irrelevant modes are then removed as a preprocessing step. Finally, the power spectrum of the denoised data is input to a pre-trained bidirectional encoder representations from transformers (BERT) model for prediction. BERT can identify the mode information related to epileptic seizures in the time-frequency domain. The method shows fair prediction performance on an intracranial EEG dataset, with an average sensitivity of 0.86 and a false prediction rate (FPR) of 0.18/h.

Introduction

Epilepsy is one of the most common brain diseases and affects people of all ages. Its long-term, recurrent, and unprovoked seizures can cause great damage to the physical and mental health of patients (Schulze-Bonhage and Kühn, 2008). An impending seizure may be suppressed by interventions such as medication or electrical or magnetic stimulation of the brain, provided that it is predicted in advance (Elger, 2001). Accurate prediction of epileptic seizures could therefore not only significantly improve patients' quality of life, but also provide a basis for developing more effective methods of preventing and treating epilepsy. Brain activity in patients with epilepsy has four phases: the interictal phase (between seizures), the preictal phase (prior to a seizure), the ictal phase (the seizure itself), and the postictal phase (after a seizure). If the preictal state can be distinguished from the other states, an imminent seizure can be predicted. The primary challenge in seizure prediction is therefore the classification of preictal and interictal (baseline) states. The electroencephalogram (EEG) is commonly used to diagnose epilepsy and evaluate the effect of therapy (Fisher et al., 2005). In recent years, a growing body of literature has demonstrated that preictal EEG exhibits characteristic patterns (Usman et al., 2019) and that predicting epileptic seizures from EEG is feasible.

Recent seizure prediction methods have focused on time-frequency analysis, non-linear dynamics, and deep learning networks. Common time-frequency analysis methods such as the wavelet transform and empirical mode decomposition (EMD) have been applied to obtain EEG modes on different scales for seizure detection and prediction (Zahra et al., 2017; Zhang et al., 2018; Hassan et al., 2020; Savadkoohi et al., 2020). However, the wavelet transform is not adaptive, and EMD suffers from low robustness and a limited mathematical foundation (Dragomiretskiy and Zosso, 2013). The more recently proposed variational mode decomposition (VMD) separates a non-stationary signal into narrow-band intrinsic modes, as EMD does, but its complete mathematical framework and greater robustness (Dragomiretskiy and Zosso, 2013; Lahmiri, 2015) have led to its increasing use in various fields (Upadhyay and Pachori, 2015; Xue et al., 2016; Zhang et al., 2017; Li et al., 2018; Taran and Bajaj, 2018; Wang et al., 2019; Dora and Biswal, 2020; Guo et al., 2020), including epileptic seizure classification (Rout and Biswal, 2020; Peng et al., 2021). In addition to statistical features in the time domain and power spectral estimates in the frequency domain, non-linear dynamical parameters such as the fractal dimension (Aarabi and He, 2017), the largest Lyapunov exponent (Fei et al., 2017), fuzzy distribution entropy (Zhang et al., 2018), and Hjorth parameters (Teixeira et al., 2014) have also been selected as features. Because it is difficult to describe the preictal state with just a few features, previous studies involved considerable tedious feature engineering. Moreover, some features lacked reproducibility and reliability (Mormann et al., 2005, 2007; Assi et al., 2017b).

Recently, deep learning networks, including convolutional neural networks (CNN) and long short-term memory (LSTM) networks, have attracted the most interest in seizure prediction, as their performance in classifying preictal and interictal states is superior to that of traditional machine learning techniques (Tsiouris et al., 2018; Usman et al., 2019). Bidirectional encoder representations from transformers (BERT) (Lee and Toutanova, 2018) is a more recent deep learning network that has made great progress in natural language processing (NLP) and has demonstrated superior performance over LSTM on many NLP tasks. Its potential for the analysis of other time series is worth exploring further.

In this paper, a multidimensional extension of SVMD is proposed to adaptively extract intrinsic modes that are common to all channels on different time scales. After decomposition by multivariate SVMD, task-irrelevant modes of the data can be removed as a preprocessing (denoising) step. The power spectrum of the denoised iEEG data is then input to a pre-trained BERT model for seizure prediction. The proposed seizure prediction method works well on two iEEG datasets.

The paper is organized as follows. In Materials and methods, we describe the databases used in this paper and the proposed scheme, together with the methods of seizure prediction and performance evaluation. In Results, we present the experimental results. In Discussion, we discuss the SVMD preprocessing method and the different seizure prediction methods applied to the iEEG datasets. Finally, we conclude the paper in Conclusion.

Materials and methods

EEG dataset

The first dataset was obtained from the Kaggle American Epilepsy Society Seizure Prediction Challenge (https://www.kaggle.com/competitions/seizure-prediction/). It comprises long-term intracranial EEG (iEEG) recordings from five dogs and two patients. The second dataset used in this study comprises continuous iEEG recordings from three dogs (Dog_6, Dog_7, and Dog_8) and can be obtained from the NIH-sponsored International Epilepsy Electrophysiology portal (https://www.ieeg.org). The canine iEEG data were sampled from 16 or 15 electrodes at 400 Hz. The iEEG data of the two patients were sampled at 5,000 Hz and recorded with 15 (Patient_1) and 24 (Patient_2) implanted electrodes, respectively. All seizures were focal. More details are given in Brinkmann et al. (2016). In this study, the 1 h before a seizure with a 5-min horizon (i.e., 65–5 min before seizure onset) was chosen as the preictal phase (Brinkmann et al., 2016; Assi et al., 2017a; Gagliano et al., 2019; Nejedly et al., 2019; Yu et al., 2021). Each consecutive interictal sequence lasted 1 h and was randomly chosen from iEEG recordings more than 1 week (dogs) or 4 h (patients) before or after any seizure. The iEEG portal dataset consists of continuous, fully labeled iEEG recordings. The Kaggle dataset consists of training data and testing data. Each labeled iEEG sequence in the training data lasts 1 h, whereas the unlabeled testing data are 10-min iEEG segments (the contest website does not provide labels for the test data, and a score can be obtained only by uploading the predicted results for all test data to the website). The data used in this work are described in Table 1.

Table 1

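| Participant | No. of channels | No. of seizures | Interictal hours | No. of testing segments |
|---|---|---|---|---|
| Dog_1 | 16 | 4 | 80 | 502 |
| Dog_2 | 16 | 7 | 83 | 1,000 |
| Dog_3 | 16 | 12 | 240 | 907 |
| Dog_4 | 16 | 16 | 134 | 990 |
| Dog_5 | 15 | 5 | 75 | 191 |
| Dog_6 | 16 | 41 | 998 | - |
| Dog_7 | 16 | 38 | 936 | - |
| Dog_8 | 16 | 15 | 286 | - |
| Patient_1 | 15 | 2 | 8.3 | 195 |
| Patient_2 | 24 | 3 | 7 | 150 |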

Description of the Kaggle dataset.

Preprocessing methods

The multidimensional extension of successive variational mode decomposition (SVMD) is proposed for time-frequency analysis of non-stationary multi-channel signals in this section. Multivariate SVMD is used to remove irrelevant modes for denoising in the presented seizure prediction method.

Successive variational mode decomposition is built on a theoretical framework similar to that of VMD, which requires each extracted mode to be compact around its center frequency and the original data to be reconstructed by all modes. Unlike VMD, however, SVMD extracts the intrinsic modes from a signal successively, without the number of modes having to be specified in advance. Therefore, SVMD involves no complex multi-parameter optimization problem. Details of the algorithm can be found in Nazari and Sakhaei (2020). As there is also considerable demand for analyzing multi-channel signals in real-world applications, a simple multivariate extension of SVMD is presented here.

Multidimensional SVMD aims to adaptively extract common intrinsic modes ui(t) with limited bandwidth from a multivariate signal f(t) containing C channels, i.e., f(t) = [f1(t), f2(t), …, fC(t)], such that the modes together reconstruct the signal:

$$f(t) = \sum_{i=1}^{L} u_i(t)$$

where ui(t) = [ui1(t), ui2(t), …, uiC(t)], C is the number of channels, and L is the number of common modes decomposed by multivariate SVMD.

It is noteworthy that intrinsic modes on the lth scale ul(t) are set to the same central frequency ωl in our model for the purpose of getting common modes of C channels on the same time scale. According to the definition of intrinsic mode function, ui(t) should be limited bandwidth signals, which is the central assumption for mode separation in SVMD. Therefore, the average bandwidth of all modes on the lth time scale should be minimized. Equivalently, the total bandwidth of C modes forms cost function L1 in multivariate SVMD optimization problem and is given by
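As a hedged sketch only, and assuming that the multivariate cost is the channel-wise sum of the univariate SVMD bandwidth term of Nazari and Sakhaei (2020), the total bandwidth of the C modes on the lth scale, all sharing the center frequency ωl, could be written as:

```latex
L_1 = \sum_{k=1}^{C} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{lk}(t) \right] e^{-j\omega_l t} \right\|_2^2
```

Here * denotes convolution, and the bracketed term is the analytic signal of ulk(t), which is shifted to baseband by the factor e^{-jωl t} before its gradient norm is taken. The exact form used in the authors' formulation may differ in normalization.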

To obtain the complete modes on the lth scale and avoid mode mixing with other scales, neither the previously extracted l − 1 modes nor the undecomposed part fuk(t) of the kth channel (k = 1, 2, …, C) should contain any information of the lth mode. Meanwhile, there should be no spectral overlap between the lth mode and the previously decomposed l − 1 modes. Accordingly, criterion L2, namely, the total frequency response of the residual signals (the previously extracted modes and fuk(t)) of all channels after passing through the lth filter, should be minimized. Furthermore, for the kth channel, the total energy of ulk(t) after filtering by each of the previous filters (i = 1, 2, …, l − 1) should be as small as possible. This constraint is expressed in the cost function L3.

The constrained variational optimization problem for multivariate SVMD is represented as follows:

The augmented Lagrange function shown in (7) is used to transform this problem into an unconstrained optimization problem, which can be solved iteratively by the ADMM approach (Bertsekas, 1982).

The first subproblem is focused on updating the modes ulk iteratively by channel. The (n + 1)th iteration of the kth channel could be rewritten as the following equivalent problem, which is actually reduced to a univariate mode update problem in original SVMD.

Therefore, as in SVMD, it can be solved in the spectral domain using Parseval's equality. ulk is updated by (9). Details can be found in Nazari and Sakhaei (2020).

The second subproblem is related to updating the center frequency ωl. The (n + 1)th iteration of each channel is the minimization problem shown in (10), which could be solved with the method and equation applied in SVMD. According to the principle of linear superposition, ωl could be updated by Equation (11).
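The center-frequency update can be illustrated with a minimal Python sketch. It assumes that the shared ωl is the power-weighted mean frequency over non-negative frequencies, pooled across all channels, as is done in multivariate VMD; Equation (11) may differ in detail, and the function name and toy spectra below are hypothetical.

```python
def update_center_frequency(freqs, spectra):
    """Power-weighted mean frequency pooled over channels.

    freqs:   non-negative frequencies (Hz), one per spectral bin
    spectra: per-channel lists of spectrum values of the l-th mode,
             each the same length as freqs
    """
    numerator = 0.0
    denominator = 0.0
    for channel in spectra:
        for f, value in zip(freqs, channel):
            power = abs(value) ** 2
            numerator += f * power
            denominator += power
    if denominator == 0.0:
        raise ValueError("mode has zero energy; center frequency undefined")
    return numerator / denominator


freqs = [0.0, 5.0, 10.0, 15.0, 20.0]
spectra = [
    [0.0, 0.0, 2.0, 0.0, 0.0],  # channel 1: all power at 10 Hz
    [0.0, 1.0, 0.0, 1.0, 0.0],  # channel 2: equal power at 5 and 15 Hz
]
omega_l = update_center_frequency(freqs, spectra)
```

Pooling the numerator and denominator over channels, rather than averaging per-channel estimates, lets channels with more energy in the mode contribute more to the common center frequency.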

The update equation for the Lagrange multiplier λ is the same as in SVMD, provided that ûi is replaced by ûik.

The result of the decomposition is affected by the penalty factor α, which determines the bandwidth of the intrinsic modes (Dragomiretskiy and Zosso, 2013; Nazari and Sakhaei, 2020). Furthermore, the optimal α differs markedly between types of signals. Consequently, a heuristic similar to that of SVMD is introduced to avoid optimizing α. While the modes of the lth scale are being extracted, α grows exponentially from a small value αmin to a maximum allowable value αmax, which amounts to finding the strongest modes in the residual signals by coarse-to-fine tuning.
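The α heuristic can be sketched in Python as a geometric schedule. The growth factor of 2 and the function name are assumptions made for illustration; the text states only that α grows exponentially from αmin to αmax.

```python
def alpha_schedule(alpha_min, alpha_max, growth=2.0):
    """Penalty factors growing geometrically from alpha_min and capped
    at alpha_max (the last value is always exactly alpha_max)."""
    if alpha_min <= 0 or alpha_max < alpha_min or growth <= 1.0:
        raise ValueError("need 0 < alpha_min <= alpha_max and growth > 1")
    values = []
    alpha = float(alpha_min)
    while alpha < alpha_max:
        values.append(alpha)
        alpha *= growth  # assumed growth rule; not specified in the text
    values.append(float(alpha_max))
    return values


# Ranges quoted in the Results for canine and patient iEEG
canine = alpha_schedule(200, 800)
patient = alpha_schedule(200, 2000)
```

A small α gives wide filters that locate the strongest mode coarsely; each larger α narrows the filter around the center frequency found so far.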

The search terminates when the total energy of all the lth modes falls below the given threshold ε2, i.e., when the extracted modes can be regarded as noise. Finally, all the obtained modes are sorted by center frequency from low to high. The complete algorithm for multivariate SVMD is described in Table 2.

Table 2

The algorithm of SVMD
Initialize: l ← 0, …
repeat
    l ← l + 1
    Set …, n ← 0, m ← 0, α1 ← αmin
    repeat
        m ← m + 1
        repeat
            n ← n + 1
            for k = 1 : C do
                Update ulk for all ω ≥ 0 [Equation (9)]
            end for
            for k = 1 : C do
                Update ωl [Equation (11)]
            end for
            for k = 1 : C do
                Dual ascent for all ω ≥ 0
            end for
        until convergence
        Set …, n ← 0
    until αm ≤ αmax
until the total energy of all lth modes is less than ε2

The complete algorithm of multivariate SVMD.

Classification and evaluation

The human iEEG data were down-sampled to 500 Hz to be comparable to the canine iEEG. To reduce the computational burden of SVMD, both preictal and interictal iEEG data were first divided into non-overlapping 2-s clips. All iEEG clips were then decomposed by multivariate SVMD. Irrelevant modes of the raw iEEG data were removed, and the remaining modes were summed for reconstruction. Subsequently, the reconstructed clips were concatenated in chronological order into a new time series. The denoised iEEG data were split into 30-s samples with 28-s overlap. To use modal information in the time-frequency domain for prediction, the power spectrum was extracted by the short-time Fourier transform (STFT). Each iEEG sample was segmented by a 1-s time window with 75% overlap, and the power spectrum was computed with the function spectrum in MATLAB. Only the power spectrum from 0 to 140 Hz was retained, and the power averaged over each 2-Hz band was taken as the final spectrum. The power spectrum of the iEEG samples was input to a deep learning network based on BERT for seizure prediction. To assess the effect of preprocessing, the power spectrum of the raw iEEG was also input to BERT for classification.
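The windowing and band averaging described above can be sketched in plain Python. This is not the authors' MATLAB code: the naive DFT, the Hann taper, the half-open band edges [b, b + 2) Hz, and all function names are assumptions made for illustration.

```python
import cmath
import math

FS = 400  # Hz, canine sampling rate stated in the text


def window_starts(n_total, n_window, n_step):
    """Start indices of all full windows of n_window samples."""
    return list(range(0, n_total - n_window + 1, n_step))


def power_spectrum(window):
    """One-sided power spectrum of a real window via a naive DFT
    with a Hann taper (the taper is an assumption)."""
    n = len(window)
    tapered = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
               for i, x in enumerate(window)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(tapered))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_average(power, bin_hz, f_max=140.0, band_hz=2.0):
    """Mean power in consecutive bands [b, b + band_hz) up to f_max."""
    per_band = int(round(band_hz / bin_hz))
    n_bands = int(round(f_max / band_hz))
    return [sum(power[b * per_band:(b + 1) * per_band]) / per_band
            for b in range(n_bands)]


# 30-s samples with 28-s overlap (2-s step) over a 1-h sequence
samples = window_starts(3600 * FS, 30 * FS, 2 * FS)
# 1-s windows with 75% overlap (0.25-s step) inside one 30-s sample
steps = window_starts(30 * FS, 1 * FS, FS // 4)

# Synthetic 1-s window: a 21 Hz sine, to check where the power lands
window = [math.sin(2 * math.pi * 21 * i / FS) for i in range(FS)]
features = band_average(power_spectrum(window), bin_hz=1.0)
```

Under these assumptions, a 1-h sequence yields 1,786 samples, each sample has N = 117 time steps, and each channel contributes Np = 70 band powers per time step.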

BERT model architecture

The classic BERT model architecture is based on a multi-layer bidirectional transformer encoder (Vaswani et al., 2017) and uses a bidirectional self-attention mechanism. After pre-training on two unsupervised tasks, all parameters of BERT can be fine-tuned using labeled data from downstream tasks. The code and pre-trained models are available at https://github.com/matlab-deep-learning/transformer-models. In this study, the classification of preictal and interictal iEEG is treated as a downstream task for fine-tuning a pre-trained BERT model with an additional output layer. Our model architecture consists of an input layer, an encoder layer (transformer blocks), a fully connected layer, and a Softmax classification layer, as shown in Figure 1.

Figure 1

The architecture of BERT model.

It is worth noting that BERT was originally designed to solve NLP tasks, and its input representation is a token sequence transformed from a sentence (Wu et al., 2016). Here, however, the input data are numerical time series, which need not be converted to tokens and passed through word embedding in the input layer. Therefore, a more suitable embedding method for numerical sequences is needed. The input data of all channels are concatenated and weighted as a form of data embedding [refer to Equations (12) and (13)], which can be regarded as a kind of data fusion.

In these equations, each element is the power spectrum of the ith channel in the jth time window (each 1-s time window is a time step, and N is the number of time steps), and the spectra of all channels are concatenated into an (Nc × Np) × 1 vector for each time step (Nc is the number of channels, and Np is the number of spectral frequencies). The data embedding Ed is the Hadamard product of the power spectrum X and the weight matrix W.

In the input layer, X is converted to a matrix E by summing the position embedding Ep (obtained in the same way as in BERT) and the data embedding Ed. Dynamic coding is applied, and all weights are learned automatically during training. The weights are first initialized as normally distributed random numbers. After embedding and normalization, the (Nc × Np) × N (i.e., number of features × number of time steps) matrix is input to the encoder layer.
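The input-layer computation can be illustrated with a small Python sketch on nested lists. It assumes that W and Ep have the same shape as X, that the initial weights have unit standard deviation, and that the position embedding is a learned matrix as in BERT; the toy sizes and function names are hypothetical, and no training or normalization is shown.

```python
import random


def data_embedding(X, W):
    """Element-wise (Hadamard) product of spectrum X and weights W."""
    return [[x * w for x, w in zip(x_row, w_row)]
            for x_row, w_row in zip(X, W)]


def add_embeddings(E_d, E_p):
    """Input-layer sum of data embedding and position embedding."""
    return [[d + p for d, p in zip(d_row, p_row)]
            for d_row, p_row in zip(E_d, E_p)]


n_features, n_steps = 6, 4  # toy sizes standing in for (Nc x Np) and N
rng = random.Random(0)
X = [[rng.random() for _ in range(n_steps)] for _ in range(n_features)]
# Weights start as normally distributed random numbers and would be
# updated during training (not shown here).
W = [[rng.gauss(0.0, 1.0) for _ in range(n_steps)] for _ in range(n_features)]
E_p = [[rng.gauss(0.0, 1.0) for _ in range(n_steps)] for _ in range(n_features)]
E = add_embeddings(data_embedding(X, W), E_p)
```

Because the product is element-wise, each spectral feature of each channel receives its own weight at each time step, which is what allows the weighting to act as a learned fusion of the channels.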

In the encoder layer, the number of layers (i.e., transformer blocks) is 12 and the hidden size is 768. The number of self-attention heads is 12. Batch size is set to 32 and the number of epochs in training loop is 10. The BERT model is built with MATLAB R2022a.

Evaluation

To test the ability of this approach to predict unseen seizures, a limited number of seizures were used for training, and the remaining ones were used for testing. All the data in the iEEG portal dataset and the labeled training data in the Kaggle dataset could be used. Because there were relatively few seizures per subject in the training data, leave-one-out cross-validation was applied: if a subject had M seizures, M − 1 seizures were used for training and one for validation. The amount of interictal iEEG is much larger than that of preictal iEEG. Therefore, to avoid class imbalance, the training set contained equal numbers of preictal and interictal iEEG sequences. Each interictal iEEG sequence was randomly selected from the dataset, and all remaining interictal sequences were used for validation. We ran 10 trials and trained 10 models for each subject (refer to Lian et al., 2020). The average performance was taken as the final prediction performance on the training data. The unlabeled testing data of the Kaggle dataset were also used to test the prediction method. Similarly, multiple models were trained to avoid class imbalance. For each subject, all preictal iEEG and the same amount of randomly selected interictal iEEG were used to train the model, and 10 trials were run. A testing segment in the Kaggle dataset was predicted as preictal if more than 6 models identified it as preictal. No labels are given for the testing data in the Kaggle dataset, but the score (an index related to classification accuracy that is used by the organizer) can be calculated on the competition website. Therefore, the score achieved on the testing data is a key indicator of predictor performance.
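The class balancing and the ensemble vote can be sketched as follows. The function names and the toy sequence labels are hypothetical, and "more than 6 models" is read here as at least 7 of the 10 models.

```python
import random


def balanced_training_set(preictal, interictal, rng):
    """Pair all preictal sequences with an equal number of randomly
    chosen interictal ones; the remaining interictal data are held out."""
    chosen = set(rng.sample(range(len(interictal)), len(preictal)))
    train_inter = [s for i, s in enumerate(interictal) if i in chosen]
    held_out = [s for i, s in enumerate(interictal) if i not in chosen]
    return preictal + train_inter, held_out


def vote(model_predictions, min_votes=7):
    """Label a segment preictal (1) when at least min_votes of the
    per-model 0/1 predictions are preictal, otherwise interictal (0)."""
    return 1 if sum(model_predictions) >= min_votes else 0


rng = random.Random(0)
train, held_out = balanced_training_set(
    ["pre_%d" % i for i in range(4)],
    ["inter_%d" % i for i in range(20)],
    rng,
)
```

Repeating the random interictal draw for each of the 10 models means that, across the ensemble, far more interictal data are seen than any single balanced training set contains.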

To improve the reliability of the prediction, a prediction window of 10 min was applied. Following previous experience (Truong et al., 2018; Wei et al., 2019), an alarm is raised if more than 60% of the EEG samples in a 10-min continuous recording are identified as preictal. Four measures are commonly used to evaluate the performance of a prediction method: sensitivity, false prediction rate (FPR), seizure occurrence period (SOP), and seizure prediction horizon (SPH). Sensitivity is the number of correctly predicted seizures divided by the total number of seizures. FPR is defined as the number of false alarms per hour. SPH is a predefined interval between the first alarm and the incoming seizure, which is also the period reserved for patients to take intervention measures. SOP is the period during which a seizure is expected to occur (Maiwald et al., 2004). Therefore, for a correct prediction, the seizure must not occur during the SPH and must occur within the SOP. There are no common criteria for the lengths of the SOP and SPH, but the SPH should be long enough for intervention, and the SOP should not be so long that it causes the patient anxiety. Based on other studies, we use an SPH of 30 min and an SOP of 20 min here.
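The alarm rule and the evaluation measures can be sketched in Python. The function names are hypothetical, and the check for a correct prediction assumes that the SOP starts immediately after the SPH has elapsed.

```python
def raise_alarm(labels, threshold=0.6):
    """labels: 0/1 predictions for the EEG samples in one 10-min window.
    An alarm is raised when more than `threshold` of them are preictal."""
    return sum(labels) / len(labels) > threshold


def sensitivity(n_predicted, n_seizures):
    """Correctly predicted seizures divided by all seizures."""
    return n_predicted / n_seizures


def false_prediction_rate(n_false_alarms, hours):
    """Number of false alarms per hour of recording."""
    return n_false_alarms / hours


def is_correct_prediction(alarm_time, seizure_onset, sph=30.0, sop=20.0):
    """Times in minutes. The seizure must not occur during the SPH and
    must occur within the SOP that follows it (assumed layout)."""
    return alarm_time + sph <= seizure_onset <= alarm_time + sph + sop


alarm = raise_alarm([1] * 7 + [0] * 3)
hit = is_correct_prediction(alarm_time=0.0, seizure_onset=40.0)
```

With the 30-min SPH and 20-min SOP used here, an alarm counts as correct only if the seizure begins between 30 and 50 min after it.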

To evaluate the statistical significance of the seizure prediction performance, a random predictor is used for comparison. For a given FPR, the probability of raising an alarm during the SOP can be approximated as follows (Schelter et al., 2006):

$$P \approx 1 - e^{-\mathrm{FPR} \cdot \mathrm{SOP}}$$

Therefore, the probability of predicting at least m of M independent seizures by chance is given by

$$p = \sum_{j \ge m} \binom{M}{j} P^{j} (1 - P)^{M-j}$$

For each patient, p is calculated using the FPR and the number of correctly predicted seizures m. If p < 0.05, the prediction method is considered significantly better than a random predictor at a significance level of 0.05.
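The significance test can be sketched in Python, assuming the approximation P ≈ 1 − exp(−FPR · SOP) of Schelter et al. (2006) and a binomial tail over M independent seizures. The function names are hypothetical.

```python
import math


def alarm_probability(fpr_per_hour, sop_hours):
    """Chance that a random predictor raises an alarm within one SOP."""
    return 1.0 - math.exp(-fpr_per_hour * sop_hours)


def chance_p_value(m, M, fpr_per_hour, sop_hours):
    """Probability of predicting at least m of M seizures by chance."""
    P = alarm_probability(fpr_per_hour, sop_hours)
    return sum(math.comb(M, j) * P ** j * (1.0 - P) ** (M - j)
               for j in range(m, M + 1))


SOP_HOURS = 20.0 / 60.0  # the 20-min SOP used in the text
p_two_of_two = chance_p_value(2, 2, 0.36, SOP_HOURS)
p_two_of_three = chance_p_value(2, 3, 0.25, SOP_HOURS)
```

With the values reported in Table 3 for Patient_1 (2 of 2 seizures, FPR 0.36/h) and Patient_2 (sensitivity 0.67, read as 2 of 3 seizures, FPR 0.25/h), this gives approximately 0.0128 and 0.0182, in agreement with the tabulated p-values.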

Results

Preprocessing results

Multivariate SVMD was applied for preprocessing. The range of the parameter α in SVMD was set to [200, 800] for canine iEEG and [200, 2000] for patient iEEG. The eight scales of common intrinsic modes extracted from a randomly selected 2-s preictal iEEG clip of Dog_5 are shown in Figure 2A (only the first 3 channels are displayed owing to space limitations). The corresponding power spectral density (PSD) of all 15 channels on each scale is shown in Figure 2B. The frequency bands of modes on the same scale were similar, which illustrates the mode-alignment ability of multivariate SVMD across multiple channels. The set of modes that yielded the highest classification accuracy was regarded as effective, and the other modes as irrelevant. Irrelevant modes were removed, and the remaining modes were summed for reconstruction, which can be considered a form of denoising.
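The removal of irrelevant modes and the reconstruction can be sketched as follows. The data layout (modes × channels × samples), the function name, and the toy values are assumptions for illustration; the 12–20 Hz band is the one listed in Table 3 for Dog_2 and Dog_3.

```python
def reconstruct(modes, center_freqs, is_irrelevant):
    """Sum the modes whose center frequency is not flagged as irrelevant.

    modes:         list of modes, each a list of channels,
                   each channel a list of samples
    center_freqs:  center frequency (Hz) of each mode
    is_irrelevant: predicate on a center frequency
    """
    kept = [m for m, f in zip(modes, center_freqs) if not is_irrelevant(f)]
    n_channels = len(modes[0])
    n_samples = len(modes[0][0])
    return [[sum(m[c][t] for m in kept) for t in range(n_samples)]
            for c in range(n_channels)]


# Toy example: 3 modes, 2 channels, 4 samples each
modes = [
    [[1, 1, 1, 1], [2, 2, 2, 2]],                  # center frequency 5 Hz
    [[10, 10, 10, 10], [20, 20, 20, 20]],          # 14 Hz, inside 12-20 Hz
    [[100, 100, 100, 100], [200, 200, 200, 200]],  # 40 Hz
]
center_freqs = [5.0, 14.0, 40.0]
denoised = reconstruct(modes, center_freqs, lambda f: 12.0 < f < 20.0)
```

Because every channel shares the same center frequency on a given scale, a single frequency test per scale removes the corresponding mode from all channels at once.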

Figure 2

(A) The eight scales of modes extracted from a randomly selected preictal iEEG sample by multivariate SVMD (only the first 3 channels are displayed) and (B) PSD of all 15 channels on each scale.

Prediction results

The power spectrum of the reconstructed data was input to BERT for learning and classification. As shown in Table 3, the prediction algorithm achieves a mean sensitivity of 0.86 and an average FPR of 0.18/h. The p-values indicate that the prediction method was significantly superior to a random predictor for all subjects. The mean score of our method on the testing data of the Kaggle dataset was 0.84125, about 0.03 below the competition leader's score of 0.87154. To compare the preprocessing algorithms, the power spectrum of the raw iEEG was also input to BERT for classification; the mean score on the testing data was 0.69153, much lower than that of the proposed method.

Table 3

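| Participant | No. of seizures | Center frequency of removed modes (Hz) | Sensitivity | FPR (/h) | p |
|---|---|---|---|---|---|
| Dog_1 | 4 | 8 < ω < 15 | 0.65 | 0.25 | 0.0019 |
| Dog_2 | 7 | 12 < ω < 20 | 0.87 | 0.06 | < 0.001 |
| Dog_3 | 12 | 12 < ω < 20 | 0.92 | 0.23 | < 0.001 |
| Dog_4 | 16 | ω < 60 | 0.94 | 0.07 | < 0.001 |
| Dog_5 | 5 | ω < 55 | 0.90 | 0.16 | < 0.001 |
| Dog_6 | 41 | ω < 30 | 0.90 | 0.15 | < 0.001 |
| Dog_7 | 38 | ω < 30 | 0.86 | 0.18 | < 0.001 |
| Dog_8 | 15 | ω < 20 | 0.88 | 0.09 | < 0.001 |
| Patient_1 | 2 | ω < 55 | 1 | 0.36 | 0.0128 |
| Patient_2 | 3 | ω < 30 | 0.67 | 0.25 | 0.0182 |
| Mean | | | 0.86 | 0.18 | |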

The performance of the proposed method on 10 subjects.

Discussion

Multivariate SVMD inherits the advantages of SVMD, including fewer parameters, resistance to mode mixing, and adaptability. Meanwhile, Figure 3 shows that the frequency bands of modes on the same scale were similar, which illustrates the mode-alignment ability of multivariate SVMD across multiple channels. Furthermore, modes on different scales occupied distinct frequency bands, which suggests that SVMD may have a filter bank property; this is not the focus of this study but could be verified in future work.

Figure 3

The distribution of the number of time scales (upper) and range of center frequency on 8 dominant scales (lower) in (A) interictal and (B) preictal states for Dog_5.

The number of intrinsic modes extracted by multivariate SVMD was not consistent across samples, because the wideband iEEG signals are affected by ocular artifacts, electromyogram activity, and other background noise. Taking the data of Dog_5 as an example, the distributions of the number of time scales and of the center frequencies on the 8 dominant scales are displayed in Figure 3. Most of the interictal samples (75.6%) were decomposed into 8 scales of band-limited intrinsic mode functions (BIMFs), whereas the number of modes was less consistent for preictal samples. For both states, the center frequencies of the 8 dominant time scales were in the ranges [0, 8], [8, 18], [18, 32], [32, 42], [42, 53], [53, 65], [65, 80], and [80, 110] Hz, respectively. However, the proportion of samples containing modes on certain time scales (modes in the high gamma band) was markedly lower in the preictal state, as shown in Figure 3. A possible reason is that some modes were interfered with by new modes generated by an impending epileptic seizure, which remains to be confirmed with more physiological evidence.

Table 3 shows that the difference between preictal and interictal modes is subject-specific. There is a certain consistency among Dog_1, Dog_2, and Dog_3, for which all the irrelevant modes are in the alpha and beta bands. For the other subjects, however, the modes associated with seizures are in the gamma band. Therefore, the seizure prediction method is patient-dependent.

As summarized by Usman et al. (2019), the support vector machine (SVM) was widely used in studies before 2019, with good predictive performance. Before the emergence of BERT, LSTM was the deep learning model most commonly used for NLP problems and other time series pattern recognition tasks. Therefore, we compare the prediction ability of these two classifiers with that of BERT. An SVM with a Gaussian radial basis function (RBF) kernel is used, following the literature (Bandarabadi et al., 2015; Xiang et al., 2015; Sharif and Jafari, 2017). The LSTM network consists of a sequence input layer, an LSTM layer, a dropout layer, a fully connected layer using the “relu” activation function, and a classification layer using the “softmax” activation. The size of the input layer depends on the number of power spectrum features. The dropout probability is 0.5. The number of memory units in the LSTM layer is set to 128 (Tsiouris et al., 2018). Although the mean sensitivity of SVM reaches 0.83, its score on the testing data is only 0.65839. The prediction performance of both LSTM and BERT on the testing data is much better than that of SVM, which may be due to the stronger ability of the two deep learning models to learn temporal information. Moreover, BERT achieves better prediction results than LSTM, as shown in Table 4.

Table 4

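| Classifier | Sensitivity (training data) | FPR (training data) | Score (testing data) |
|---|---|---|---|
| SVM | 0.83 | 0.24 | 0.65839 |
| LSTM | 0.84 | 0.21 | 0.77930 |
| BERT | 0.86 | 0.18 | 0.84125 |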

The prediction performance of three classifiers (SVM, LSTM, and BERT).

The sensitivity and FPR in the table are the average of 10 subjects.

As shown in Table 5, for the canine iEEG dataset, the sensitivity of this method is higher than that of the other methods, and the FPR of 0.20 is relatively low, which indicates the high prediction performance of this method. The mean score achieved on the testing data was 0.84125 with preprocessed data but only 0.69153 with raw iEEG data, which illustrates that SVMD can screen out valid modes for seizure prediction. It also confirms that the difference between the preictal and interictal brain states is present in the power spectrum of iEEG in the time-frequency domain, and that the self-attention mechanism of BERT can extract this information effectively. Although the result is comparable with that of Assi et al. (2017a), that study included only three subjects and relied on complex feature extraction, feature selection, and channel selection to predict seizures. In contrast, the two datasets we used contain 10 subjects, and our method is relatively simple: only the power spectrum of the iEEG denoised by SVMD is used as features for prediction.

Table 5

| Authors | No. of subjects | Features | Classifier | Sensitivity | FPR |
|---|---|---|---|---|---|
| Assi et al. (2017a) | 3 | Spectral band power, Hjorth mobility and complexity, spectral edge frequency and power, and decorrelation time | SVM | 0.85 | - |
| Truong et al. (2018) | 7 | Short-time Fourier transform | CNN | 0.75 | 0.21 |
| Nejedly et al. (2019) | 4 | Raw iEEG and spectrogram images | CNN | 0.79 | - |
| Gagliano et al. (2019) | 3 | Higher-order spectral features | LSTM | 0.78 | - |
| Yu et al. (2021) | 7 | Autoregressive (AR) model coefficients and Laguerre–Volterra AR model coefficients | Sparse lasso logistic regression classifier | 0.78 | - |
| This work | 10 | Power spectrum of reconstructed data by multivariate SVMD | BERT | 0.86 | 0.18 |

Comparison of seizures prediction methods using iEEG dataset.

The mean score obtained by the first-place team in the Kaggle competition is 0.87154. Although it is about 0.03 higher than that of our method, their result relies on numerous features and elaborate feature selection. The features include energy in different frequency bands, correlation of energy between channels, the square root of each feature, and so on (the features are only briefly introduced at https://www.kaggle.com/competitions/seizure-prediction/discussion/11024). In our method, by contrast, features are learned adaptively. The score we achieved indicates that the proposed approach could be a candidate or auxiliary method for seizure prediction.

Conclusion

In this paper, we proposed a seizure prediction method based on SVMD and BERT. The simple multivariate extension of SVMD can decompose multivariate data into its common intrinsic modes on different scales. The iEEG signals were preprocessed by removing irrelevant modes after decomposition by SVMD. The prediction score in the Kaggle competition indicates that BERT can learn the difference between the preictal and interictal states in the time-frequency domain using the self-attention mechanism. Therefore, it could be a candidate method for seizure prediction.

Funding

This work was partly supported by the National Natural Science Foundation of China (Nos. 61976110, 62176112, and 11931008), the Natural Science Foundation of Shandong Province (No. ZR202102270451), and The Open Project of Liaocheng University Animal Husbandry Discipline (No. 319312101-01).

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

XW and TZ designed the study. XW downloaded and analyzed the data, performed experiments, and drafted the manuscript. LZ, LQ, and TZ revised the manuscript. All authors read and approved the final manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1

    Aarabi, A., and He, B. (2017). Seizure prediction in patients with focal hippocampal epilepsy. Clin. Neurophysiol. 128, 1299–1307. doi: 10.1016/j.clinph.2017.04.026

  • 2

    Assi, E. B., Nguyen, D. K., Rihana, S., and Sawan, M. (2017a). A functional-genetic scheme for seizure forecasting in canine epilepsy. IEEE Trans. Biomed. Eng. 65, 1339–1348. doi: 10.1109/TBME.2017.2752081

  • 3

    AssiE. B.NguyenD. K.RihanaS.SawanM. (2017b). Towards accurate prediction of epileptic seizures: a review. Biomed. Signal Process. Control. 34, 144157. 10.1016/j.bspc.2017.02.001

  • 4

    BandarabadiM.TeixeiraC. A.RasekhiJ.DouradoA. (2015). Epileptic seizure prediction using relative spectral power features. Clin. Neurophysiol. 126, 237248. 10.1016/j.clinph.2014.05.022

  • 5

    BertsekasD. P. (1982). Constrained Optimization and Lagrange Multiplier Methods (Constrained Optimization and Lagrange Multiplier Methods) (Athena Scientific). Available online at: https://www.amazon.com/Constrained-Optimization-Lagrange-Multiplier-computation/dp/1886529043

  • 6

    BrinkmannB. H.WagenaarJ.AbbotD.AdkinsP.BosshardS. C.ChenM.et al. (2016). Crowdsourcing reproducible seizure forecasting in human and canine epilepsy. Brain139, 17131722. 10.1093/brain/aww045

  • 7

    DoraC.BiswalP. K. (2020). An improved algorithm for efficient ocular artifact suppression from frontal EEG electrodes using VMD. Biocybern. Biomed. Eng.40, 148161. 10.1016/j.bbe.2019.03.002

  • 8

    DragomiretskiyK.ZossoD. (2013). Variational mode decomposition. IEEE Trans. Signal Process. 62, 531544. 10.1109/TSP.2013.2288675

  • 9

    ElgerC. E. (2001). Future trends in epileptology. Curr. Opin. Neurol. 14, 185186. 10.1097/00019052-200104000-00008

  • 10

    FeiK.WangW.YangQ.TangS. (2017). Chaos feature study in fractional fourier domain for preictal prediction of epileptic seizure. Neurocomputing249, 290298. 10.1016/j.neucom.2017.04.019

  • 11

    FisherR. S.BoasW. V. E.BlumeW.ElgerC.GentonP.LeeP.et al. (2005). Epileptic seizures and epilepsy: definitions proposed by the international league against epilepsy (ILAE) and the international bureau for epilepsy (IBE). Epilepsia46, 470472. 10.1111/j.0013-9580.2005.66104.x

  • 12

    GaglianoL.Bou AssiE.NguyenD. K.SawanM. (2019). Bispectrum and recurrent neural networks: improved classification of interictal and preictal states. Sci. Rep. 9, 19. 10.1038/s41598-019-52152-2

  • 13

    GuoZ.LiuM.WangY.QinH. (2020). A new fault diagnosis classifier for rolling bearing united multi-scale permutation entropy optimize VMD and cuckoo search SVM. IEEE Access. 8, 153610153629. 10.1109/ACCESS.2020.3018320

  • 14

    HassanA. R.SubasiA.ZhangY. (2020). Epilepsy seizure detection using complete ensemble empirical mode decomposition with adaptive noise. Knowl. Based Syst. 191, 105333. 10.1016/j.knosys.2019.105333

  • 15

    LahmiriS. (2015). Comparing variational and empirical mode decomposition in forecasting day-ahead energy prices. IEEE Syst. J. 11, 19071910. 10.1109/JSYST.2015.2487339

  • 16

    LeeJ. D. M. C. K.ToutanovaK. (2018). Pre-Training of Deep Bidirectional Transformers for Language Understanding. Available online at: https://arxiv.org/abs/1810.04805?_hsenc=p2ANqtz–n7PUYWznWMz86GjLjA-LJx8Oyt7ZwXl1kdSGc1BMUWkEnTdj39QK1wTM4ynwo4sZqObOi

  • 17

    LiF.ZhangB.VermaS.MarfurtK. J. (2018). Seismic signal denoising using thresholded variational mode decomposition. Explor. Geophys. 49, 450461. 10.1071/EG17004

  • 18

    LianQ.QiY.PanG.WangY. (2020). Learning graph in graph convolutional neural networks for robust seizure prediction. J. Neural Eng. 17, 035004. 10.1088/1741-2552/ab909d

  • 19

    MaiwaldT.WinterhalderM.Aschenbrenner-ScheibeR.VossH. U.Schulze-BonhageA.TimmerJ. (2004). Comparison of three nonlinear seizure prediction methods by means of the seizure prediction characteristic. Physica D194, 357368. 10.1016/j.physd.2004.02.013

  • 20

    MormannF.AndrzejakR. G.ElgerC. E.LehnertzK. (2007). Seizure prediction: the long and winding road. Brain130, 314333. 10.1093/brain/awl241

  • 21

    MormannF.KreuzT.RiekeC.AndrzejakR. G.KraskovA.DavidP.et al. (2005). On the predictability of epileptic seizures. Clin. Neurophysiol. 116, 569587. 10.1016/j.clinph.2004.08.025

  • 22

    NazariM.SakhaeiS. M. (2020). Successive variational mode decomposition. Signal Process.174, 107610. 10.1016/j.sigpro.2020.107610

  • 23

    NejedlyP.KremenV.SladkyV.NasseriM.GuragainH.KlimesP.et al. (2019). Deep-learning for seizure forecasting in canines with epilepsy. J. Neural Eng.16, 036031. 10.1088/1741-2552/ab172d

  • 24

    PengJ.Xue-JunZ.Zhi-XinS. (2021). eEpileptic electroencephalogram signal classification method based on elastic variational mode decomposition. Acta Physica Sinica. 70:018702-018702. 10.7498/aps.70.20200904

  • 25

    RoutS. K.BiswalP. K. (2020). An efficient error-minimized random vector functional link network for epileptic seizure classification using VMD. Biomed. Signal Process. Control. 57, 101787. 10.1016/j.bspc.2019.101787

  • 26

    SavadkoohiM.OladunniT.ThompsonL. (2020). A machine learning approach to epileptic seizure prediction using Electroencephalogram (EEG) Signal. Biocybern Biomed. Eng.40, 13281341. 10.1016/j.bbe.2020.07.004

  • 27

    SchelterB. R.WinterhalderM.MaiwaldT.BrandtA.SchadA.Schulze-BonhageA.et al. (2006). Testing statistical significance of multivariate time series analysis techniques for epileptic seizure prediction. Chaos16, 1321. 10.1063/1.2137623

  • 28

    Schulze-BonhageA.KühnA. (2008). Unpredictability of seizures and the burden of epilepsy. Seizure Prediction Epilepsy. 1–10. 10.1002/9783527625192.ch1

  • 29

    SharifB.JafariA. H. (2017). Prediction of epileptic seizures from EEG using analysis of ictal rules on poincaré plane. Comput. Methods Programs Biomed. 145, 1122. 10.1016/j.cmpb.2017.04.001

  • 30

    TaranS.BajajV. (2018). Clustering variational mode decomposition for identification of focal EEG signals. IEEE Sens. Lett. 2, 14. 10.1109/LSENS.2018.2872415

  • 31

    TeixeiraC. A.DireitoB.BandarabadiM.Le Van QuyenM.ValderramaM.SchelterB.et al. (2014). Epileptic seizure predictors based on computational intelligence techniques: a comparative study with 278 patients. Comput. Methods Programs Biomed.114, 324336. 10.1016/j.cmpb.2014.02.007

  • 32

    TruongN. D.NguyenA. D.KuhlmannL.BonyadiM. R.YangJ.IppolitoS.et al. (2018). Convolutional neural networks for seizure prediction using intracranial and scalp electroencephalogram. Neural Net.105, 104111. 10.1016/j.neunet.2018.04.018

  • 33

    TsiourisK. M.PezoulasV. C.ZervakisM.KonitsiotisS.KoutsourisD. D.FotiadisD. I. (2018). A long short-term memory deep learning network for the prediction of epileptic seizures using EEG signals. Comput. Biol. Med. 99, 2437. 10.1016/j.compbiomed.2018.05.019

  • 34

    UpadhyayA.PachoriR. B. (2015). Instantaneous voiced/non-voiced detection in speech signals based on variational mode decomposition. J. Franklin Inst. 352, 26792707. 10.1016/j.jfranklin.2015.04.001

  • 35

    UsmanS. M.KhalidS.AkhtarR.BortolottoZ.BashirZ.QiuH. (2019). Using scalp EEG and intracranial EEG signals for predicting epileptic seizures: review of available methodologies. Seizure71, 258269. 10.1016/j.seizure.2019.08.006

  • 36

    VaswaniA.ShazeerN.ParmarN.UszkoreitJ.JonesL.GomezA. N.et al (2017). Attention is all you need. Adv. Neural Inf. Process. Syst. 10.48550/arXiv.1810.04805. Available online at: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

  • 37

    WangC.LiH.HuangG.OuJ. (2019). Early fault diagnosis for planetary gearbox based on adaptive parameter optimized VMD and singular kurtosis difference spectrum. IEEE Access. 7, 3150131516. 10.1109/ACCESS.2019.2903204

  • 38

    WeiX.ZhouL.ZhangZ.ChenZ.ZhouY. (2019). Early prediction of epileptic seizures using a long-term recurrent convolutional network. J. Neurosci. Methods327, 108395. 10.1016/j.jneumeth.2019.108395

  • 39

    WuY.SchusterM.ChenZ.LeQ. V.NorouziM.MachereyW.et al (2016). Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation. Available online at: https://arxiv.org/abs/1609.08144

  • 40

    XiangJ.LiC.LiH.CaoR.WangB.HanX.et al. (2015). The detection of epileptic seizure signals based on fuzzy entropy. J. Neurosci. Methods. 243, 1825. 10.1016/j.jneumeth.2015.01.015

  • 41

    XueY.-J.CaoJ.-X.WangD.-X.DuH.-K.YaoY. (2016). Application of the variational-mode decomposition for seismic time–frequency analysis. IEEE. J. Sel. Top Appl. Earth Obs. Remote Sens. 9, 38213831. 10.1109/JSTARS.2016.2529702

  • 42

    YuP.-,g.LiuC. Y.HeckC. N.BergerT. W.SongD. (2021). A sparse multiscale nonlinear autoregressive model for seizure prediction. J. Neural Eng. 18, 026012. 10.1088/1741-2552/abdd43

  • 43

    ZahraA.KanwalN.ur RehmanN.EhsanS.McDonald-MaierK. D. (2017). Seizure detection from EEG signals using multivariate empirical mode decomposition. Comput. Biol. Med. 88, 132141. 10.1016/j.compbiomed.2017.07.010

  • 44

    ZhangM.JiangZ.FengK. (2017). Research on variational mode decomposition in rolling bearings fault diagnosis of the multistage centrifugal pump. Mech. Syst. Signal Process. 93, 460493. 10.1016/j.ymssp.2017.02.013

  • 45

    ZhangT.ChenW.LiM. (2018). Fuzzy distribution entropy and its application in automated seizure detection technique. Biomed. Signal Process. Control. 39, 360377. 10.1016/j.bspc.2017.08.013

Summary

Keywords

seizure prediction, successive variational mode decomposition, multiscale time-frequency analysis, BERT, intracranial EEG

Citation

Wu X, Zhang T, Zhang L and Qiao L (2022) Epileptic seizure prediction using successive variational mode decomposition and transformers deep learning network. Front. Neurosci. 16:982541. doi: 10.3389/fnins.2022.982541

Received

30 June 2022

Accepted

24 August 2022

Published

26 September 2022

Volume

16 - 2022

Edited by

Xi Jiang, University of Electronic Science and Technology of China, China

Reviewed by

Lu Zhang, University of Texas at Arlington, United States; Lin Zhao, University of Georgia, United States

*Correspondence: Tinglin Zhang; Limei Zhang

This article was submitted to Brain Imaging Methods, a section of the journal Frontiers in Neuroscience
