Epileptic focus localization using transfer learning on multi-modal EEG

The standard treatments for epilepsy are drug therapy and surgical resection. However, around 1/3 of patients with intractable epilepsy are drug-resistant, requiring surgical resection of the epileptic focus. To address the issue of drug-resistant epileptic focus localization, we have proposed a transfer learning method on multi-modal EEG (iEEG and sEEG). A 10-fold cross-validation approach was applied to validate the performance of the pre-trained model on the Bern-Barcelona and Bonn datasets, achieving accuracy rates of 94.50% and 97.50%, respectively. The experimental results have demonstrated that the pre-trained model outperforms the competitive state-of-the-art baselines in terms of accuracy, sensitivity, and negative predictive value. Furthermore, we fine-tuned our pre-trained model using the epilepsy dataset from Chongqing Medical University and tested it using the leave-one-out cross-validation method, obtaining an impressive average accuracy of 90.15%. This method reveals significant feature differences between epileptic and non-epileptic channels. By extracting data features using neural networks, accurate classification of epileptic and non-epileptic channels can be achieved. Therefore, the superior performance of the model has demonstrated that the proposed method is highly effective for localizing the epileptic focus and can aid physicians in clinical localization diagnosis.


Introduction
Epilepsy is a worldwide nervous system disease caused by sudden abnormal discharges of nerve cells in the brain. According to statistics, 70 million people worldwide suffer from epilepsy. Clinical manifestations of epileptic seizures include impaired consciousness, limb spasms, urinary incontinence, frothing, and other symptoms. Although short-term epileptic seizures have minimal impact, frequent long-term seizures severely affect patients' physical, mental, and intellectual health (Kwan and Brodie, 2000; Rakhade and Jensen, 2009; Rasheed et al., 2021).
The characteristics of EEG (electroencephalogram) data during the epileptic seizure period are related to the original localization and the cause of epilepsy. Different nervous system diseases or brain conditions can cause various epileptic seizures (Babb et al., 1987; Fisher et al., 2017). In the treatment of epilepsy, around 1/3 of patients with intractable epilepsy are drug-resistant. Therefore, precise localization of the epileptic focus during presurgical assessment is necessary for the successful resection of the epileptic focus.

Yang et al. 10.3389/fncom.2023.1294770 Frontiers in Computational Neuroscience frontiersin.org

There are four clinical methods for epileptic focus localization: observing clinical symptoms, analyzing fMRI (functional magnetic resonance imaging) data, examining the sEEG (scalp electroencephalogram) signal, and studying the iEEG (intracranial electroencephalogram) signal. Each method has its advantages and limitations. Observing clinical symptoms is the most direct method but can only localize the functional brain areas. Analyzing fMRI data is expensive and has low temporal resolution. Moreover, if the seizures of epileptic patients are not caused by structural brain lesions, this method cannot accurately localize the epileptic focus (Morgan et al., 2004; Stufflebeam et al., 2011; Zhang et al., 2015). Examining the sEEG signal is widely used in the detection and prediction of epilepsy (Zhang et al., 2021; Wan et al., 2023a; Yang et al., 2023a,b). This method is non-invasive and has high temporal resolution, but it requires lengthy expert judgement, and judgements may vary between physicians. Furthermore, for iEEG signal acquisition and analysis, electrodes are implanted in the appropriate target areas of the brain, which is costly, complex, and carries a risk of infection.
Patient-independent methods, which involve joint training with data from multiple patients, face challenges in eliminating significant differences between patients (mainly caused by multiple factors such as physical condition, pathogenesis, seizure intensity, and seizure type). Moreover, sEEG and iEEG signals are multi-modal data with significantly different characteristics. sEEG (scalp EEG) is severely attenuated by the skull, so, due to volume conduction effects, the signals are not an accurate representation of the underlying region. iEEG (intracranial EEG) offers high-quality signals that truly reflect the activity of the region. Combining the advantages of sEEG and iEEG data offers a promising approach for epileptic focus localization.
The main contributions of our study can be summarized as follows: (1) In the pre-trained model, the style-feature randomization module and the domain adversarial network were introduced to enhance the generalization ability of the model, achieving optimal test results on the Bern-Barcelona dataset and the Bonn dataset; (2) We have proposed a novel transfer learning method for epileptic focus localization, which uses the Bern-Barcelona dataset to pre-train the model. We then fine-tuned this pre-trained model with the epilepsy dataset from Chongqing Medical University and conducted sufficient experiments to validate the practical applicability of our method.

Related works
So far, a number of epileptic focus localization technologies have been developed, primarily transforming the epileptic focus localization problem into a classification task. For example, Chen et al. (2017) used the discrete wavelet transform (DWT) to extract feature metrics such as the Max, Min, Mean, STD, and Skewness of the wavelet coefficients at all levels, achieving an accuracy of 83.07% on sym6 wavelet coefficients. Daoud and Bayoumi (2020) proposed a method based on semi-supervised learning, achieving an accuracy of 93.21% on the Bern-Barcelona dataset. Zhao et al. (2020) extracted the entropy features of six frequency bands, used the STFT to extract time-frequency features of the EEG, and fed the two combined features into a CNN for feature extraction and classification, achieving an accuracy of 88.77%. Zhao et al. (2021) combined entropy, STFT, and a 1D-CNN, achieving an accuracy of 93.44%. Sui et al. (2021) proposed TF-HybridNet, incorporating a 1D convolutional network and STFT for time-frequency feature extraction, achieving an accuracy of 94.3%.
In addition, the characteristics of the EEG signal offer valuable information for the localization of the epileptic focus. Staljanssens et al. (2017) used brain functional connectivity metrics to calculate weighted adaptive directed transfer functions, achieving an accuracy of 88.6% on the University Hospital of Geneva epilepsy dataset. Amirsalar et al. (2019) used the Pearson correlation coefficient between signals in each lead to calculate the mean number of connections and connection strength, finally achieving a sensitivity of 80% on the Karunya University EEG dataset. Gunnarsdottir et al. (2022) proposed an algorithm to identify two groups of nodes ("source" nodes and "sink" nodes) in a resting-state iEEG network. They validated the SSI (source-sink index) in a retrospective analysis of 65 patients, achieving an accuracy of 79%.
The analysis shows that the existing methods have the following disadvantages: (1) The Bern-Barcelona dataset and the Bonn dataset contain only channel category information; therefore, existing studies transformed the localization problem into a classification task, which does not achieve accurate epileptic focus localization; (2) The Bern-Barcelona dataset contains only five patients, and the existing literature does not consider the negative impact of multi-patient differences; (3) The datasets for epileptic focus localization are limited, and the accuracy of epileptic focus localization is low.
Therefore, a method with low cost and high detection accuracy is needed to solve the above issues.

EEG data
In this study, we utilized three datasets: the Bern-Barcelona dataset, the Bonn dataset, and the Chongqing Medical University epilepsy dataset. The Bern-Barcelona and Bonn datasets were used for pre-training and model performance evaluation. The parameters of the pre-trained model were obtained by training with the Bern-Barcelona dataset, and the Chongqing Medical University epilepsy dataset was used for fine-tuning and testing.

Bern-Barcelona dataset
Recordings from the Department of Neurology, University of Bern, Switzerland were used as the first iEEG dataset in this study. To the best of our knowledge, this is the only open dataset that provides clear annotations of focal and non-focal signals during seizure-free periods (Ralph et al., 2012). It includes data from five patients with drug-resistant temporal lobe epilepsy who were candidates for epilepsy surgery. The dataset contains 7,500 focal samples and 7,500 non-focal samples, each lasting 20 s with a sampling rate of 512 Hz; the signals were filtered by a fourth-order Butterworth bandpass filter with cutoff frequencies of 0.5 and 150 Hz.

Bonn dataset
The second iEEG dataset used in this study, obtained from the Epileptology Department of Bonn University (Andrzejak et al., 2001), consists of five sets of EEG recordings labeled A to E. Each set consists of data from five subjects. Set A represents healthy subjects with open eyes, and Set B was recorded from healthy subjects with closed eyes. Set C was recorded from the non-epileptogenic zone of the epileptic patients' brains, while Set D was recorded from the epileptogenic zone. Lastly, Set E represents epileptic patients during the ictal period. Each set contains a total of 100 EEG segments. Each segment is 23.6 s long with a sampling rate of 173.61 Hz. The iEEG signals were filtered using a fourth-order Butterworth bandpass filter with cutoff frequencies of 0.5 and 85 Hz. In this study, we focus on Sets C and D, as they represent the non-focal and focal iEEG signals, respectively.

Chongqing Medical University epilepsy dataset
The third dataset used in this study, an sEEG dataset, was obtained from Chongqing Medical University and includes data from six patients. To expand the sample size, we selected patients with multiple seizures. The dataset comprises 16 channels. Each sample is 20 s long with a sampling rate of 512 Hz, filtered by a fourth-order Butterworth bandpass filter with cutoff frequencies of 0.5 and 150 Hz. Details of the dataset are given in Table 1.

Methodology
The Bern-Barcelona dataset contains a large amount of data (7,500 focal samples and 7,500 non-focal samples), so we have proposed a transfer learning method to make full use of this data during the pre-training phase. In this approach, we utilize the CQMUE (Chongqing Medical University epilepsy) dataset to fine-tune and test the model. Notably, the Bern-Barcelona dataset includes data from five epilepsy patients with significant inter-patient differences (mainly due to multiple factors such as physical condition, pathogenesis, seizure intensity, and seizure type). If we train the model directly with multi-patient data, it will quickly lead to model underfitting. To address this issue, we implemented a style-feature randomization block, a multi-level temporal-spectral feature extraction network, and a domain adversarial network to enhance the generalization ability of the pre-trained model.

Pre-trained model
The pre-trained model consists of an embedding block, a style-feature randomization block, a multi-level temporal-spectral feature extraction network (Hu et al., 2018; Li et al., 2020; Wan et al., 2023b), a category classifier, and a patient discriminator, as illustrated in Figure 1. The embedding block extends the data across multiple channels to enhance the discriminative properties. The style-feature randomization module disrupts data features within a training batch, enhancing the generalization ability of the model. The multi-level temporal-spectral feature extraction network utilizes temporal-spectral features to enhance feature discrimination. The category classifier completes the classification of the data. The patient discriminator employs DANN (domain-adversarial training of neural networks; Yaroslav et al., 2016) to extract the essential data features.

Embedding block
Before the data is fed into the embedding block, necessary data preprocessing is required (Liu et al., 2015; Versaci et al., 2022). The embedding block, i.e., successive temporal convolution and batch normalization (BN) operations, was initially employed to derive an optimal filter band for subsequent analysis [since convolution operators are essentially equivalent to a low-pass filter (Azimi et al., 2019)]. After stacking the original data and the output embeddings with a channel-wise concatenation function, the embedding block obtained sub-band matrices that provided the subsequent network with adaptive sub-band responses as well as the actual data. Finally, the data was fed into the multi-level temporal-spectral feature extraction module for feature extraction.
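The idea of the embedding block can be sketched as follows; this is a minimal NumPy illustration, not the authors' exact implementation, and the kernel length and number of sub-bands are illustrative assumptions:

```python
import numpy as np

def temporal_conv(x, kernel):
    """Length-preserving 1-D convolution of a single-channel signal."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad, mode="edge")
    return np.convolve(xp, kernel, mode="valid")[: len(x)]

def embedding_block(x, n_bands=3, kernel_len=5):
    """Stack the raw signal with n_bands low-pass filtered versions
    (channel-wise concatenation), mimicking adaptive sub-band responses."""
    bands = [temporal_conv(x, np.ones(kernel_len) / kernel_len)
             for _ in range(n_bands)]
    return np.stack([x] + bands, axis=0)  # shape: (1 + n_bands, len(x))
```

In the real model the kernels are learned and followed by batch normalization; here simple averaging kernels stand in for the learned low-pass filters.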

Style-feature randomization
Within a training batch, the sub-band matrices are computed by the embedding block. Due to significant style-feature differences between the data of each sub-band for different patients, an enhancement of the model's generalization ability is necessary. To achieve this, we computed the mean μ_nc(x) and the standard deviation σ_nc(x) across the spatial dimensions independently for each sub-band (Oren et al., 2021):

μ_nc(x) = (1 / HW) Σ_{h=1..H} Σ_{w=1..W} x_nchw    (1)

σ_nc(x) = sqrt( (1 / HW) Σ_{h=1..H} Σ_{w=1..W} (x_nchw − μ_nc(x))^2 )    (2)

where N represents the batch size, C represents the number of channels, H represents the height of the data matrix, and W represents the width of the data matrix; x_nchw represents an element in the data matrix, and μ_nc(x) and σ_nc(x) represent the mean and standard deviation for each sub-band. Then, we randomly permute μ_nc(x) and σ_nc(x) across the batch to obtain μ̂(x) and σ̂(x), and x′ is finally obtained by the following equation:

x′ = σ̂(x) · (x − μ_nc(x)) / σ_nc(x) + μ̂(x)    (3)

where x is the sub-band matrix obtained by the embedding block.
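The randomization step can be sketched in NumPy as follows; this is an illustrative reading in the spirit of permuted instance statistics (Oren et al., 2021), with the random number generator interface as an assumption:

```python
import numpy as np

def style_randomize(x, rng):
    """x: array of shape (N, C, H, W); per-sub-band statistics are
    shuffled across the batch to disrupt patient-specific style features."""
    mu = x.mean(axis=(2, 3), keepdims=True)            # per-sub-band mean
    sigma = x.std(axis=(2, 3), keepdims=True) + 1e-6   # per-sub-band std
    perm = rng.permutation(x.shape[0])                 # shuffle within the batch
    # Normalize with a sample's own statistics, then re-style with another's.
    return sigma[perm] * (x - mu) / sigma + mu[perm]
```

After randomization, each sample keeps its normalized content but carries the style statistics of another sample in the batch.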

Multi-level temporal-spectral feature extraction network
To prevent deformation of the boundary data caused by zero padding in the convolution operation, the head and tail of the data are padded according to Eq. (4) (Li et al., 2020):

x′ = (x_⌊R/2⌋ | … | x_2 | x_1) | x | (x_N | x_{N−1} | … | x_{N−⌊R/2⌋+1})    (4)

where | is a concatenation operator, x_i represents the i-th element of the input x of length N, and R represents the kernel size parameter of the convolution operation.
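One plausible reading of this padding scheme, reflecting roughly ⌊R/2⌋ boundary samples instead of padding with zeros, can be sketched as:

```python
import numpy as np

def pad_head_tail(x, R):
    """Concatenate reflected head and tail segments around x,
    where R is the convolution kernel size."""
    p = R // 2
    head = x[:p][::-1]   # x_p | ... | x_1
    tail = x[-p:][::-1]  # x_N | ... | x_(N-p+1)
    return np.concatenate([head, x, tail])
```

With this padding, a kernel of size R produces an output with the same length as the original signal while never multiplying against artificial zeros at the boundaries.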
To expedite calculation, the proposed method adopts convolution operations to perform multi-level wavelet decomposition, which is defined as follows (Li et al., 2020):

y_A^i = y_A^{i−1} ⊗ g  (with stride s)
y_D^i = y_A^{i−1} ⊗ h  (with stride s)

where ⊗ is the convolution operation, g and h represent a pair of scaling and wavelet filters, s represents the stride parameter of the convolution operation, y_A^i represents the approximation (low-pass) coefficients at level i, and y_D^i represents the detail (high-pass) coefficients, with y_A^0 being the input.
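A runnable sketch of wavelet decomposition implemented as strided convolutions is given below; the Haar pair is an illustrative choice of (g, h), not necessarily the filters used in the paper:

```python
import numpy as np

def conv_stride(x, f, s=2):
    """1-D convolution with stride s (filter flipped, as in true convolution)."""
    return np.array([np.dot(x[i:i + len(f)], f[::-1])
                     for i in range(0, len(x) - len(f) + 1, s)])

def wavelet_decompose(x, levels=3):
    g = np.array([1.0, 1.0]) / np.sqrt(2)   # scaling (low-pass) filter
    h = np.array([1.0, -1.0]) / np.sqrt(2)  # wavelet (high-pass) filter
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        details.append(conv_stride(approx, h))  # y_D at this level
        approx = conv_stride(approx, g)         # y_A feeds the next level
    return approx, details
```

The stride-2 convolution performs the filtering and dyadic downsampling of a classical DWT in a single operation, which is why it maps naturally onto fast convolution layers.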
In the multi-level temporal feature extraction module, we adopted five independent convolution, batch normalization, and exponential linear unit (ELU) operations to capture multi-level temporal feature information within different perceptual domains. The convolution kernel size is set to [S, 1], where S takes the values {k, k/2, k/4, k/8, k/16} with k = 2^6, and ultimately the temporal features (f_t1, f_t2, f_t3, f_t4, f_t5, f_t6) are derived.
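The multi-branch kernel scheme can be sketched as follows; mean filters stand in for the learned convolutions, which is an illustrative simplification:

```python
import numpy as np

K = 2 ** 6                                          # k = 64
KERNEL_SIZES = [K // d for d in (1, 2, 4, 8, 16)]   # S in {k, k/2, k/4, k/8, k/16}

def multi_kernel_features(x):
    """Apply one length-preserving filter per kernel size, yielding
    parallel temporal features over different perceptual fields."""
    feats = []
    for s in KERNEL_SIZES:
        pad = s // 2
        xp = np.pad(x, pad, mode="edge")
        feats.append(np.convolve(xp, np.ones(s) / s, mode="valid")[: len(x)])
    return feats
```

Larger kernels summarize slower temporal trends while smaller kernels preserve fast transients, which is the intent of extracting features "within different perceptual domains."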
To further extract discriminative feature information, the features extracted by the multi-level temporal feature extraction module and the multi-level spectral feature extraction module were concatenated along the feature dimension to form the combined features f_all. The combined features f_all were then fed into a multi-level squeeze-and-excitation network (Hu et al., 2018) to enhance the discriminability of the features.
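A compact sketch of one squeeze-and-excitation step (Hu et al., 2018) on a (C, L) feature map is shown below; the weight shapes follow the usual channel-reduction pattern, with the values supplied by the caller rather than learned here:

```python
import numpy as np

def squeeze_excite(f, w1, w2):
    """f: (C, L) feature map; w1: (C//r, C), w2: (C, C//r) excitation weights."""
    z = f.mean(axis=1)                                   # squeeze: global average per channel
    gates = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # FC-ReLU-FC-sigmoid
    return f * gates[:, None]                            # channel-wise re-weighting
```

Because the gates lie in (0, 1), the block can only attenuate channels, letting the network emphasize the most discriminative feature channels of f_all.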

Category classifier
For the category classifier, the method utilized data from each channel to achieve binary classification of epileptic and non-epileptic focus channels. A 3-layer fully connected network was employed for the category classifier. We applied the CrossEntropy loss to achieve accurate classification, and the MSE (Mean Squared Error) loss to minimize the output differences between the source data and the style-feature randomization data. The loss functions of the classification network are as follows:

L_c = (1 / |D_s|) Σ_{(x_i, y_i) ∈ D_s} L(G_c(G_f(EB(x_i))), y_i)

L_mse = (1 / |D_s|) Σ_{x_i ∈ D_s} MSE(G_c(G_f(EB(x_i))), G_c(G_f(SR(EB(x_i)))))

where L is the CrossEntropy loss function, EB represents the embedding block, SR represents style-feature randomization, G_f represents the multi-level temporal-spectral feature extraction network, G_c represents the category classifier, y_i represents the category label, x_i represents the input samples, and D_s represents the dataset.

FIGURE 1

The architecture of the proposed pre-trained model.

Patient discriminator
Since the dataset contains data from multiple patients, we have proposed a method based on DANN (Yaroslav et al., 2016) to enhance the generalization ability of the model. Features from each patient were extracted according to the marginal distribution by the global adversarial network. The global adversarial loss function is as follows:

L_pd = (1 / |D_s|) Σ_{x_i ∈ D_s} L(G_pd(G_f(EB(x_i))), d_i)

where L is the CrossEntropy loss function, G_f represents the multi-level temporal-spectral feature extraction network, G_pd represents the patient discriminator, d_i represents the patient label, and D_s represents the patient sample set.
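The discriminator objective can be sketched as follows; the single linear head W is a stand-in for the real discriminator network G_pd:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

def discriminator_loss(features, W, patient_labels):
    """Softmax cross-entropy of a linear patient classifier over features."""
    probs = softmax(features @ W)                 # (batch, n_patients)
    picked = probs[np.arange(len(patient_labels)), patient_labels]
    return -np.mean(np.log(picked + 1e-12))
```

A low loss means the features still betray which patient they came from; adversarial training pushes the feature extractor toward features on which this loss stays high, i.e., patient-invariant features.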

Training details
We proposed an adversarial training strategy to train the loss functions jointly:

L_total(θ_f, θ_c, θ_pd) = L_c + λ1 · L_mse − λ2 · L_pd

where λ1 = 0.01 and λ2 = 0.01 are trade-off weights. The parameters θ_f, θ_c, and θ_pd are trained with a special layer called the Gradient Reversal Layer (GRL). The GRL acts as the identity during forward propagation, and the gradient is reversed in backpropagation. Finally, we searched for the optimal parameters θ̂_f, θ̂_c, θ̂_pd to meet the following requirements:

(θ̂_f, θ̂_c) = argmin_{θ_f, θ_c} L_total(θ_f, θ_c, θ̂_pd)

θ̂_pd = argmax_{θ_pd} L_total(θ̂_f, θ̂_c, θ_pd)

where θ_f represents the parameters of the multi-level temporal-spectral feature extraction network, θ_c represents the parameters of the category classifier, and θ_pd represents the parameters of the patient discriminator.
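The GRL itself reduces to two operations, sketched here in plain Python (in a framework such as PyTorch this would be a custom autograd function):

```python
def grl_forward(x):
    """Identity during forward propagation."""
    return x

def grl_backward(grad, lam=1.0):
    """Gradient reversed (and scaled by lambda) during backpropagation."""
    return -lam * grad
```

Because the reversal happens only in the backward pass, a single backward sweep simultaneously trains the discriminator to minimize L_pd and the feature extractor to maximize it, which realizes the min-max objective above without alternating optimizers.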

Model fine-tuning
The parameters of the feature extraction module in the pre-trained model were frozen, and the category classifier was then fine-tuned using the CQMUE dataset. In the CQMUE dataset, only one channel is a seizure channel, causing an imbalance between the positive and negative samples. To address this issue, we have introduced the weighted CrossEntropy loss function (Rezaei-Dastjerdehei et al., 2020):

L_wce = − Σ_{c=1}^{M} w_c · y_c · log(p_c),  with w_c = N / (M · N_c)

where w_c is the weight of category c, set inversely proportional to the category frequency, N represents the total number of samples in the dataset, N_c represents the number of samples of category c, M represents the number of categories, p_c represents the model output probability, and y_c represents the label for each category.
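The weighted loss can be sketched as follows; the inverse-frequency weighting w_c = N / (M · N_c) is the reconstruction used above, so rarer (seizure) classes are up-weighted:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_counts):
    """probs: (batch, M) predicted probabilities; labels: (batch,) class ids;
    class_counts: (M,) number of training samples per class."""
    N, M = class_counts.sum(), len(class_counts)
    w = N / (M * class_counts)          # larger weight for rarer classes
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(w[labels] * np.log(picked + 1e-12))
```

With one seizure channel among 16, the positive class would otherwise be drowned out; the weights rescale each sample's contribution so both classes influence the fine-tuned classifier comparably.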

Result visualization
For the test data, the output probability of each channel was first calculated; the channel with the highest output probability was then selected as the epileptic focus channel; finally, the output probability of each channel was visualized as a whole-brain topographic map. The test procedure is shown in Figure 3.
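The channel-selection step amounts to an argmax over per-channel focal probabilities, as in this small sketch:

```python
import numpy as np

def localize_focus(channel_probs):
    """channel_probs: one focal probability per EEG channel (e.g., 16 values).
    Returns the index of the channel reported as the epileptic focus."""
    return int(np.argmax(channel_probs))
```

The full probability vector, not just the winning index, is what gets rendered on the whole-brain topographic map.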

Evaluation
To evaluate the pre-trained models, 10-fold cross-validation was performed on the Bern-Barcelona dataset and the Bonn dataset. All data were randomly shuffled and divided into 10 parts, one of which was used for testing and the others for training.
Moreover, testing our model with the leave-one-out cross-validation method validated the robustness of our approach on the CQMUE dataset, i.e., data from one patient are used for testing while data from the remaining patients are used for fine-tuning the model. The results were averaged across folds to obtain the final test results.
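The leave-one-out protocol over a patient list can be sketched as:

```python
def leave_one_out_splits(patients):
    """Yield (test_patient, finetune_patients) pairs: each patient is held
    out for testing once while the rest are used for fine-tuning."""
    for i, p in enumerate(patients):
        yield p, patients[:i] + patients[i + 1:]
```

With six patients this yields six folds; the reported 90.15% is the average accuracy over such folds.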

Experimental parameters
The experimental environment included the Windows 10 operating system, Python 3.7.4 as the programming language, and PyTorch (version 11.1) as the deep learning framework. The graphics card used was a GeForce RTX 3060.
The number of training epochs was set to 100 and the batch size was set to 100. The loss function consists of the CrossEntropy loss, the weighted CrossEntropy loss, and the MSE loss. The model adopted the Adam optimizer, with the learning rate set to 0.0005. All hyperparameters were optimized using grid search.

Evaluation metrics
The experiment employed accuracy (ACC), sensitivity (SN), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV) to quantify the performance of the proposed method (Chen et al., 2017).

ACC = (TP + TN) / (TP + TN + FP + FN)    (15)

SN = TP / (TP + FN)

SP = TN / (TN + FP)

PPV = TP / (TP + FP)

NPV = TN / (TN + FN)

where TP, TN, FP, and FN are the true positive, true negative, false positive, and false negative counts, respectively.
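The five metrics follow directly from the confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Compute ACC, SN, SP, PPV, and NPV from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)    # sensitivity (recall)
    sp = tn / (tn + fp)    # specificity
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    return acc, sn, sp, ppv, npv
```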

Experiments and discussions

Overall comparison
In this section, we computed a number of performance metrics, including accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), on the Bern-Barcelona and Bonn datasets, respectively, to evaluate our pre-trained model. The classification accuracy of the proposed method was 94.50% when applied to the Bern-Barcelona dataset, and 97.50% when applied to the Bonn dataset. The high accuracy was due to the use of convolutional layers, style-feature randomization, the squeeze-and-excitation network, and domain adversarial training. The robustness of our approach was validated via the 10-fold cross-validation method.
For an additional evaluation of our method, we performed comparison experiments on the same datasets. Table 2 shows the results on the Bern-Barcelona dataset, and Table 3 shows the results on the Bonn dataset.

FIGURE 2
Transfer learning model.

FIGURE 3
Test procedure on the CQMUE dataset.
To better demonstrate the comparison between our proposed method and other methods on both datasets, we adopted radar charts, which compare methods across several indicators (accuracy, sensitivity, specificity, PPV, and NPV). Our method clearly covers a larger pentagon on both datasets, which shows that our approach outperforms previous work in localizing the epileptic focus. The radar charts are illustrated in Figure 4.
We employed the CQMUE dataset to validate the performance of the pre-trained model. The CQMUE dataset contains EEG data from only six patients, but each patient had multiple epileptic seizures. In the fine-tuning and validation experiment, we used the leave-one-out cross-validation method, i.e., data from five patients are used for model fine-tuning and data from the remaining patient are used for validation. To avoid model overfitting, we increased the number of fine-tuning samples by segmenting the recordings into 20 s EEG samples with 90% overlap between adjacent fragments. For the patient used for validation, we obtained only one sample (20 s) from each epileptic seizure. Experimentally, the average accuracy of epileptic focus localization was 90.15%. The results are shown in Table 4.
Comparing our proposed method with the related literature (Amirsalar et al., 2019) on the CQMUE dataset, Table 5 shows that our method achieved a higher accuracy of 90.15%.

Ablation experiments with pre-trained models
To validate the contribution of the style-feature randomization module and the DANN in the pre-trained model, we performed ablation experiments on the Bern-Barcelona and Bonn datasets. We tested the performance after removing the style-feature randomization module, after removing the DANN, and after removing both modules, and compared the results with the proposed method. Tables 6 and 7 show that the performance of the model decreased after removing the style-feature randomization and DANN modules on the Bern-Barcelona and Bonn datasets, respectively.

Conclusion
In this paper, we have proposed a deep learning model for the localization of the epileptic focus. The method includes a pre-training phase and a fine-tuning phase. In the pre-training phase, the model adopted a multi-level temporal-spectral feature extraction network and an attention mechanism to enhance the feature extraction ability, achieving an average focus localization accuracy of 94.5% on the Bern-Barcelona dataset and 97.5% on the Bonn dataset. When compared with related methods, the experimental results have demonstrated that the pre-trained model outperforms competitive state-of-the-art baselines in accuracy, sensitivity, and negative predictive value. To validate the model's practical performance, we fine-tuned our pre-trained model using the epilepsy dataset from Chongqing Medical University and conducted tests, obtaining an impressive average accuracy of 90.15%. Therefore, the superior performance of the model has demonstrated that the proposed method is highly effective for localizing the epileptic focus. Next, we will develop a medical device that incorporates the proposed method to assist physicians in the clinical localization diagnosis of the epileptic focus.
In the fine-tuning phase, the CrossEntropy loss function is replaced with the weighted CrossEntropy loss function. The parameters of the feature extraction module were frozen, while the parameters of the category classifier remained trainable. The transfer learning model is shown in Figure 2.

FIGURE 4
Comparison results on (A) the Bern-Barcelona dataset and (B) the Bonn dataset.

TABLE 1
The information of the CQMUE dataset.

TABLE 2
Results on the Bern-Barcelona dataset.

TABLE 3
Results on the Bonn dataset. Bold text highlights the results of the method proposed in this paper.

TABLE 4
Test results on the CQMUE dataset.

TABLE 5
Performance comparison between proposed method and related literature.

TABLE 6
Ablation experiments on the Bern-Barcelona dataset.

TABLE 7
Ablation experiments on the Bonn dataset.