Combining data augmentation and deep learning for improved epilepsy detection

Introduction In recent years, the use of EEG signals for seizure detection has gained widespread academic attention. To address the overfitting of deep learning models caused by the small amount of EEG data available for epilepsy detection, this paper proposes an epilepsy detection method that combines data augmentation and deep learning. Methods First, the Adversarial and Mixup Data Augmentation (AMDA) method is used to augment the data, which effectively increases the number of training samples. To further improve the classification accuracy and robustness of epilepsy detection, this paper proposes an attention-based network model that combines a one-dimensional convolutional neural network and a gated recurrent unit (AM-1D CNN-GRU). Results and discussion The experimental results show that epilepsy detection performance improves significantly with augmented data: the accuracy, sensitivity, and area under the receiver operating characteristic curve (AUC) reach 96.06%, 95.48%, and 0.9637, respectively. Compared with the non-augmented data, all indicators increase by more than 6.2%. The detection performance is also significantly better than that of other epilepsy detection methods. The results of this research can provide a reference for the clinical application of epilepsy detection.


Introduction
Epilepsy is the second most common neurological disorder in the world, a chronic disease in which sudden abnormal discharges of brain neurons lead to transient brain dysfunction (1). There are approximately 50 million patients worldwide, and patients may experience injuries or even life-threatening emergencies during seizures, causing great psychological stress and difficulties in work and daily life (2). In clinical practice, electroencephalography (EEG) is one of the most common and important examination tools used in the diagnosis and treatment of epilepsy (3). As a non-invasive tool for recording brain electrical activity, it captures a large amount of information that helps doctors identify the lesion and carry out effective treatment. However, because epileptic seizures are unpredictable and cannot be observed in a short period, EEG signals need to be recorded and reviewed over a long period, which is an inefficient and time-consuming process. Therefore, the automatic detection of epileptic EEG helps reduce the workload of medical workers, and further research on it is particularly important (4). The use of EEG for epilepsy detection has received extensive academic attention over the past decades, and the research methods mainly include machine learning and deep learning. Among them, machine learning requires manually designed algorithms to obtain EEG signal features, which are then combined with suitable classifiers to recognize epilepsy signals automatically (5). The methods for extracting signal features usually fall into three types: time-domain, frequency-domain, and time-frequency domain. For example, wavelet transform, morphological analysis, and other methods are used to extract sample features, and methods such as Random Forest (RF) (6), Naive Bayes (NB) (7), Linear Discriminant Analysis (LDA) (8), and Support Vector Machine (SVM) (9) are then used to classify the samples. Sukriti et al. (13) proposed two multiscale entropy analysis methods, multiscale dispersion entropy (MDE) and refined composite multiscale dispersion entropy (RCMDE), which were used in combination with SVM for seizure detection with good results.
Deep learning involves feeding a large amount of data into a deep neural network for training and then using the trained model to classify and predict new data (14). Although existing deep learning-based methods for automatic epilepsy detection have made great progress, there are still two problems to be solved. One is that the correlation between EEG feature vectors is not taken into account, so the features are not reconstructed and enhanced. The other is that imbalanced classes and insufficient sample size lead to poor recognition ability and unstable model performance. To address these two problems, this paper first adopts the Adversarial and Mixup Data Augmentation (AMDA) method to train the model on the augmented training data D_AMDA, where D_AMDA is obtained by adversarial training and mixup data augmentation. The augmented EEG data is then fed into the AM-1D CNN-GRU model for classification, which uses a 1D CNN for high-dimensional feature extraction. The attention mechanism is introduced to enhance the correlation between the extracted features. Finally, a GRU is used to fuse the information of the preceding and following sequences, so that the information of adjacent EEG signals is fully integrated and the detection accuracy of the model improves.

Adversarial and mixup data augmentation
To improve the generalization of the deep learning model and its robustness to adversarial samples and to noise that corrupts EEG signals, this paper uses the AMDA method to train the model adversarially. Christian Szegedy et al. (20) proposed the concept of adversarial samples, that is, input samples formed by deliberately adding subtle perturbations to the data, which lead the model to give an incorrect output with high confidence. The AMDA method used in this paper is accomplished through the following steps, as shown in Figure 1.
First, the network classifier is trained using raw EEG data, and then perturbations are added to the raw data. The generated adversarial samples are input into the classifier for classification to obtain the classification error. Finally, the adversarial and original samples are mixed to form the training set for the classifier. In the AMDA method used in this paper, perturbations are generated by computing the loss function J(x, y, θ) of the deep neural network model and choosing the perturbation that maximizes it, where x and y are the original data and the corresponding true labels, and θ denotes the model parameters. The goal is to generate an adversarial sample x′ that is not easily distinguishable from x by maximizing J(x′, y, θ), so as to deceive the model, i.e., to cause the adversarial sample x′ to be misclassified. Therefore, adversarial sample generation can be transformed into a constrained optimization problem, as shown in Equation (1):
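The steps above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy logistic classifier in place of the deep network, a single signed-gradient step (in the style of the fast gradient sign method) as one common way to increase the loss under a bound `eps` on the perturbation size, and a Beta-distributed mixup coefficient. The names `eps`, `alpha`, `adversarial_samples`, and `mixup` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(x, y, w, b):
    """Mean binary cross-entropy loss J(x, y, theta) of a logistic model."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def loss_grad_wrt_input(x, y, w, b):
    """Per-sample gradient of the cross-entropy loss with respect to x."""
    p = sigmoid(x @ w + b)                 # predicted probability, shape (n,)
    return (p - y)[:, None] * w[None, :]   # shape (n, d)

def adversarial_samples(x, y, w, b, eps=0.05):
    """Assumed rule: one signed-gradient step, so max |x_adv - x| <= eps."""
    return x + eps * np.sign(loss_grad_wrt_input(x, y, w, b))

def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Convex combination of two sample sets with lambda ~ Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha, size=(x_a.shape[0], 1))
    x_mix = lam * x_a + (1 - lam) * x_b
    y_mix = lam[:, 0] * y_a + (1 - lam[:, 0]) * y_b
    return x_mix, y_mix

# Toy data standing in for flattened EEG segments (purely synthetic).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
y = rng.integers(0, 2, size=8).astype(float)
w, b = rng.normal(size=16), 0.0            # stand-in for trained parameters

x_adv = adversarial_samples(x, y, w, b)    # perturb the raw samples
x_mix, y_mix = mixup(x, y, x_adv, y, rng=rng)

# Augmented training set: original samples plus one extra copy (1 x dataset).
x_train = np.concatenate([x, x_mix])
y_train = np.concatenate([y, y_mix])
```

Because each original sample is mixed with its own adversarial counterpart, the mixed label equals the original label here; mixing across different samples would produce soft labels instead.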

AM-1D CNN-GRU network architecture
The AM-1D CNN-GRU network proposed in this paper is divided into five parts: input layer, convolutional layer, attention layer, recurrent layer, and output layer. The specific structure is shown in Figure 2. The model does not require complex manual feature extraction. Compared with other models, it has only 5 network layers, a small number of parameters, and a low computational cost, so it can readily be ported to mobile devices for practical applications at a later stage.
First, the original EEG signal slices are input into the 1D CNN for feature extraction, and the attention mechanism assigns weights to the feature vectors to highlight the important features. The result is then input into the GRU network, which fuses the information of the preceding and following sequences so that the information of neighboring EEG signals is fully integrated. Finally, the output is passed to the fully connected layer for classification.

1D CNN models
CNN is one of the representative algorithms of deep learning. Its artificial neurons respond to surrounding units and can process large-scale data, mapping the original inputs to new features through data transformation and dimensionality reduction (21). Unlike standard fully connected networks, CNN has a special network structure. It includes a feature extractor consisting of a convolutional layer and a pooling layer, which keeps the network model simple by using local connectivity and weight sharing to extract features from the original data. This speeds up training and improves generalization performance (22). Because the EEG signal is a one-dimensional time series, a 1D CNN is chosen in this paper. Its internal structure is shown in Figure 3.
In terms of network structure, a 1D CNN is the same as a CNN: it contains 1D convolutional layers, 1D pooling layers, fully connected layers, and so on. Its main part usually consists of multiple 1D convolutional layers alternating with 1D pooling layers, which extract features from the input 1D data through repeated convolution and pooling operations; the fully connected layer then classifies the input data.
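To make the convolution and pooling operations concrete, the following sketch applies one single-channel 1D convolution with zero "same" padding, a ReLU, and max pooling to a short toy signal. The kernel values, the ReLU activation, and the pooling size of 2 are assumptions for illustration only; the function names are hypothetical.

```python
import numpy as np

def conv1d_same(x, kernel, bias=0.0):
    """Single-channel 1D convolution (cross-correlation) with zero 'same' padding."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = np.pad(x, (pad_left, pad_right))          # zero padding
    windows = np.lib.stride_tricks.sliding_window_view(padded, k)
    return windows @ kernel + bias                     # same length as x

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0])
kernel = np.array([-1.0, 0.0, 1.0])     # size-3 kernel responding to rising slopes

conv_out = conv1d_same(signal, kernel)  # [1, 2, 2, 0, -2, -2, -2, 0]
features = max_pool1d(relu(conv_out))   # [2, 2, 0, 0]
```

The output keeps only the strongest response in each window, which is how pooling reduces the sequence length while preserving salient local features.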

Attention mechanism
The attention mechanism mimics the allocation of attention in the human brain. It focuses attention on important areas at a given moment and ignores or reduces attention to other areas, so that more detailed information is obtained and useless information is filtered out. Its core idea is to adjust the attention paid to information flexibly and reasonably, amplifying the needed information and suppressing irrelevant information (23). The attention mechanism gives higher weight to key information through probability allocation, highlighting the impact of important information and thereby improving the accuracy of the model. The structure of the attention mechanism is shown in Figure 4.
As the EEG signal passes through the attention layer, which processes the local feature vectors of the input, each feature vector is reconstructed by taking into account its relationship with the other feature vectors. The specific feature reconstruction process is shown in Figure 5.
Each local feature vector of the input yields three corresponding vectors: a Query vector, a Key vector, and a Value vector. These three vectors are obtained by multiplying each feature vector extracted by the feature extraction module by the three weight matrices W_Q, W_K, and W_V, respectively.
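The Query/Key/Value construction can be sketched as follows. This is an illustration under assumptions, not the paper's exact layer: it uses the widely used scaled dot-product form with a softmax over the scores, and the dimensions and random weights are placeholders.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(features, w_q, w_k, w_v):
    """Reconstruct each feature vector as a weighted sum of all Value vectors."""
    q = features @ w_q                        # Query vectors, shape (T, d_k)
    k = features @ w_k                        # Key vectors,   shape (T, d_k)
    v = features @ w_v                        # Value vectors, shape (T, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise similarity (scaling is an assumption)
    weights = softmax(scores, axis=-1)        # each row is a probability distribution
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d_model, d_k = 6, 8, 4                     # 6 local feature vectors of size 8
features = rng.normal(size=(T, d_model))      # stand-in for CNN output
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

reconstructed, weights = self_attention(features, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, which corresponds to the probability allocation described above: feature vectors with higher scores contribute more to the reconstructed output.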

GRU model
GRU is a variant of the LSTM network. It has a simpler structure, can alleviate the gradient explosion and vanishing problems that RNNs face in practical applications, and is widely used in time series prediction. GRU has a data flow similar to that of LSTM, but it lacks a separate memory cell, which makes it more efficient to train (24). There are only two gates in the GRU model, the update gate and the reset gate, and its specific structure is shown in Figure 6.
Here × denotes element-wise multiplication, σ is the sigmoid function, tanh is the activation function, and "1 −" denotes that the data propagated forward along this link is 1 − z_t. The update and reset gates are z_t and r_t, respectively, x_t is the input, and h_t is the output of the hidden layer.
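A single GRU step built from these symbols might look like the sketch below. The gate equations follow the common textbook formulation, which the source figure may or may not match exactly (in particular, some formulations swap the roles of z_t and 1 − z_t); the parameter layout and `init_params` helper are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU time step: returns the new hidden state h_t."""
    W_z, U_z, b_z = params["z"]
    W_r, U_r, b_r = params["r"]
    W_h, U_h, b_h = params["h"]
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)             # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)             # reset gate
    h_cand = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)  # candidate state
    # Assumed convention: (1 - z_t) keeps the old state, z_t admits the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_cand

def init_params(input_dim, hidden_dim, rng):
    def gate():
        return (rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
                rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
                np.zeros(hidden_dim))
    return {"z": gate(), "r": gate(), "h": gate()}

rng = np.random.default_rng(0)
params = init_params(input_dim=3, hidden_dim=4, rng=rng)

h = np.zeros(4)                                # initial hidden state
for x_t in rng.normal(size=(5, 3)):            # a toy sequence of 5 time steps
    h = gru_step(x_t, h, params)
```

Because the new state is a gated blend of the previous state and a tanh-bounded candidate, the hidden values stay within (−1, 1) when starting from zeros.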
The CHB-MIT dataset consists of scalp EEG recordings from pediatric subjects with intractable (refractory) seizures. All EEG data were collected using the international standard 10-20 EEG electrode position system, with a sampling frequency of 256 Hz. This dataset contains EEG records of 24 cases; cases 1 and 21 are from the same patient, recorded one and a half years apart, and case 24 is supplementary data without detailed patient information. The entire database totals 967.85 h of continuous scalp EEG recordings and 178 seizure events. Each case contains between 9 and 42 multichannel EEG recordings with 18 or 23 channels (25). Because the sampling channels are not consistent across subjects, 13 patients with consistent channels were selected as the experimental data in this paper. Details of the Boston Children's Hospital scalp epilepsy EEG dataset are shown in Table 1.

Experimental process
The experimental process in this paper mainly includes three parts: data preprocessing, data augmentation, and feature extraction and classification. The whole experimental flow is shown in Figure 7.
In the data preprocessing stage, most recordings last 1 h, and such long inputs make it difficult for the network to learn the characteristic features of the data. Therefore, the scalp EEG signal is segmented into short segments of equal length; a segment length of 30 s was chosen for this experiment to ensure the stationarity of the samples. Segmentation also greatly reduces the computational load of the model, which helps the system run in real time. In the data augmentation stage, the preprocessed data were either left unprocessed or processed with the AMDA algorithm. To determine how much generated data should be added to the original dataset to optimize classifier performance, adversarial samples amounting to n times the original training set (n × dataset) were added for each patient using AMDA, where n is 1, 2, and 3. The data produced by the four methods (no augmentation and n = 1, 2, 3) were each input into the AM-1D CNN-GRU model for feature extraction and classification, and the classification performance of the four methods was compared. After the data augmentation process, 70% of the dataset is used as a training set, and the other 30% is used as a test set to test the classification ability of the model.
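The segmentation and split described above can be sketched as follows, assuming a recording stored as a (channels, samples) array sampled at 256 Hz. The synthetic array, the random shuffling, and the function names are assumptions; the source does not state how the 70/30 split is drawn.

```python
import numpy as np

def segment_recording(recording, fs=256, seconds=30):
    """Cut a (channels, samples) recording into non-overlapping windows.

    Returns an array of shape (n_segments, channels, fs * seconds);
    trailing samples that do not fill a whole window are discarded.
    """
    win = fs * seconds
    n_seg = recording.shape[1] // win
    trimmed = recording[:, : n_seg * win]
    return trimmed.reshape(recording.shape[0], n_seg, win).transpose(1, 0, 2)

def train_test_split(segments, labels, train_frac=0.7, seed=0):
    """Shuffle segment indices and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(segments))
    n_train = int(round(train_frac * len(segments)))
    tr, te = idx[:n_train], idx[n_train:]
    return segments[tr], labels[tr], segments[te], labels[te]

fs, n_channels = 256, 18
n_samples = fs * 30 * 10 + 100                 # 10 full windows plus a remainder
recording = np.arange(n_channels * n_samples, dtype=np.float32).reshape(n_channels, -1)
labels = np.zeros(10, dtype=int)               # placeholder labels, one per segment

segments = segment_recording(recording, fs=fs, seconds=30)
x_tr, y_tr, x_te, y_te = train_test_split(segments, labels)
```

At 256 Hz, a 30 s window holds 7,680 samples per channel, so a 1 h recording yields 120 such segments.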

Evaluation indicators
In this paper, we use accuracy, sensitivity, specificity, and AUC as performance indicators, and each evaluation index is defined as follows.
(1) Accuracy is defined as the proportion of correctly categorized samples to all samples and is calculated as shown in Equation (2):

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

(2) Sensitivity is defined as the proportion of correctly categorized positive samples to all positive samples, i.e., the proportion of ill subjects correctly judged to be ill, and is calculated as shown in Equation (3):

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$

(3) AUC is defined as the area under the receiver operating characteristic (ROC) curve. It indicates the probability that, when a positive and a negative sample are randomly selected, the model scores the positive sample higher than the negative sample. For a classifier that performs no worse than chance, AUC ranges from 0.5 to 1, and it is a performance indicator of the effectiveness of the classifier. It is computed from the ROC curve as given by Equations (4)-(6).
Here TP is a positive case that was correctly diagnosed as positive, TN is a negative case that was correctly diagnosed as negative, FP is a negative case that was incorrectly diagnosed as positive, and FN is a positive case that was incorrectly diagnosed as negative. FPR (False Positive Rate) is the horizontal coordinate of the ROC curve, TPR (True Positive Rate) is the vertical coordinate, and successive points on the ROC curve supply the coordinates used in the AUC computation.
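As a worked example of these indicators, the sketch below computes accuracy and sensitivity from confusion counts and AUC by trapezoidal integration of the ROC curve. The six labels and scores are made up for illustration, the 0.5 decision threshold is an assumption, and the simple AUC routine assumes distinct scores and at least one sample of each class.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def roc_auc(y_true, scores):
    """Area under the ROC curve via the trapezoidal rule (no tied scores assumed)."""
    order = np.argsort(-scores, kind="stable")      # sort by descending score
    y = y_true[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / (1 - y).sum()])
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

y_true = np.array([1, 1, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
y_pred = (scores >= 0.5).astype(int)                # assumed decision threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)   # 3, 2, 1, 0
acc = accuracy(tp, tn, fp, fn)                      # 5/6
sen = sensitivity(tp, fn)                           # 1.0
auc = roc_auc(y_true, scores)                       # 8/9
```

The AUC of 8/9 matches the pairwise interpretation: of the 9 positive-negative pairs, 8 have the positive sample scored higher.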

Experimental parameters
The hyperparameter settings for model training have a great impact on the prediction performance of the model. To allow the CNN to fully extract the features of the dataset, the convolutional kernel size of the 1D convolutional layer is set to 3 × 1, max pooling is chosen, and the number of GRU neurons is set to 128. Since the GRU cannot directly output the classification result, a fully connected layer is added to produce it, and a Dropout layer with a rate of 0.5 is added to the network to prevent overfitting. Finally, the learning rate is set to 0.001, the optimizer is Adam, the padding mode is "same", the number of training epochs is 50, and the batch size is set to 256.
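For reference, these settings can be gathered in a small configuration object, as in the sketch below. The class and field names are hypothetical, and the training-set size used in the example is a made-up number, not a figure from the paper.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    kernel_size: int = 3          # 3 x 1 convolution kernel
    pooling: str = "max"
    padding: str = "same"
    gru_units: int = 128
    dropout: float = 0.5
    learning_rate: float = 1e-3
    optimizer: str = "adam"
    epochs: int = 50
    batch_size: int = 256

def total_updates(cfg: TrainConfig, n_train: int) -> int:
    """Number of optimizer steps over the whole run (last batch may be partial)."""
    return cfg.epochs * math.ceil(n_train / cfg.batch_size)

cfg = TrainConfig()
steps = total_updates(cfg, n_train=1000)   # hypothetical training-set size
```

With a batch size of 256, 1,000 training segments give 4 batches per epoch and therefore 200 parameter updates over 50 epochs.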

Experimental results
Four different methods were used to process the training set, and the classifier was tested on each case. The results are shown in Table 2.
As can be seen from Table 2, the AMDA algorithm significantly improves the classification results compared with unprocessed data. The reason is that AMDA applies adversarial samples to data augmentation, which increases the amount of training data. The AMDA algorithm was used to augment the original dataset with 1, 2, and 3 times its size, and the results show that the classification performance with 1 × dataset is slightly better than with 2 × dataset and 3 × dataset. When more adversarial samples were added, the classification performance of the model decreased, suggesting that as the number of adversarial samples increased, so did the number of features irrelevant to epilepsy detection, which limited the classification performance.

Discussion
The model proposed in this paper consists of three parts: 1D CNN, attention mechanism, and GRU. Although each of the three parts has great advantages in its own field, using only one or two of them in seizure detection cannot fully exploit these advantages; only a clever combination of all three can do so (Figure 8).
The studies selected for comparison (26-36) all use the CHB-MIT dataset for testing and validation. Therefore, the method of this paper is compared with these studies to verify its superiority, and the experimental comparison results are shown in Table 3. In the comparison, the results of the proposed method are those obtained when the augmented sample is 1 × dataset.
Previous studies have shown that deep learning performs better than traditional methods for automatic seizure detection. In this section, RF and SVM, which are among the better performing traditional machine learning classifiers, were chosen for comparison. The experimental results in Table 3 also verify this conclusion. This section also compares the proposed method with several deep learning based classifiers. Literature (32) proposes an automatic epilepsy detection and classification model based on convolution and attention mechanisms, and its experimental results are the closest to those of the method proposed in this paper. Literature (34) constructed an 18-layer long-term recurrent convolutional network model and achieved a sensitivity of 84% and an accuracy of 99%; its accuracy is higher than that of this paper, but its sensitivity is lower than the 95.48% obtained here, and sensitivity is usually the more critical evaluation metric for epileptic seizure detection. The superiority of the proposed method can be attributed to three aspects: (1) AMDA is used for data augmentation, which increases the availability and effectiveness of the dataset. (2) The attention mechanism gives higher weight to key information to highlight the impact of important information. (3) The GRU fuses the information of the preceding and following sequences, which fully integrates the information of neighboring EEG signals and improves the detection performance of the model.
To test the robustness of the model against noisy or artifact-laden EEG data, epilepsy detection was also performed on non-denoised signals. In this setting, the model proposed in this paper achieves an accuracy of 87.65%, a sensitivity of 88.72%, and an AUC of 0.8859. The performance obtained with denoising is therefore higher than that obtained without it. Although non-denoised EEG signals retain more information, the noise they contain affects the judgment of the classifier, which leads to misclassification and degrades the performance of the epilepsy detection model.
The model proposed in this paper can be deployed on clinical equipment for epilepsy diagnosis and treatment to help doctors better understand the patient's condition and formulate a targeted treatment plan. This avoids unnecessary and potentially harmful examinations that would otherwise be ordered to determine the cause of the disease. In addition, some early symptoms of epilepsy are relatively hidden, so many patients ignore them. The model proposed in this paper can assist doctors in diagnosing epilepsy accurately, so that patients can take preventive measures and reduce the suffering caused by seizures.

Conclusion
In this paper, epilepsy detection is realized using EEG signals, with AMDA used to augment the epileptic EEG data.

FIGURE 8
Comparison of experimental results.
Acharya et al. (15) first used a Convolutional Neural Network (CNN) for seizure detection. The automatic feature extraction of the Convolutional Neural Network significantly improved the performance of seizure detection compared with traditional manual feature extraction (16). Aliyu et al. (17) proposed a Long Short-Term Memory (LSTM) network for classifying epileptic EEG signals. Li et al. (18) proposed a patient-specific seizure prediction method based on a Deep Residual Shrinkage Network (DRSN) and a Gated Recurrent Unit (GRU). By introducing GRU into DRSN, they modeled the time dependence of signals in different time windows before the seizure. Hussain et al. (19) proposed an autonomous generalized retrospective and patient-specific hybrid model based on CNN and LSTM feature extractors.

Raghu et al. (10) used a combination of SVM and Continuous Decomposition Index for epilepsy detection. Omidvar et al. (11) used Discrete Wavelet Transform combined with Artificial Neural Networks and SVM for seizure detection. Ravi Kumar et al. (12) used variational mode decomposition combined with RF for automatic identification of EEG signals in epileptogenic regions.
information about EEG signals between multiple channels. At the same time, the dataset has a very small amount of data and cannot adequately represent the EEG recordings of real-life epileptic patients. Therefore, the CHB-MIT dataset is chosen to test the performance of the proposed model in this paper.

TABLE 1
Dataset information.

TABLE 2
Classification results of different data augmentation methods.

Under the condition of a limited number of training samples, the AM-1D CNN-GRU model is used for classification, and the performance of the method is tested on the CHB-MIT dataset. Combining AMDA with the AM-1D CNN-GRU model yields better results, with accuracy, sensitivity, and area under the receiver operating characteristic curve (AUC) reaching 96.06%, 95.48%, and 0.9637, respectively, when the augmented sample is 1 × dataset. Comparison with related literature shows that the method outperforms many existing classification methods. The proposed method is not only relevant for detecting epileptic seizures but may also provide an effective approach for classifying time series in other fields. At this stage, the method has been tested only on a public dataset. In future work, its validity will be further verified on more clinical EEG datasets to improve detection performance and to explore high-precision, highly adaptive, and highly reliable algorithms for the clinical application of epilepsy detection.

TABLE 3
Classification performance of different methods on the CHB-MIT EEG dataset.