Sparse Granger Causality Analysis Model Based on Sensors Correlation for Emotion Recognition Classification in Electroencephalography

In recent years, affective computing based on electroencephalogram (EEG) data has attracted increasing attention. As a classic EEG feature extraction model, Granger causality analysis has been widely used in emotion classification models; it constructs a brain network by calculating the causal relationships between EEG sensors and selects the key EEG features. Traditional EEG Granger causality analysis uses the L2 norm to extract features from the data, so the results are susceptible to EEG artifacts. Recently, several researchers have proposed Granger causality analysis models based on the least absolute shrinkage and selection operator (LASSO) and the L1/2 norm to solve this problem. However, conventional sparse Granger causality analysis models assume that the connections between all pairs of sensors have the same prior probability. This paper shows that if the correlations between the EEG data from different sensors are added to the Granger causality network as prior knowledge, the EEG feature selection and emotion classification abilities of the sparse Granger causality model can be enhanced. Based on this idea, we propose a new affective computing model, named the sparse Granger causality analysis model based on sensor correlation (SC-SGA). SC-SGA integrates the correlation between sensors as prior knowledge into an L1/2-norm Granger causality analysis framework for feature extraction, and uses L2-norm logistic regression as the emotion classification algorithm. We report the results of experiments using two real EEG emotion datasets. These results demonstrate that the emotion classification accuracy of the SC-SGA model exceeds that of existing models by 2.46–21.81%.


INTRODUCTION
Emotions are an important part of decision-making and interpersonal interaction (Oatley et al., 2006; Izard, 2013). Research in many fields, such as affective computing, neurology, and psychology, is attempting to recognize human emotions through computer systems (Catanzarite and Greenburg, 1979; Picard, 1999). In human-computer interaction in particular, affective computing would enable machines to perceive a person's emotional state and so learn more about people through the interaction (Cauchard et al., 2016; Zhou, 2018). At present, emotion recognition methods fall into two main categories. The first is based on non-physiological signals, such as speech, body posture, and facial expression. The second is based on physiological signals, such as electrocardiogram and electroencephalogram (EEG) data (Picard, 2000, 2003; Tao and Tan, 2005). EEG signals originate directly from the cerebral cortex and thus directly reflect changes in human emotions (Larsen, 2011; Dan et al., 2013). As a result, EEG emotion recognition technology has become increasingly popular in recent years (Bos et al., 2006; Lin et al., 2010; Atkinson and Campos, 2016; Song et al., 2018).
Researchers have proposed many advanced EEG analysis methods. Examples include identifying subtypes of mental disorders from the functional connectivity patterns of resting-state EEG data, improving EEG decoding through cluster-based multitask feature learning, and diagnosing early Alzheimer's disease through topological network analysis of resting-state EEG (Moore and DeNero, 2011; Wang et al., 2011; Liu et al., 2012; Zhang et al., 2012; Zhou et al., 2012; Zhu et al., 2014; Suk et al., 2015). Among these topics, feature extraction and sensor causality analysis are active areas of research. Granger causality analysis is an important feature extraction method in affective computing and has been widely used (Dongwei et al., 2013; Immordino-Yang and Singh, 2013; Zhang et al., 2017).

For example, Zhang et al. used a Granger causality analysis model to construct effective brain connectivity networks from the emotional EEG data in the Database for Emotion Analysis Using Physiological Signals (DEAP). Their goal was to study how emotion affects effective connectivity patterns (Zhang et al., 2017). Coito et al. used a Granger causality model to study whether the EEG of patients with left and right temporal lobe epilepsy exhibited changes in directed functional connectivity (Coito et al., 2016).

However, clinical and neuroscience recordings inevitably contain outliers or artifacts (Blankertz et al., 2007). These degrade the quality of EEG signals and introduce noise. In particular, EEG signals are often contaminated by abnormal values caused by blinks or head movements. The original Granger causality analysis uses an L2-norm loss function, whose squared form tends to exaggerate outliers, and it retains all of the data. This can lead to erroneous analysis results (Xu et al., 2007, 2010a; Li et al., 2015; Bore et al., 2018, 2019).
Because brain networks are sparsely connected, researchers proposed Granger causality analysis models based on the least absolute shrinkage and selection operator (LASSO) to address the noise problem (Valdés-Sosa et al., 2005; Marinazzo et al., 2008; Shaw and Routray, 2018). However, the L1/2 regularizer yields sparser and more robust solutions than LASSO (Xu et al., 2010b; Zong-Ben et al., 2012; Li et al., 2017). Granger causality analysis based on the L1/2 norm has therefore been developed, and experiments have shown that it obtains better solutions (Bore et al., 2020).
Existing sparse Granger causality analysis models based on LASSO or L1/2 regularization aim to establish a sparse brain network relationship matrix. This matrix retains the connections between EEG sensors with strong causality and removes those with weak causality. Effectively calculating the causality weights between EEG sensors is therefore a key issue in sparse Granger causality analysis. Existing sparse Granger models use the multivariate autoregressive (MVAR) model to establish the weight matrix of the causal relationships between EEG sensors (Geweke, 1982; Seth, 2010; Hu et al., 2015). MVAR reflects the direct causal relationship between each pair of sensors. This method assumes that every EEG sensor has the same prior knowledge (that is, the correlation between the various sensors is uniform). However, given recorded EEG data, researchers can use statistical methods to pre-calculate the correlation between each pair of EEG channels.

We believe that integrating the correlation between EEG channels into the sparse Granger model as prior knowledge would enhance the causal relationships it estimates between sensors, thereby improving its feature selection ability. Based on this idea, the present paper proposes a sparse Granger causality network model based on sensor correlation, combined with a logistic regression classification algorithm based on L2 regularization. This sparse Granger causality analysis model based on sensor correlation (SC-SGA) uses the Pearson correlation coefficient to calculate the degree of similarity between sensors. SC-SGA integrates this similarity as a weight into a sparse Granger causality model based on the L1/2 regularizer for feature extraction. It then uses an L2-regularized logistic regression algorithm for emotion recognition, as shown in Figure 1.
In this study, experiments were conducted on two real datasets. The experimental results show that, compared with the existing models, the SC-SGA model achieves better recognition of different emotions. We believe that the SC-SGA model is a good complement to the classification model based on sparse Granger causality analysis, and that the method and results presented in this article will be very useful in future research.

Materials
Sixteen channels were selected for experiments related to emotional states. The channel selection is shown in Figure 2.

SEED Dataset
The SJTU (Shanghai Jiao Tong University) Emotion EEG Dataset (SEED) is a collection of EEG datasets provided by the BCMI (Brain-like Computing & Machine Intelligence) laboratory (Duan et al., 2013). SEED uses film clips as emotion-inducing materials and includes three categories of emotion: positive, neutral, and negative. The details of the film clips used in the experiments are listed in Table 1. A total of 15 subjects (seven males, eight females; mean age 23.27 years, standard deviation 2.37 years) participated in the SEED experiments, all of whom had normal vision, hearing, and emotional states. Fifteen clips from Chinese movies were played, five of each emotion type, and each clip lasted about 4 min. In each experiment, the participants watched movie clips of different emotional states. While the subject watched, EEG signals were recorded at a sampling frequency of 1,000 Hz through a 62-channel electrode cap placed according to the international 10-20 system. Each volunteer participated in three experiments, separated by about 1 week. After screening, a total of 660 data samples were obtained. To obtain the preprocessed EEG dataset, the data were down-sampled to 200 Hz and band-pass filtered from 0 to 75 Hz. Each dimension of SEED is described in Table 2. For more information on this dataset, please refer to the website http://bcmi.sjtu.edu.cn/~seed/index.html.

a) Gamma band dataset: The SEED EEG dataset contains five EEG bands, whose main frequency range is 1-50 Hz. The frequency range of gamma brain waves is 31-50 Hz. Previous studies have shown that gamma activity generally occurs in pathological conditions, such as epilepsy, or under external stimuli. Additionally, the gamma band is often used for multimodal analysis in experiments. Therefore, we use the gamma brain waves for experimental analysis. The gamma frequency band of the SEED dataset contains 660 samples.

b) Combined band dataset: To further verify the performance of our model, we also examine all frequency bands of the EEG dataset. The EEG signals were decomposed into five frequency bands according to the EEG rhythm: delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and gamma (31-50 Hz). These five frequency-band signals were combined to form a new combined frequency band dataset.

Two EEG datasets representing different frequency bands were thus obtained. Four feature processing models were tested and verified using these datasets:
- Original (dataset not processed)
- LASSO
- least absolute Lp-penalized solution (LAPPS, 0 < p < 1)
- SC-SGA

In the experiments, the 660 samples were randomly assigned to mutually exclusive training (80%) and validation (20%) sets.

FIGURE 1 | Experimental process of the model proposed in the paper.

FIGURE 2 | Sixteen channels used in the experiments.
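The five-band decomposition described above can be sketched as follows. This is a minimal illustration built on several assumptions:
- The text does not state which filters were used, so zero-phase fourth-order Butterworth band-pass filters are a hypothetical choice.
- Forming the "combined" dataset by stacking the five band-limited copies along the channel axis is also an assumption.
- `decompose_bands` and `combined_band` are illustrative names.
- The 200 Hz rate matches the down-sampled SEED data.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Rhythm bands (Hz) as listed in the text.
BANDS = {
    "delta": (1.0, 3.0),
    "theta": (4.0, 7.0),
    "alpha": (8.0, 13.0),
    "beta": (14.0, 30.0),
    "gamma": (31.0, 50.0),
}


def decompose_bands(eeg, fs=200.0, order=4):
    """Split EEG of shape (channels, samples) into the five rhythm bands.

    Assumption: zero-phase Butterworth band-pass filters (hypothetical choice).
    Returns an array of shape (5, channels, samples) in BANDS order.
    """
    filtered = []
    for low, high in BANDS.values():
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        filtered.append(sosfiltfilt(sos, eeg, axis=-1))
    return np.stack(filtered)


def combined_band(eeg, fs=200.0):
    """Hypothetical 'combined band' view: the five band-limited signals
    stacked along the channel axis, giving shape (5 * channels, samples)."""
    bands = decompose_bands(eeg, fs)
    return bands.reshape(-1, eeg.shape[-1])


# Usage: 16 channels, 4 s of synthetic data at 200 Hz.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((16, 800))
print(decompose_bands(eeg).shape, combined_band(eeg).shape)  # (5, 16, 800) (80, 800)
```

The gamma-band dataset corresponds to the last slice, `decompose_bands(eeg)[4]`.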

DEAP Dataset
The DEAP dataset (Koelstra et al., 2011) can be found at http://www.eecs.qmul.ac.uk/mmv/datasets/deap/. It includes 32-channel EEG signals and peripheral physiological signals, such as GSR (galvanic skin response), EOG (electrooculogram), EMG (electromyography), and PPG (photoplethysmograph) signals, as well as temperature and status channels. All data were down-sampled to 128 Hz, and each trial of EEG data consists of a 60 s trial signal and a 3 s baseline. A zero-phase band-pass filter of 4-45 Hz was applied. In this study, the 32-channel EEG data were divided into two classes according to their arousal ratings: positive (rating above 6) and negative (rating below 4).
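The arousal-based labeling rule above can be written as a short helper. This is a sketch that assumes trials rated between 4 and 6 (inclusive) are simply discarded, since the text does not say how they are handled. `arousal_labels` is a hypothetical name.

```python
import numpy as np


def arousal_labels(ratings, high=6.0, low=4.0):
    """Map DEAP arousal ratings (1-9) to binary classes.

    rating > high -> 1 ("positive"), rating < low -> 0 ("negative").
    Assumption: trials with low <= rating <= high are discarded.
    Returns (labels, kept_trial_indices).
    """
    ratings = np.asarray(ratings, dtype=float)
    keep = (ratings > high) | (ratings < low)
    labels = (ratings[keep] > high).astype(int)
    return labels, np.flatnonzero(keep)


labels, kept = arousal_labels([7.1, 5.0, 2.5, 6.0, 9.0, 3.99])
print(labels.tolist(), kept.tolist())  # [1, 0, 1, 0] [0, 2, 4, 5]
```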
The DEAP dataset consists of two parts. The first part contains the ratings from an online self-assessment, in which 14-16 volunteers rated each of 120 one-minute music video extracts for arousal, valence, and dominance. The second part includes the participant ratings, physiological recordings, and facial videos from an experiment in which 32 volunteers watched a subset of 40 of these music videos. EEG and physiological signals were recorded, and each participant rated the videos as above. For 22 participants, frontal face videos were also recorded. At the end of each video, the participants completed a self-assessment of three dimensions:
- arousal, ranging from inactive (1) to active (9);
- valence, ranging from unpleasant (1) to pleasant (9);
- dominance, ranging from helpless and weak (1) to empowered (9).

Figure 3 shows the two-dimensional emotional model of the DEAP dataset. Each dimension of DEAP is described in Table 3. In experiments with the DEAP dataset, we used only the combined frequency band data.

Cross-Validation
To obtain reliable results, 5-fold cross-validation was used in all the experiments. Five-fold cross-validation first divides all the data into five subsamples. One subsample is selected as the test set, and the other four are used for training. This process is repeated five times, so that each subsample serves as the test set once, and the mean and its error range are calculated. In addition, each complete 5-fold cross-validation was repeated 100 times, allowing the average and error statistics to be obtained.
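The repeated 5-fold protocol above can be sketched with scikit-learn. In this sketch, stratified folds, a fresh random partition for each repeat, and logistic regression as the stand-in classifier are all assumptions. `repeated_cv_accuracy` is a hypothetical helper, and the synthetic features stand in for extracted EEG features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score


def repeated_cv_accuracy(X, y, n_splits=5, n_repeats=100, seed=0):
    """Mean and standard deviation of accuracy over n_repeats runs of
    n_splits-fold cross-validation (a new random partition each repeat)."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    clf = LogisticRegression(max_iter=1000)  # L2 penalty is scikit-learn's default
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()


# Usage on synthetic features (stand-ins for extracted EEG features).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = (X[:, 0] > 0).astype(int)
mean_acc, std_acc = repeated_cv_accuracy(X, y, n_repeats=10)
print(f"accuracy {mean_acc:.3f} +/- {std_acc:.3f}")
```

With `n_repeats=100` this reproduces the 100 repetitions described in the text; the usage call uses 10 only to keep it quick.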

Methods
Researchers have shown that connections in brain networks are sparse (Genç et al., 2018). Many unnecessary connections appear when researchers construct causal brain networks. Including these connections directly in the analysis increases both the computational complexity and the likelihood of overfitting. Therefore, sparse regularizers such as the L1 and L1/2 norms can be used when constructing causal brain networks. Adding sparse regularizers effectively extracts the important features of the network and reduces its computational cost. In this way, accuracy can be improved while the computational requirements are reduced (Bore et al., 2018, 2020).

L2 Granger Analysis
Granger analysis is based on an MVAR model. This form of analysis allows researchers to estimate the relationships among multiple time series. The accuracy with which the MVAR parameters are estimated therefore determines the reliability of the inferred relationships, which ultimately affects the accuracy of the Granger-analysis connectivity network. There are multiple strategies for estimating the parameters of MVAR models. Assume there are m stationary stochastic processes with time-domain observations $W_i(t) \in \mathbb{R}$, $i = 1, 2, \ldots, m$; $t = 1, 2, \ldots, n$. The MVAR model is

$$ W_j(t) = \sum_{i=1}^{m} \sum_{l=1}^{s} a_{ij}(l)\, W_i(t-l) + \varepsilon_j(t), \qquad j = 1, 2, \ldots, m, \tag{1} $$

where s is the maximum number of lagged observations included in the model. The vector $a_{ij} = [a_{ij}(1), \ldots, a_{ij}(s)]^\top$ ($i, j = 1, 2, \ldots, m$) contains the coefficients that define the effect of the activity of $W_i$ on $W_j$. Moreover, $\Sigma_k$ ($k = 1, 2, \ldots, m$) is the variance of the residuals $\varepsilon_k$ between the observed $W_k$ and the predicted $\hat{W}_k$ in the corresponding process. Suppose that

$$ X_k = \left[a_{1k}^\top, a_{2k}^\top, \ldots, a_{mk}^\top\right]^\top \in \mathbb{R}^{m s} \tag{2} $$

are the multivariate autoregressive coefficients for process $W_k$, with m being the number of time series. Let $y_k = [W_k(s+1), W_k(s+2), \ldots, W_k(n)]^\top$ be the $n - s$ elements to be predicted for $W_k$, where n denotes the length of the signal. We define the design matrix $A \in \mathbb{R}^{(n-s)\times(m s)}$ as

$$ A = \begin{bmatrix} W_1(s) & \cdots & W_1(1) & \cdots & W_m(s) & \cdots & W_m(1) \\ W_1(s+1) & \cdots & W_1(2) & \cdots & W_m(s+1) & \cdots & W_m(2) \\ \vdots & & \vdots & & \vdots & & \vdots \\ W_1(n-1) & \cdots & W_1(n-s) & \cdots & W_m(n-1) & \cdots & W_m(n-s) \end{bmatrix}. \tag{3} $$

In this case,

$$ y_k = A X_k + \varepsilon_k. \tag{4} $$

Consequently, we find the solution of Equation (1) with the objective defined in the L2 norm space (L2 norm loss function):

$$ \hat{X}_k = \arg\min_{X_k} f_k(X_k), \qquad f_k(X_k) = \lVert y_k - A X_k \rVert_2^2. \tag{5} $$

Here, $\lVert\cdot\rVert_2$ denotes the L2 norm of a vector, and "argmin" indicates that the solution minimizes the objective function $f_k(X_k)$. Setting the derivative of Equation (5) with respect to $X_k$ to zero, $df_k/dX_k = 0$, gives

$$ A^\top A\, X_k = A^\top y_k. \tag{6} $$

The MVAR coefficients for process $W_k$ are therefore

$$ \hat{X}_k = (A^\top A)^{-1} A^\top y_k \quad \text{or} \quad \hat{X}_k = (A^\top A)^{+} A^\top y_k, \tag{7} $$

where $(A^\top A)^{-1}$ is the inverse of $A^\top A$ and $(A^\top A)^{+}$ is its pseudo-inverse, used when $A^\top A$ is singular (Watkins, 2004).

Frontiers in Computational Neuroscience | www.frontiersin.org
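The least-squares MVAR estimate described above can be sketched in NumPy. In this sketch:
- Each design-matrix row lists lags 1 through s for channel 1, then channel 2, and so on. This ordering is an assumption chosen so that the row for time t uses the samples immediately before t.
- `np.linalg.pinv` plays the role of the pseudo-inverse $(A^\top A)^{+}$.
- `design_matrix` and `mvar_l2` are hypothetical names.
- The simulated VAR(1) is illustrative data, not EEG.

```python
import numpy as np


def design_matrix(W, s):
    """Lagged design matrix A, shape (n - s, m * s), from W of shape (m, n).

    The row predicting time t holds [W_1(t-1), ..., W_1(t-s), ..., W_m(t-1), ..., W_m(t-s)].
    """
    m, n = W.shape
    rows = [W[:, t - s:t][:, ::-1].reshape(-1) for t in range(s, n)]
    return np.asarray(rows)


def mvar_l2(W, s):
    """Least-squares MVAR coefficients for every target channel.

    Column k of the result is X_k; entry [i * s + (l - 1), k] estimates a_ik(l).
    """
    A = design_matrix(W, s)
    Y = W[:, s:].T  # column k is y_k, shape (n - s, m)
    return np.linalg.pinv(A.T @ A) @ A.T @ Y


# Usage: a simulated 2-channel VAR(1) in which channel 1 drives channel 2.
rng = np.random.default_rng(1)
n = 5000
W = np.zeros((2, n))
e = rng.standard_normal((2, n))
for t in range(1, n):
    W[0, t] = 0.5 * W[0, t - 1] + e[0, t]
    W[1, t] = 0.8 * W[0, t - 1] + 0.2 * W[1, t - 1] + e[1, t]
X = mvar_l2(W, s=1)
print(np.round(X, 2))  # true values: [[0.5, 0.8], [0.0, 0.2]]
```

The estimate for the channel 2 to channel 1 link is close to, but not exactly, zero. This residual non-sparsity is what the L1 and L1/2 variants below address.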

LASSO Granger Analysis
Because neurons in the brain are sparsely connected, retaining all the information between sensors may lead to erroneous results due to noise. Granger analysis based on LASSO has therefore been developed. LASSO uses the L1 norm, and adding an L1 penalty to Granger analysis shrinks some coefficients to zero, yielding sparse results. Based on Equation (5), we can write

$$ \hat{X}_k = \arg\min_{X_k} \lVert y_k - A X_k \rVert_2^2 + \lambda \lVert X_k \rVert_1, $$

where λ ≥ 0 is a regularization parameter. This is a classic convex optimization problem that can be solved using a greedy algorithm.
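A per-channel LASSO fit can be sketched with scikit-learn's `Lasso`, which uses coordinate descent rather than a greedy method. Its objective is $\frac{1}{2N}\lVert y - A x\rVert_2^2 + \alpha \lVert x\rVert_1$. Its `alpha` therefore corresponds to $\lambda/(2N)$ in the LASSO objective above, with $N = n - s$ rows. The helper names and the simulated data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso


def design_matrix(W, s):
    """Lagged design matrix A, shape (n - s, m * s), lag 1 first for each channel."""
    m, n = W.shape
    return np.asarray([W[:, t - s:t][:, ::-1].reshape(-1) for t in range(s, n)])


def mvar_lasso(W, s, alpha=0.1):
    """L1-penalized MVAR fit, one Lasso problem per target channel k.

    scikit-learn minimizes (1 / (2N)) * ||y - A x||_2^2 + alpha * ||x||_1,
    so alpha = lambda / (2N) with N = n - s.
    """
    A = design_matrix(W, s)
    Y = W[:, s:].T
    X = np.zeros((A.shape[1], W.shape[0]))
    for k in range(W.shape[0]):
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        X[:, k] = model.fit(A, Y[:, k]).coef_
    return X


# Usage: a simulated VAR(1) with no channel-2 -> channel-1 influence.
rng = np.random.default_rng(1)
n = 5000
W = np.zeros((2, n))
e = rng.standard_normal((2, n))
for t in range(1, n):
    W[0, t] = 0.5 * W[0, t - 1] + e[0, t]
    W[1, t] = 0.8 * W[0, t - 1] + 0.2 * W[1, t - 1] + e[1, t]
X = mvar_lasso(W, s=1, alpha=0.15)
print(np.round(X, 2))
```

Compared with the least-squares fit, the absent link is typically driven exactly to zero. The cost is that the true coefficients are shrunk toward zero.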

LAPPS Granger Analysis
Recently, researchers have found that the L1/2 norm is a sparser and more robust regularizer than the L1 norm. A Granger analysis model based on the L1/2 norm has therefore been proposed, which estimates the MVAR parameters using LAPPS. In theory, its solutions are sparser than those given by LASSO, and it is better at suppressing noise and artifacts. The Granger analysis model based on the L1/2 norm can be written as

$$ \hat{X}_k = \arg\min_{X_k} \lVert y_k - A X_k \rVert_1 + \eta \lVert X_k \rVert_p^p, \qquad p = \tfrac{1}{2}, $$

where the fitting error is measured in the L1 norm space. $L_p$ ($p = 1/2$) norm regularization is imposed on the coefficients, and η > 0 is the regularization parameter. The alternating direction method of multipliers (ADMM) framework can be used to solve this problem.
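The following is a minimal ADMM sketch for the problem above, applied to one target channel. A is the lagged design matrix and y is $y_k$. The splitting r = Ax - y (for the L1 fit) and w = x (for the L1/2 penalty) is one illustrative choice and is not necessarily the one used by the LAPPS authors. The w-step uses the closed-form half-thresholding operator for L1/2 regularization. The penalty is nonconvex, so this is a heuristic without a global-optimum guarantee. All function names and the defaults for `eta`, `rho`, and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(t, tau):
    """Proximal operator of tau * |x| (used for the L1 data-fit term)."""
    return np.sign(t) * np.maximum(np.abs(t) - tau, 0.0)


def half_threshold(t, lam):
    """Elementwise global minimizer of (x - t)**2 + lam * |x|**0.5
    (the half-thresholding operator for L1/2 regularization)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    thr = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    big = np.abs(t) > thr
    tb = t[big]
    phi = np.arccos((lam / 8.0) * (np.abs(tb) / 3.0) ** -1.5)
    out[big] = (2.0 / 3.0) * tb * (1.0 + np.cos(2.0 * np.pi / 3.0 - (2.0 / 3.0) * phi))
    return out


def lapps_admm(A, y, eta=0.2, rho=1.0, n_iter=2000):
    """ADMM sketch for min_x ||A x - y||_1 + eta * sum_i |x_i|**0.5.

    Splitting: r = A x - y and w = x, with scaled dual variables u and v.
    """
    n_rows, p = A.shape
    w, v = np.zeros(p), np.zeros(p)
    r, u = np.zeros(n_rows), np.zeros(n_rows)
    M_inv = np.linalg.inv(A.T @ A + np.eye(p))  # x-update matrix is fixed
    for _ in range(n_iter):
        x = M_inv @ (A.T @ (y + r - u) + w - v)     # quadratic x-step
        Ax = A @ x
        r = soft_threshold(Ax - y + u, 1.0 / rho)   # L1 fit step
        w = half_threshold(x + v, 2.0 * eta / rho)  # L1/2 penalty step
        u += Ax - y - r
        v += x - w
    return w  # the split copy carries exact zeros


# Usage: sparse coefficients with a few gross outliers (stand-ins for artifacts).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
x_true = np.zeros(10)
x_true[0], x_true[3] = 2.0, -1.5
y = A @ x_true
y[:5] += 20.0
x_hat = lapps_admm(A, y)
print(np.round(x_hat, 2))
```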

Proposed LAPPS Granger Analysis Based on Sensor Correlation
In an EEG emotion recognition model, the key factor in improving the final result is feature extraction, and finding the EEG data related to emotion is its core problem. However, previous sparse Granger analysis models give every sensor the same prior weight. As a result, the extracted features depend only on the values of the EEG signals and do not necessarily correspond to the emotional state. Suppose we can quantify the correlation between each EEG sensor and emotion and use it as a weight in the sparse Granger analysis model. The model's feature selection ability should then improve, and its classification ability with it. Following this idea and building on existing research, we propose a sparse Granger analysis model based on sensor correlation and the L1/2 norm. The model can be written as follows:

where $L_p$ ($p = 1/2$) norm regularization is imposed on the coefficients. E represents the sensor correlation, which can also be regarded approximately as an emotion-related weight. Our aim is to retain as much emotion-related connectivity information as possible. The formula for calculating E is as follows:

where T is the number of time series. In this case:

where i and j index the sensors. For $M_i$, we have:

$$ r(M_i, M_j) = \frac{\operatorname{Cov}(M_i, M_j)}{\sqrt{\operatorname{Var}[M_i]\,\operatorname{Var}[M_j]}}, $$

where $\operatorname{Cov}(M_i, M_j)$ is the covariance of the i-th and j-th sensors, $\operatorname{Var}[M_i]$ is the variance of the i-th sensor, and $\operatorname{Var}[M_j]$ is the variance of the j-th sensor.
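The Pearson sensor-correlation matrix can be computed directly from the covariance and variance definitions above. This sketch covers only the computation of E from one row of samples per sensor. It does not implement the weighted SC-SGA optimization. `sensor_correlation` is a hypothetical name, and the data are synthetic.

```python
import numpy as np


def sensor_correlation(M):
    """Pearson correlation between sensors.

    M has shape (n_sensors, T): one row per sensor time series.
    E[i, j] = Cov(M_i, M_j) / sqrt(Var[M_i] * Var[M_j]).
    """
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / M.shape[1]
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


rng = np.random.default_rng(0)
M = rng.standard_normal((16, 500))
M[1] = 0.9 * M[0] + 0.1 * M[1]  # make sensors 0 and 1 strongly correlated
E = sensor_correlation(M)
print(E.shape, round(float(E[0, 1]), 2))
```

The normalization constant (dividing by T rather than T - 1) cancels in the ratio, so the result matches `np.corrcoef(M)`.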

Logistic Regression Model
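In this study, logistic regression is used as the classification model. The probability formula of logistic regression is as follows (Wright, 1995):

$$ P(y = 1 \mid x) = \frac{1}{1 + \exp\!\bigl(-(w^\top x + b)\bigr)}, $$

where x is the feature vector, w is the weight vector, and b is the bias.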

L2 Regularizer
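Considering the high-dimensional characteristics of EEG data, a logistic regression model based on the L2 regularizer is used. The formula for the L2 regularizer is as follows (Cortes et al., 2012):

$$ R(w) = \lambda \lVert w \rVert_2^2 = \lambda \sum_{j} w_j^2, $$

where λ ≥ 0 controls the strength of the penalty, which is added to the logistic log-loss.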
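To show how the logistic probability and the L2 penalty combine, here is a from-scratch sketch. It runs plain gradient descent on the mean log-loss plus $\lambda \lVert w\rVert_2^2$. Assumptions: labels are in {0, 1}, the bias is not penalized, the loss is averaged over samples, and the step size and iteration count are illustrative. `fit_l2_logreg` is a hypothetical name.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: P(y = 1 | x) = 1 / (1 + exp(-(w @ x + b)))."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_l2_logreg(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Gradient descent on mean log-loss + lam * ||w||_2^2 (bias b not penalized).

    y holds labels in {0, 1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n + 2.0 * lam * w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
w_weak, b_weak = fit_l2_logreg(X, y, lam=0.001)
w_strong, b_strong = fit_l2_logreg(X, y, lam=1.0)
acc = np.mean((sigmoid(X @ w_weak + b_weak) > 0.5) == (y == 1))
print(f"||w|| weak={np.linalg.norm(w_weak):.2f}, strong={np.linalg.norm(w_strong):.2f}, acc={acc:.2f}")
```

Increasing λ shrinks the weights toward zero without setting them exactly to zero. This is the usual difference between L2 and L1 penalties.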

Support Vector Machine Model
In addition to the regularized logistic regression model, a support vector machine (SVM) model is used for classification and comparison. The SVM is a binary classification technique. Its basic form is a linear classifier that maximizes the margin in the feature space, which distinguishes it from the perceptron (Adnan et al., 2020; Li et al., 2020; Wang and Chen, 2020). The SVM also supports kernel methods, which make it an effective nonlinear classifier. The SVM learning strategy is to maximize the margin, which can be formalized as a convex quadratic programming problem; this is equivalent to minimizing a regularized hinge loss. The SVM learning algorithm is therefore an optimization algorithm for solving convex quadratic programming problems (Scholkopf and Smola, 2018).
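The effect of the kernel trick mentioned above can be illustrated with scikit-learn's `SVC` on a toy problem that no line can separate (points inside versus outside a circle). The data, the value of `C`, and the choice of an RBF kernel are illustrative assumptions. The paper does not state which kernel was used.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = (np.sum(X ** 2, axis=1) < 0.5).astype(int)  # class 1 inside a circle: not linearly separable

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(f"training accuracy: linear={linear_svm.score(X, y):.2f}, rbf={rbf_svm.score(X, y):.2f}")
```

The linear SVM can only draw a straight boundary. The RBF kernel implicitly maps the points to a space where a maximum-margin hyperplane separates them.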

EXPERIMENTAL RESULTS
A series of experiments was conducted using two real datasets and four Granger causal models: the Original Granger causal model, the LASSO Granger causal model (LASSO-GA), the LAPPS model, and the proposed SC-SGA model. The classifier for each model was built using SVM, logistic regression, and ridge regression. Confusion matrices are used to compare the results of the various models. A confusion matrix summarizes the predictions of a classification model by tabulating the samples according to their true class and the class predicted by the model. The rows of the matrix represent the true classes, and the columns represent the predicted classes. Classification accuracy is used as a measure of quality. It is defined as the ratio of the number of samples correctly classified to the total number of samples in the test dataset. However, accuracy is not always an effective performance metric, especially when the classes contain different numbers of samples. We therefore also compare the precision and recall of the three classifiers. Precision is the proportion of samples predicted as positive that are truly positive, and recall is the proportion of truly positive samples that are predicted as positive. All experiments used 5-fold cross-validation to ensure the stability of the results.
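The metrics defined above can be computed with scikit-learn. They can also be read directly off a confusion matrix whose rows are true classes and columns are predicted classes, which is scikit-learn's convention. The three-class labels below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical labels for illustration: 0 = negative, 1 = neutral, 2 = positive.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1, 0, 2])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])  # rows: true class, columns: predicted class
acc = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, labels=[0, 1, 2], average=None)
recall = recall_score(y_true, y_pred, labels=[0, 1, 2], average=None)

# The same quantities read directly off the confusion matrix.
precision_cm = np.diag(cm) / cm.sum(axis=0)  # TP / (TP + FP), per predicted column
recall_cm = np.diag(cm) / cm.sum(axis=1)     # TP / (TP + FN), per true row
print(cm)
print(acc, precision.round(3), recall.round(3))
```

Accuracy is the trace of the confusion matrix divided by its total. Counting the off-diagonal entries gives the number of misclassified samples, as reported for the SEED results.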

SEED Dataset
As shown in Table 4 and Supplementary Table I, the experimental results using the gamma band show that the SC-SGA model proposed in this paper has clear advantages over the other models. In terms of neutral emotion, the experimental results using the SVM method give a precision of 84.70% for the proposed model, which is 22.70, 10.23, and 2.48% higher than the Original, LASSO-GA, and LAPPS models, respectively. The recall of our proposed model is 87.03%, which is 13.22, 9.25, and 0.98% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. In the experimental results using the logistic regression method, the precision of our proposed model is 88.89%, which is 16.80, 13.10, and 3.18% higher than with the Original, LASSO-GA, and LAPPS models, respectively. The recall of our proposed model is 79.12%, which is 5.31, 3.36, and 0.86% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. In the experimental results using the ridge regression method, the precision of our proposed model is 85.99%, which is 15.78, 9.39, and 0.62% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. The recall of our proposed model is 84.70%, which is 12.96, 4.70, and 1.37% higher than with the Original, LASSO-GA, and LAPPS models, respectively. In terms of negative emotion, we obtain a similar conclusion. In the experimental results using the ridge regression method, the precision of the proposed model is 89.79%, which is 21.50, 3.30, and 1.99% higher than the Original, LASSO-GA, and LAPPS models, respectively. The recall of SC-SGA is 83.60%, which is 24.03, 7.41, and 3.60% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. In classifying positive emotion, the proposed method achieves the best experimental results.

FIGURE 4 | Box plot obtained using ridge regression with the four models for the combined band of the SEED dataset. Score represents the accuracy rate.
Using ridge regression, the precision of our proposed model is 88.21%, which is 29.12, 4.88, and 4.21% higher than the Original, LASSO-GA, and LAPPS models, respectively. The recall of our proposed model is 89.75%, which is 23.08, 4.86, and 2.42% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. With the ridge regression method, the precision of our proposed model is 86.49%, which is 20.58, 4.67, and 1.88% higher than the Original, LASSO-GA, and LAPPS models, respectively. (See the Supplementary Materials for the L1 and SVM results.) We also experimented with the combined band of the SEED dataset. The three classification methods were used with the four processing models, and the results are consistent with those for the gamma band. The following analysis considers the results obtained by ridge regression (for the SVM and L1 results, see the Supplementary Materials).
The results in Table 5 indicate that, when ridge regression is used as the classification method, the SC-SGA model achieves the best precision and recall of the four processing models. For neutral emotion, the precision of our proposed model is 85.98%, which is 11.98, 2.88, and 1.60% higher than the Original, LASSO-GA, and LAPPS models, respectively. The recall of our proposed model is 86.12%, which is 7.40, 3.90, and 1.74% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. In terms of positive emotion, the precision of our proposed model is 86.18%, which is 5.23, 2.44, and 2.05% higher than the Original, LASSO-GA, and LAPPS models, respectively. The recall of our proposed model is 87.88%, approximately 0.70, 8.33, and 0.71% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. Regarding negative emotion, the precision of our proposed model is 83.15%, which is 10.65, 10.07, and 8.15% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. The recall of our proposed model is 87.56%, which is 15.82, 3.19, and 7.20% higher than that of the Original, LASSO-GA, and LAPPS models, respectively.

Supplementary Figures 4 and 5 and Figure 4 are box plots obtained from experiments using the four processing models with SVM, L1, and ridge regression classifiers for the combined band of the SEED dataset. The accuracy of SC-SGA is higher than that of the other three processing models, and its results are more robust, with smaller fluctuations. In particular, the ridge regression method produces smaller fluctuations than the other two classifiers. Figures 5A-D show the confusion matrices obtained using the four processing models with the ridge regression classifier for the combined band of the SEED dataset.
These data show that, among the four processing models, SC-SGA achieves the highest accuracy of 86.58%, which is 7.79, 3.25, and 2.46% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. Across the three emotions, SC-SGA gives the fewest misclassified samples (15), while the Original, LASSO-GA, and LAPPS models produce 31, 25, and 22 classification errors, respectively. These results show that SC-SGA outperforms the other models on the SEED dataset. To further validate the proposed model, we now analyze the results on the DEAP dataset.

DEAP Dataset
The experimental results on the DEAP dataset are given in Table 6. As with the SEED dataset, the SC-SGA model performs best. The detailed results using SVM and L1 are given in the Supplementary Materials, and the results using ridge regression are analyzed below.
In terms of positive emotion, the precision of the proposed model is 80.16%, which is 8.84, 7.79, and 3.17% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. The recall of the proposed model is 79.31%, which is 10.85, 8.25, and 4.43% higher than with the Original, LASSO-GA, and LAPPS models, respectively. For negative emotion, the precision of the proposed model is 81.02%, which is 8.05, 6.06, and 2.96% higher than that of the Original, LASSO-GA, and LAPPS models, respectively. The recall of the proposed model is 82.67%, which is 12.18, 7.54, and 4.24% higher than when using the Original, LASSO-GA, and LAPPS models, respectively.
These results indicate that LASSO-GA, LAPPS, and SC-SGA all improve on the Original model to some extent. The improvement achieved by SC-SGA is larger than that of the existing LASSO-GA and LAPPS models, indicating that SC-SGA supports more accurate emotion classification. Supplementary Figures 5 and 7 and Figure 6 are box plots obtained from experiments using the four processing models with SVM, L1, and ridge regression on the DEAP dataset. The accuracy of SC-SGA is higher than that of the other three processing models. In terms of stability, SC-SGA also shows smaller fluctuations.
It can therefore be concluded that the SC-SGA model outperforms existing models in the experiments on these two real datasets. This suggests that the SC-SGA model is better able to mitigate the false connections caused by abnormal values.

DISCUSSION
In affective computing, feature selection is the key to improving model performance. A classic approach to EEG feature extraction is the brain network based on Granger causality analysis. However, the inevitable abnormal values in EEG measurements can lead to false connections. Researchers have therefore developed Granger causality analysis models based on LASSO and on the L1/2 norm for denoising. In a brain network built by Granger causality analysis, the characteristic EEG data are retained by analyzing the causal relationships between EEG sensors. Accurately analyzing these relationships and assigning appropriate weights has therefore become the focus of research. Existing sparse Granger causality models do not use prior knowledge. However, given the recorded time series of the EEG sensors, researchers can directly measure the degree of correlation between sensors. We believe that the more closely the time series of two sensors are related, the more likely they are to have a causal relationship.
On the basis of this idea, we proposed the model described in this article, which uses the known sensor correlations as prior knowledge to enhance the causality construction ability of the existing sparse Granger model. Based on the existing literature, we selected 16 emotion-related sensor channels. We then used the L1/2 norm to suppress artifacts in the data while retaining emotion-related information (Zheng and Lu, 2015; Zheng et al., 2017; Chen et al., 2020). Next, we calculated the similarity between sensors. We assumed that this similarity reflects sensor correlation, so it could serve as a correction that enhances the model's ability to distinguish emotional states. From a neurobiological perspective, scalp electrodes record the summed discharge of cortical neurons, and this discharge pattern is expected to differ between emotional states. The similarity between sensors can therefore serve as an a priori weight for sensor causal analysis. The experimental results support this hypothesis.
Although we have shown that the similarity between sensors can enhance the feature selection ability of the Granger causality model, the experiments in this paper use only sensor-level (scalp) signals and do not perform EEG source localization. In future work, we will extend this model so that it can be applied to source-localized EEG data. This will enable further study of the relationship between sensor similarity and sensor causality.

CONCLUSION
The experimental results presented in this paper show that, compared with existing models, the proposed SC-SGA model has better emotion recognition capability and stability. We believe that this model is a useful supplement to classification models based on sparse Granger causality analysis. We hope that it will provide new ideas for the development of sparse Granger causality models and promote clinical applications such as computer-aided diagnosis of affective disorders.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
We used two public datasets, called SEED and DEAP. The studies involving human participants were reviewed and approved by the DEAP dataset team and the SJTU Emotion EEG Dataset team. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ZD, RM, and DC proposed the method; ZD conducted the experiments; CD and NH read and revised the manuscript. All authors contributed to the manuscript and approved the submitted version.