MISNet: multi-source information-shared EEG emotion recognition network with two-stream structure

Introduction: When constructing machine learning models and deep neural networks, the domain shift problem across subjects complicates subject-independent electroencephalography (EEG) emotion recognition. Most existing domain adaptation methods either treat all source domains as equivalent or train source-specific learners directly, misleading the network into acquiring unreasonable transfer knowledge and thus resulting in negative transfer. Methods: This paper incorporates the individual difference and group commonality of distinct domains and proposes a multi-source information-shared network (MISNet) to enhance the performance of subject-independent EEG emotion recognition models. Network stability is enhanced by employing a two-stream training structure with a loop iteration strategy, which keeps outlier sources from confusing the model. Additionally, we design two auxiliary loss functions for aligning the marginal distributions of domain-specific and domain-shared features, and then optimize the convergence process by imposing a gradient penalty on these auxiliary loss functions. Furthermore, a pre-training strategy is proposed to ensure that the initial mapping of the shared encoder contains sufficient emotional information. Results: We evaluate the proposed MISNet to ascertain the impact of several hyper-parameters on the domain adaptation capability of the network. Ablation experiments are conducted on two publicly accessible datasets, SEED and SEED-IV, to assess the effectiveness of each loss function. Discussion: The experimental results demonstrate that by disentangling private and shared emotional characteristics from the differential entropy features of EEG signals, the proposed MISNet can achieve robust subject-independent performance and strong domain adaptability.


Introduction
Emotion is critical in influencing people's decision-making, social interaction and evaluation of things (Dolan, 2002). By incorporating emotional analysis into human-machine interaction, machines can better understand humans and behave more naturally (Picard, 2001). Numerous studies have been conducted on emotion recognition based on various modalities, such as facial expressions (Ko, 2018), speech (Schuller, 2018) and electrophysiological signals.
Electroencephalography (EEG) stands out among these signals due to its objectivity and high temporal resolution (Yang et al., 2021). Specifically, EEG-based affective brain-computer interfaces (aBCIs) (Mühl et al., 2014) aim to detect affective states from EEG signals and use them in various applications, such as estimating driver drowsiness to improve driving safety (Wu et al., 2016; Cui et al., 2019; Jiang et al., 2020) and establishing an objective detection system for depression (Cai et al., 2020) or post-traumatic stress disorder (Rozgic et al., 2014) to enable self-diagnosis.
Regarding EEG emotion recognition, the structural and functional characteristics of the brain vary between subjects depending on head size, body state and experimental environment, resulting in substantial differences in the collected EEG signals (Samek et al., 2013). Traditional machine-learning algorithms usually train a classifier, such as a support vector machine (Zheng and Lu, 2016) or a random forest (Gupta et al., 2018), using data from a limited number of subjects. Nevertheless, because individual differences and non-stationarity mean that EEG signals do not satisfy the independent and identically distributed condition, directly applying subject-dependent models to detect the emotional states of a new subject decreases recognition accuracy. Although collecting a large amount of labeled data from the new subject and using it to fine-tune the classifier is an obvious solution, it is time-consuming and significantly degrades the subject's experience (Zhao et al., 2021). Hence, this strategy cannot be used in practical aBCI applications.
Unsupervised domain adaptation is an alternative approach that aligns domains with different distributions, bridging the existing labeled subjects and new unlabeled ones by identifying their similarities (Wang and Chen, 2021). Without access to the target domain, it is challenging to train a well-generalized network (Blanchard et al., 2011; Zhou et al., 2023). In contrast, unsupervised domain adaptation approaches typically improve performance by using unlabeled data from the target domain during the training phase and employing instance-based, model-based, or feature-based methods (Wang and Chen, 2021).
Compared with traditional machine learning, deep learning places relatively low demands on manual feature selection when solving domain adaptation problems. Given the significant advances in computer vision, speech recognition and natural language processing, we believe that deep learning methods have potential in EEG emotion recognition. There have been numerous studies on subject-dependent EEG emotion recognition (Kim and André, 2008; Ding et al., 2020; Nath et al., 2020; Pan et al., 2021; Zhang et al., 2022; Song et al., 2023), and several experimental results indicate that deep learning has great potential for solving domain adaptation problems (Craik et al., 2019; Roy et al., 2019). When using deep learning in aBCI domain adaptation applications, most works regard all source domains as being the same (Li H. et al., 2018; Li Y. et al., 2018; Luo et al., 2018) and therefore merge all source domains into a common domain to extract features. This strategy disregards the distribution differences among the source domains, so the model cannot be trained to its optimum. When there are outlier source domains, the model has difficulty converging, leading to "negative transfer". On the other hand, some researchers recognized these distribution differences and trained domain-specific networks directly (Chen et al., 2021a,b; Luo and Lu, 2021), marginally improving recognition performance but overlooking the commonality among source domains. Furthermore, most of these approaches need to measure the distance between the features of the target domain and those of each source domain in order to select one or several similar source domains, and then weight the predictions to form the final prediction. When one source domain lies far from the others, that is, when there is an outlier source domain, one private domain mapping of the target domain may be far from the other private domain mappings, and the model performance decreases. Therefore, it is necessary to consider both the individual difference and the group commonality among the multi-source domains to further improve recognition performance.
This paper considers the individual difference and group commonality of multi-source domains and proposes a multi-source information-shared EEG emotion recognition network based on marginal distribution. In the proposed network, the domain-specific and domain-shared features are extracted and combined dynamically to alleviate the negative transfer problem. Specifically, we first integrate a pre-training strategy into the network to maximize the utilization of the current source domain data and initialize the network reasonably, further enhancing its stability. Then, we extract domain-specific features using private encoders and domain-shared features using a pre-trained shared encoder to represent the individuality and commonality of EEG signals from different domains. Besides employing the maximum mean discrepancy to align the marginal distributions between the source and target domains, two auxiliary loss functions are designed to improve the convergence of the network and align the distributions of the private target domains. These loss functions further enhance the mapping capability of the private encoders by considering the information of the other private domains. Moreover, rather than heuristically altering the weights of the classifiers, we integrate the classifier outputs according to the domain-specific and domain-shared feature distributions, thereby dynamically optimizing the network. The experimental results on the SEED and SEED-IV datasets validate the performance of the proposed method.
The main contributions of this paper are summarized as follows:
• We propose an efficient EEG emotion recognition network that incorporates the individual difference and group commonality of multi-source domains.
• We design a two-stream training structure and loop iteration strategy to compute two auxiliary loss functions, $L_{was-gp}$ and $L_{diff-gp}$, for aligning the marginal distributions of domain-specific and domain-shared features in the target domain. Furthermore, a gradient penalty is imposed on these two losses to improve the stability of the network.
• We introduce a subject-dependent pre-training process to initialize the shared encoder with reasonable parameters, which supplies emotional information to the shared domain.
The remainder of this paper is organized as follows. Section 2 introduces the related works on domain adaptation and EEG-based subject-independent emotion recognition. Section 3 proposes the multi-source information-shared network and illustrates the corresponding training process. Section 4 presents the experimental settings and implementation details. Subsequently, in Section 5, the results of the ablation experiments are analyzed and comparisons are made on the SEED and SEED-IV datasets. Finally, Section 6 concludes this work and suggests future research directions.

Related work
This section briefly reviews the concept and methods of domain adaptation and then introduces the relevant work on EEG-based subject independent emotion recognition.

Domain adaptation
In domain adaptation, a rapidly growing direction of transfer learning, the labeled source and unlabeled target domains share the same features and categories. Domain adaptation focuses on using source domain knowledge to process target domain features when the source and target domain distributions differ (Wang and Chen, 2021). Adopting deep learning for domain adaptation can automatically extract more expressive features and meet the end-to-end needs of practical applications. Typically, three categories of methods are available: instance-based learning, model-based learning, and feature-based learning. Instance-based learning aims to select and weight samples from the source and target domains (Blitzer et al., 2006; Li et al., 2016). The objective of model-based learning is to transfer parameters between different models. By mapping the different probability distributions of the source and target domains, feature-based learning characterizes the similarity between the source and target domains, and can be classified as marginal, conditional, joint or dynamic distribution adaptation.
In many practical applications involving multiple source domains, multi-source domain adaptation methods can be used to transfer knowledge from multiple domains while accounting for domain shifts among the source domains, achieving better transfer results. Zhao et al. (2018) bridged deep learning and multi-source domain adaptation by developing a multiple-domain discriminator to align the features of the source and target domains, which is a typical adversarial discriminative method. Xu et al. (2018) constructed multiple domain discriminators and classifiers for each source-target domain pair, and the target labels are then voted on according to a distribution-weighted combining rule. Zhu et al. (2019) mapped distinct source domains into distinct feature spaces and aligned the source and target domains within each feature space. Moreover, they reduced the variance of the classifier outputs through consistency regularization, so that the classifier outputs can be averaged directly without manually set weights.

EEG-based subject independent emotion recognition
Since differences in gender, body state and experimental environment between individuals lead to different neurophysiological activity patterns, the EEG signals of different subjects do not satisfy the independent and identically distributed condition. This gives rise to the domain shift problem: under the same emotional stimulus, different individuals may have different EEG responses, resulting in inconsistent distributions of the collected EEG signals. The domain shift problem is the main challenge that subject-independent algorithms need to address.
It appears not only across different sources of EEG data, but may also appear within the same EEG source due to psychological changes of the participants or technical factors, which greatly limits the performance of the model.
To reduce inter-subject variability, transfer learning in EEG emotion recognition has two primary branches: domain adaptation and domain generalization. Through data manipulation, representation learning and learning strategies, domain generalization aims to learn a model from multiple source domains that generalizes to unseen target domains (Wang et al., 2023). Since domain generalization methods do not utilize the information of the target domain during training, they rarely obtain high recognition accuracy. In contrast, domain adaptation methods use information from the target domain to transfer knowledge while minimizing domain shifts between the source and target domains. Zheng and Lu (2016) applied transfer component analysis (TCA) and transductive parameter transfer (TPT) to subject-independent EEG emotion recognition on the SEED dataset. Li H. et al. (2018) suggested an alternative method employing the domain-adversarial neural network (DANN), which involves the adversarial training of a feature encoder and a domain classifier. Luo et al. (2018) proposed the Wasserstein GAN domain adaptation network (WGANDA), using the gradient penalty to alleviate the domain shift problem. Considering multi-source domain adaptation, Luo and Lu (2021) proposed the Wasserstein-distance-based multi-source adversarial domain adaptation (wMADA), which regarded different subjects as different domains and designed an adaptive weight strategy considering the relationship between each domain. Zhao et al. (2021) developed a plug-and-play domain adaptation (PPDA) network, which disentangles the emotional information by considering the domain-specific and domain-invariant information simultaneously. Chen et al. (2021b) took source data with different marginal distributions into account and proposed a multi-source EEG-based emotion recognition network (MEERNet). Later, they used the disc-loss to improve domain adaptation ability and proposed another multi-source marginal distribution adaptation (MS-MDA) network for subject-independent and cross-session EEG emotion recognition (Chen et al., 2021a).
It should be noted that most of the existing domain adaptation methods mentioned above either treat all source domains as equivalent or train source-specific learners directly, misleading the network into acquiring unreasonable transfer knowledge and thus resulting in negative transfer. Therefore, this paper considers the feature individuality and commonality of distinct domains and weights similar domains based on their feature distributions, enhancing recognition performance.

Methods
In this section, we present the overall architecture and its data transmission process. The modules involved are then analyzed in detail, and the loss functions are designed to align the different domains.

Framework
The main challenge we aim to address is the domain shift problem caused by the non-stationarity of EEG signals and the individual differences among users. Figure 1 shows the framework of the proposed multi-source information-shared network (MISNet) based on marginal distribution, which comprises five components: common encoder, private encoders, shared encoder, private classifiers and shared classifier. In Figure 1, the pink lines, shapes and arrows represent the path of the EEG data matrix $X_S$ from the source domains, and the green lines, shapes and arrows represent that of the EEG data matrix $X_T$ from the target domain. For each domain, the low-level features are first extracted by the common encoder; the private encoders then extract domain-specific information, while the shared encoder extracts the domain-independent information. The four green squares stand for the loss functions $L_{mmd}$, $L_{was-gp}$, $L_{diff-gp}$ and $L_{cl}$, which are analyzed in detail in Section 3.2. Finally, the predictions from the private classifiers and the shared classifier are weighted and summed according to the similarity between the source and target features.
Specifically, we sequentially select one subject in the dataset as the target domain, and the other subjects as the source domains. As described in Equation (1), let $X_S$ be the source data matrices, $Y_S$ their labels, and $X_T$ the unlabeled target data matrix,
$$X_S = \{X_S^i\}_{i=1}^{n}, \quad Y_S = \{Y_S^i\}_{i=1}^{n}, \tag{1}$$
where $n$ represents the number of subjects in the source domains.
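The subject rotation described above can be sketched as a simple leave-one-subject-out loop. This is only an illustration: the function name and the subject identifiers are hypothetical and not taken from the paper.

```python
def leave_one_subject_out(subject_ids):
    """Yield (target, sources) pairs: each subject is the target once,
    and all remaining subjects act as source domains."""
    for target in subject_ids:
        sources = [s for s in subject_ids if s != target]
        yield target, sources


# Hypothetical subject identifiers; real datasets define their own.
subjects = ["s01", "s02", "s03", "s04"]
splits = list(leave_one_subject_out(subjects))
for target, sources in splits:
    print(f"target={target} sources={sources}")
```

With n + 1 subjects in total, each split has n source domains, which matches the role of n in Equation (1).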
In the proposed MISNet, the common encoder $E_C$ maps the source data matrices $X_S^i$ and the target data matrix $X_T$ to the low-level feature space as shown in Equation (2),
$$X_S'^{\,i} = E_C(X_S^i), \quad X_T' = E_C(X_T). \tag{2}$$
Then, for the low-level features of each source domain, we construct a private encoder $E_P^i$ to obtain its source domain-specific characteristics. For the target domain, the $n$ private encoders extract domain-specific features as shown in Equation (3),
$$F_{SP}^i = E_P^i(X_S'^{\,i}), \quad F_{TP}^i = E_P^i(X_T'), \quad i = 1, \ldots, n, \tag{3}$$
where $n$ is the number of source domains. Meanwhile, as shown in Equation (4), the shared encoder $E_S$ maps the low-level features of the source and target domains to the shared domain,
$$F_{SS}^i = E_S(X_S'^{\,i}), \quad F_{TS} = E_S(X_T'). \tag{4}$$
Subsequently, as described in Equation (5), the private classifier $C_P^i$ and the shared classifier $C_S$ take $(F_{SP}^i, F_{TP}^i)$ and $(F_{SS}^i, F_{TS})$ as inputs and output the emotion predictions $(\hat{Y}_{SP}^i, \hat{Y}_{TP}^i)$ and $(\hat{Y}_{SS}^i, \hat{Y}_{TS})$, respectively,
$$\hat{Y}_{SP}^i = C_P^i(F_{SP}^i), \quad \hat{Y}_{TP}^i = C_P^i(F_{TP}^i), \quad \hat{Y}_{SS}^i = C_S(F_{SS}^i), \quad \hat{Y}_{TS} = C_S(F_{TS}). \tag{5}$$
Finally, $\hat{Y}_S$ is the weighted sum of $\hat{Y}_{SP}^i$ and $\hat{Y}_{SS}^i$, and $\hat{Y}_T$ is that of $\{\hat{Y}_{TP}^i\}_{i=1}^{n}$ and $\hat{Y}_{TS}$. We use $\hat{Y}_S$ to calculate the classification loss $L_{cl}$ in the training phase and $\hat{Y}_T$ to predict the emotion category in the test phase.

Modules
To address the domain shift problem in subject-independent EEG emotion recognition, we design an EEG emotion recognition model based on feature disentanglement, extracting domain-specific and domain-shared features to improve the robustness and interpretability of the network. A common encoder extracts the low-level features of the EEG signal, and the private encoders map the sample data of each domain to its domain-specific features, reducing the distance between the source and target domains after feature mapping. The shared encoder extracts domain-shared features and imposes secondary constraints on the mapping distance between the source domain and target domain features. The private classifiers and the shared classifier map the domain-specific and domain-shared features to emotion predictions for the EEG signals. This section explains the role of each module in the proposed model and the overall optimization strategy.

Common encoder
Despite the individual differences in EEG signals, some common EEG characteristics still exist in the signals of human brain activity. We assume that EEG signals from different subjects share the same shallow features. Similar to MS-MDA (Chen et al., 2021a) and MEERNet (Chen et al., 2021b), a common encoder maps all domain data into a common latent space, extracting the low-level features of the source and target domains. The common encoder performs a nonlinear mapping of the differential entropy (DE) features of EEG signals, obtaining a preliminary mapping of emotional information by extracting low-level features from the EEG signals. This lays a solid foundation for extracting the individual difference and group commonality of multi-source domains, thereby enhancing the classification performance.

Private encoders and shared encoder
To capture the domain-specific information and account for the differences among domains, we set up n fully-connected layers, one private encoder for each source domain, to map the data from the common feature space to the latent private feature space. Inspired by the idea of feature disentanglement in domain generalization (Wang and Chen, 2021), the shared emotional information is extracted through the shared encoder by mapping the low-level feature space to the shared feature space. Note that the shared encoder has the same structure as the private encoder in order to balance their learning abilities. For one iteration in each epoch, the private and shared encoders capture only the features of the source domain and target domain that are currently being trained. We employ the maximum mean discrepancy (MMD) to measure the marginal distribution discrepancy between the source and target domains in the reproducing kernel Hilbert space $\mathcal{H}$. MMD is often used to measure the distance between two distributions and is a commonly used loss function in transfer learning. MMD is defined in Equation (6), where $f$ is the mapping function, whose norm is taken in the reproducing kernel Hilbert space; $x$ and $y$ follow the distributions $p$ and $q$, respectively; and $E$ denotes the mathematical expectation. However, this equation is challenging to calculate because the feature space of $f$ has infinite dimensions. Thus, Equation (6) is solved using a linear kernel function to simplify the calculation (Equation 7), where $\psi$ denotes the mapping function, the symbol $*$ denotes matrix multiplication, and $\bar{x}$ represents the mean of $x$ in the feature dimension. $L_{mmd}$ dominates the domain adaptation direction, alleviating the feature distribution difference between the source and target domains.
Due to individual differences, all source domains are linearly independent, indicating that their private feature distributions may be quite distinct. This results in a larger spacing among the source private domains, forming a larger outer contour boundary, denoted by the red circle in Figure 2A. During the optimization iterations, in addition to reducing the distribution distance between the source private and target private domains, it is also necessary to shrink the spacing among the source private domains, thereby obtaining a more compact set of source private domains. The overall boundary of the source private domains is thus also reduced, as denoted by the red circle in Figure 2B. In addition, in Figure 2B the distribution distance between the shared and private domains is also reduced, forcing the network to extract the domain-independent features. To improve the training speed and reduce the network complexity, the above operations are performed not on the full distribution distance between domains, but on the center of each domain.
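As a rough illustration of the linear-kernel simplification mentioned above: with a linear kernel the feature map is the identity, so a standard estimate of the squared MMD reduces to the squared distance between the two sample means. The sketch below assumes this standard form; it is not necessarily identical to the paper's Equation (7) in scaling or notation, and the function names are hypothetical.

```python
def feature_mean(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(row[d] for row in features) / n for d in range(dim)]


def linear_mmd(source_features, target_features):
    """Squared Euclidean distance between the two feature means.

    With a linear kernel the feature map is the identity, so the
    squared-MMD estimate reduces to the distance between sample means.
    """
    mu_s = feature_mean(source_features)
    mu_t = feature_mean(target_features)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


source = [[0.0, 1.0], [2.0, 3.0]]   # mean = [1.0, 2.0]
target = [[1.0, 1.0], [3.0, 5.0]]   # mean = [2.0, 3.0]
print(linear_mmd(source, target))   # 2.0
```

Because only the means enter the computation, the cost is linear in the number of samples, which is consistent with the stated aim of operating on domain centers rather than on full distributions.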
Specifically, to align the private domains and the shared domain, we design two auxiliary loss functions, $L_{was-gp}$ and $L_{diff-gp}$. In the current i-th iteration, the first-order loss $L_{was}$ is proposed to align the marginal distributions of each private domain as shown in Equation (8), where $F_{TP}^i$ and $F_{TP}^j$ denote the mean vectors across feature dimensions of the domain-specific features extracted by the i-th and j-th private encoders, respectively. Considering the individual differences and potential outliers of the source domains, we use the features of the target private domain to compress the different private domains, forcing the private encoders to extract the domain-specific information from the target domain rather than from the source domains. Furthermore, a soft version of the constraint, with the penalty $L_{was-gp}$ given in Equation (9), is enforced on the gradient norm of random samples $x_T^i \in X_T$ to improve the stability of $L_{was}$ and reduce optimization errors caused by outlier gradients, where $X_w$ is uniformly sampled along straight lines between pairs of points drawn from the i-th target domain-specific feature $F_{TP}^i$ and the j-th target domain-specific feature $F_{TP}^j$. This idea is motivated by WGAN-GP (Gulrajani et al., 2017), where the gradient penalty also adopts the no-batch-normalization and two-sided penalty strategy. Different from WGAN-GP, $F_{TP}^i$ and $F_{TP}^j$ are extracted by different private encoders that have the same input feature $X_T'$. In addition, we propose $L_{diff-gp}$ in Equation (10) to align the distributions of the i-th private encoder and the shared encoder, where $X_d$ is calculated similarly to $X_w$ in Equation (9) by using the target private feature $F_{TP}^i$ and the target shared feature $F_{TS}$ in the i-th iteration.
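The two ingredients borrowed from WGAN-GP, sampling along straight lines between paired points and a two-sided penalty on the gradient norm, can be sketched as follows. This is a toy illustration under stated assumptions: the linear critic, its weights and the feature values are hypothetical stand-ins, and the exact forms of Equations (9) and (10) are not reproduced here.

```python
import math
import random


def interpolate(feat_i, feat_j, rng):
    """Sample a point uniformly on the straight line between two feature vectors."""
    eps = rng.random()  # eps in [0, 1)
    return [eps * a + (1.0 - eps) * b for a, b in zip(feat_i, feat_j)]


def two_sided_penalty(grad):
    """WGAN-GP style penalty: (||grad||_2 - 1)^2, penalising norms above and below 1."""
    norm = math.sqrt(sum(g * g for g in grad))
    return (norm - 1.0) ** 2


def linear_critic_grad(weights, x_hat):
    """Gradient of the toy critic f(x) = w . x with respect to x; it equals w everywhere."""
    return list(weights)


rng = random.Random(0)
f_i = [0.2, 0.4, 0.6]      # hypothetical target feature from private encoder i
f_j = [0.8, 0.1, 0.3]      # hypothetical target feature from private encoder j
x_hat = interpolate(f_i, f_j, rng)

weights = [3.0, 4.0, 0.0]  # toy critic weights, ||w||_2 = 5
penalty = two_sided_penalty(linear_critic_grad(weights, x_hat))
print(x_hat, penalty)      # penalty is (5 - 1)^2 = 16.0
```

In a real network the gradient would come from automatic differentiation at the interpolated point rather than from a closed form; the linear critic is used here only so the example runs without a deep learning framework.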
As the loop iteration progresses, the overall boundary of the source private domains shrinks rapidly under the constraints of $L_{was-gp}$ and $L_{diff-gp}$. Given an optimal convergence as illustrated in Figure 2C, the centers of the private and shared domains approach one another. The optimal overall private domain has the smallest boundary, which is equal to the boundary of the maximum private domain, denoted by the red circle in Figure 2C. Moreover, the spacing among the optimal private domains is also minimized or even vanishes. Meanwhile, the fluctuation boundary of the shared domain approaches the boundary of the maximum private domain, as represented by the green and red circles in Figure 2C. Since $L_{cl}$ and $L_{mmd}$ dominate the classification and domain adaptation tasks, respectively, the fluctuation boundary of the shared domain cannot easily converge to the optimum result. To sum up, the final convergence of the shared domain center meets the following three requirements:
• Meet the minimum $L_{cl}$ requirement for the shared domain.
• After the private encoder mapping process, the source and target domains must have the minimum $L_{mmd}$.
• Meet the minimum distance requirement between the fluctuation boundary of the shared domain and the optimum boundary of the private domains.
Furthermore, when the optimization of the shared domain distribution conflicts with that of $L_{cl}$ or $L_{mmd}$, $L_{cl}$ and $L_{mmd}$ are optimized with priority, resulting in a small spacing between the boundary of the maximum private domain and that of the shared domain. Under ideal circumstances, the boundary of the shared domain can be optimal, that is, it coincides with the boundary of the maximum private domain. At this point, the extracted domain-independent features are optimal for emotion prediction, as represented by the overlap of the green and red circles in Figure 2C.

Private and shared classifiers
Following the private encoders, the private classifiers predict emotion states using the private features. A softmax activation function is applied after the fully-connected layer corresponding to each source domain, transforming the hidden states into category label predictions. The shared classifier has the same structure as the private classifiers to balance their classification abilities. During the training process, we measure $L_{cl}$ of the private and shared classifiers using the label smoothing cross-entropy loss as described in Equations (11) and (12), whose summand takes the form $q(y, c) \log P(c \mid \hat{Y}_{SS})$ for the shared classifier, where $Y_S$ is the emotion label of the source domain, $\varepsilon$ is the smoothing probability, and $K$ is the number of emotion categories.
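A label smoothing cross-entropy of the kind described above can be sketched for a single sample as follows. The smoothing convention is an assumption: here the true class receives 1 − ε and the remaining ε is split evenly over the other K − 1 classes, which is one common choice and may differ from the definition of q(y, c) in Equations (11) and (12). The function names are hypothetical.

```python
import math


def smoothed_targets(label, num_classes, eps):
    """Label-smoothed target distribution q(y, c).

    Assumed convention: the true class gets 1 - eps and the remaining
    mass eps is split evenly over the other K - 1 classes.
    """
    off = eps / (num_classes - 1)
    return [1.0 - eps if c == label else off for c in range(num_classes)]


def label_smoothing_ce(probs, label, eps):
    """Cross-entropy -sum_c q(y, c) * log P(c) for one sample."""
    q = smoothed_targets(label, len(probs), eps)
    return -sum(qc * math.log(pc) for qc, pc in zip(q, probs))


probs = [0.7, 0.2, 0.1]          # softmax output for a three-class problem
loss_smooth = label_smoothing_ce(probs, label=0, eps=0.1)
loss_hard = label_smoothing_ce(probs, label=0, eps=0.0)
print(loss_smooth, loss_hard)
```

Setting eps to 0 recovers the ordinary cross-entropy, while a positive eps penalises over-confident predictions by assigning some target mass to the other classes.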

Weight sum
Considering both the individual differences and group commonalities, we also propose to weight $L_{cl}^{P}$ and $L_{cl}^{S}$ based on the similarity between the private target domain and the private source domain, in order to dynamically adjust the optimization process and balance the weights of the private and shared networks. During the training process, we integrate the private and shared classifiers by calculating the MMD between the private and shared domains. The weights of the private features $w_p$ and the shared features $w_s$ are calculated from these MMD values, and $L_{cl}$ is then calculated as in Equation (15). The weighted sum minimizes the distance between the private and shared domains. The underlying reason is the dynamic adjustment of the optimization process: the smaller the MMD between the source and target domains, the smaller the difference in their distributions. Specifically, if the distributions of the source private and shared domains are closer, $w_s$ will be smaller, and the weight of the private encoder is then also smaller. By back propagation, the less gradient is assigned to the private encoder, the more is assigned, relatively, to the shared one, giving the shared encoder more learning capability than the private encoder. In this way, more learning capability is assigned to the corresponding encoder. By using Equation (15), the outputs of the network are weighted based on their distributions, and thus more attention is paid to the inter-domain predictions with greater similarity. Given $L_{cl}$, $L_{mmd}$, $L_{was-gp}$ and $L_{diff-gp}$, the final loss function is represented in Equation (16), where $\alpha$, $\beta$, and $\gamma$ are the hyper-parameters. $L_{mmd}$ is the loss function that controls the domain adaptation direction of the model. $L_{was-gp}$ aligns the marginal distributions of each private domain. $L_{diff-gp}$ aligns the distributions of the i-th private encoder and the shared encoder. By combining these four loss functions as in Equation (16), the model gains the ability to alleviate negative transfer by considering the individual difference and group commonality simultaneously.
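As a reading aid, the combination of the four losses can be written schematically as below. This is a sketch rather than a reproduction of Equation (16): the pairing of $\alpha$, $\beta$ and $\gamma$ with particular terms is an assumption that simply follows the order in which the losses are listed in the text.

```latex
% Schematic total loss; the coefficient-to-term pairing is assumed.
L = L_{cl} + \alpha \, L_{mmd} + \beta \, L_{was-gp} + \gamma \, L_{diff-gp}
```

Under this reading, setting a coefficient to zero removes the corresponding alignment term, which is the kind of comparison an ablation over loss functions would make.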
In the test phase, we assume that the optimal convergence boundary has been reached, and the predictions of the private and shared domains are added together to output the final results. In summary, the workflow of the proposed MISNet is presented in Algorithm 1.
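The test-phase fusion described above can be sketched as a plain sum of class probabilities followed by an argmax. The unweighted sum is an assumption taken from the wording "added together"; the function name and the probability values are hypothetical.

```python
def fuse_predictions(private_probs, shared_probs):
    """Add the class probabilities of all private classifiers and the shared
    classifier, then return the summed scores and the winning class index."""
    num_classes = len(shared_probs)
    scores = list(shared_probs)
    for probs in private_probs:
        for c in range(num_classes):
            scores[c] += probs[c]
    predicted = max(range(num_classes), key=lambda c: scores[c])
    return scores, predicted


# Hypothetical softmax outputs for one target sample and n = 2 private classifiers.
private_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
shared_probs = [0.5, 0.4, 0.1]
scores, predicted = fuse_predictions(private_probs, shared_probs)
print(scores, predicted)
```

Here the second private classifier alone would pick class 1, but the summed scores favour class 0, showing how the fused output can override a single disagreeing classifier.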

Training strategy
The main convergence direction of the different private encoders is determined by the classification loss $L_{cl}$ of the source domain, and their initial feature mapping rules are determined by the distributions of the source domains. Consequently, when there is an outlier source domain, one private domain mapping of the target domain may be far from the other private domain mappings. Consider an extreme case of $L_{was-gp}$ in Equation (9), in which a private feature $F_{TP}^i$ is far from the other private features $F_{TP}^j$ ($j \neq i$) in the reproducing kernel Hilbert space. If the traditional parallel training strategy is used directly, this outlier affects the other private encoders and classifiers because the losses are added together. The prediction and mapping rules of the private encoder for the outlier source domain are therefore significantly different from those of the other private encoders, resulting in a large difference between $F_{TP}^i$ and $F_{TP}^j$ and ultimately impacting the training process. In this case, $L_{diff-gp}$ in Equation (10) is also affected to some extent. Since one or more source domains may be distant from the other sources, the mapping rules differ greatly even when the same low-level feature of the target domain is input. Additionally, the training deviation caused by the outlier source domains is mixed into the shared domain, misguiding the optimization direction of the shared encoder and the convergence boundary of the shared classifier. The distance between the private domain mapping of the outlier source and the shared domain inevitably keeps the outlier domain mapping away from the other source domain mappings. Under such improper optimization, the model tends to converge toward the majority of source domains while ignoring the outlier domain; the boundary of the shared domain shifts toward the concentrated source domains and deviates from that of the maximum private domain, and the model fails to converge.
To alleviate the outlier source problem, we propose a two-stream training structure instead of the parallel one, inputting only the data of the current source domain and the unlabeled data of the target domain during each iteration. The proposed two-stream structure is depicted in Figure 3A. By separating the different source domains in the training process, $L_{was-gp}$ is calculated between the current target private feature $F_{TP}^i$ and the others $F_{TP}^j$ ($j \neq i$), alleviating the effects caused by the other source domains. Similarly, $L_{diff-gp}$ only calculates the distance between the target feature of the current private encoder and the feature of the shared encoder, which avoids mixing in the outlier source domain. As a result, the model errors caused by an outlier source domain are further alleviated.
With the proposed two-stream training structure, we adopt the loop iteration strategy illustrated in Figure 3B to iterate sequentially over all source domains, alleviating domain confusion in the loss function and improving the robustness of the network. Here, we refer to one loop over all source domains as an epoch comprising n iterations, where each iteration corresponds to one source domain. Specifically, during each iteration, the proposed MISNet selects the current source data matrix X i S and its corresponding encoder E i P as the private encoder. Then L mmd measures the marginal distribution discrepancy between F i SP and F i TP, and L was−gp aligns the current target private feature F i TP with the others F j TP, shown as the blue double arrows in Figure 3B. Meanwhile, L diff −gp aligns the distribution of the current target private feature F i TP with that of the target shared feature F TS, shown as the black double arrows in Figure 3B. During the sequential iterations, the change of source domain between two adjacent iterations causes slight fluctuations in L was−gp and L diff −gp. As the loop iteration progresses, the total loss of the same iteration decreases between two adjacent loops until it converges.
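The loop iteration strategy can be illustrated with a small bookkeeping sketch. It does not train anything; it only enumerates, for each iteration, which source domain is active and which feature pairs each loss compares, following the description above. The function name `loop_iteration_schedule` and the string labels are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


def loop_iteration_schedule(n_sources: int, n_loops: int) -> List[Dict]:
    """List the feature pairs compared by each loss in every iteration.

    One loop (epoch) visits every source domain once, so it contains
    n_sources iterations. Labels such as "F_TP[i]" are plain strings that
    stand for the features named in the text (hypothetical naming).
    """
    steps = []
    for loop in range(n_loops):
        for i in range(n_sources):
            others = [j for j in range(n_sources) if j != i]
            steps.append({
                "loop": loop,
                "active_source": i,
                # L_mmd: source private vs. target private feature of encoder i
                "mmd_pair": (f"F_SP[{i}]", f"F_TP[{i}]"),
                # L_was-gp: current target private feature vs. the others (j != i)
                "was_gp_pairs": [(f"F_TP[{i}]", f"F_TP[{j}]") for j in others],
                # L_diff-gp: current target private feature vs. target shared feature
                "diff_gp_pair": (f"F_TP[{i}]", "F_TS"),
            })
    return steps


if __name__ == "__main__":
    for step in loop_iteration_schedule(n_sources=3, n_loops=1):
        print(step)
```

Only one source domain enters each step, which is the property the two-stream structure relies on to keep an outlier source from contaminating the losses of the other branches.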
To make maximum use of the source domain data and ensure that the initial mapping of the shared encoder contains sufficient emotional information, we pre-train the common encoder and shared encoder on a subject-dependent emotion classification task. All shuffled source domain data and their labels are used for this training. The common encoder first extracts the low-level features, and the shared encoder then captures the deep features. Finally, the shared classifier calculates the cross-entropy loss between its output and the ground truth labels. The Adam optimizer is used, the total number of epochs is set to 100 and the batch size is set to 64. After pre-training, only the weights of the common encoder and shared encoder are kept for the training phase, and a normally initialized classifier is employed. In this way the encoders are initialized to reasonable parameters and model convergence is accelerated, which avoids the failure to converge, or convergence to local optima, that random initialization can cause.
We can also re-examine the network construction from the perspective of loss function design by using the following two criteria:
• L mmd only reduces the distance between the source and target domains in the private domain instead of using the shared domain.
• The auxiliary losses L was−gp and L diff −gp use the features of the target domain, instead of those of the source domain, to narrow the corresponding encoder mappings.
The purpose of the first criterion is to avoid misleading the direction of domain adaptation when the shared encoder E S is trained mainly by the classification loss of L cl which extracts the target domain information containing the individual differences among the source domains.
For the L was−gp loss in the second criterion, if a single source domain X i SP were used directly as the input of the private encoders and their outputs were aligned in each iteration, the wrong gradient of the domain information in X i S would be mixed into the other encoders E j P (j ≠ i). Therefore, we use the target private features F i TP and F j TP to align the different private domains in L was−gp. On the other hand, in the actual iteration process of L diff −gp, each private encoder is simultaneously updated by L cl and L mmd, and the different source domains are input in sequence during one loop. Directly adopting the features of the source domain as the input would introduce fluctuations in L diff −gp and lead to training collapse. Therefore, we use the features of the target domain instead to constrain the direction of domain adaptation. After the target private features F i TP and the target shared features F TS are aligned, the shared classifier C S can classify the emotion of the target domain correctly.

Experimental settings
This section describes the datasets used for evaluation, EEG data pre-processing and implementation details in the proposed MISNet.

Datasets
We evaluate the proposed network on SEED (Duan et al., 2013; Zheng and Lu, 2015; Liu et al., 2022) and SEED-IV (Zheng et al., 2019), which are public datasets commonly used for EEG emotion recognition. The SEED dataset contains EEG signals from 15 Chinese participants (seven males and eight females). The participants are required to watch 15 Chinese film clips chosen from a pool of materials as stimuli to elicit positive, neutral and negative emotions. Each film clip contains scenes and audio and is ∼4 min long to prevent viewer fatigue. The clip order prevents two clips depicting the same emotion category from being shown consecutively. Each subject participates in three sessions of 15 trials each, and each session is conducted on a separate day. For feedback purposes, the participants are asked to complete a questionnaire and report their emotional responses immediately after viewing each clip. The EEG signals are recorded by an ESI NeuroScan system at a sampling rate of 1,000 Hz through a 62-electrode cap according to the international 10-20 system. The SEED-IV dataset is similar to SEED, but it has four emotion categories (happiness, sadness, fear and neutral) and 24 trials per session.

Data pre-processing
For the pre-processing of EEG signals, the original EEG data were downsampled to 200 Hz, a bandpass filter from 0 to 75 Hz was applied, and a 512-point short-time Fourier transform with a non-overlapping 1-s Hanning window was used to calculate the frequency-domain features. Considering their effectiveness in the EEG emotion recognition task (Yang et al., 2017; Li et al., 2022), the DE features were then computed on five bands: δ: 1-3 Hz, θ: 4-7 Hz, α: 8-13 Hz, β: 14-30 Hz and γ: 31-50 Hz (Zheng and Lu, 2015). For the Gaussian distribution, the DE feature is defined as shown in Equation (18):

$$h(X) = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \log\left[\frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\right] dx = \frac{1}{2}\log\left(2\pi e\sigma^{2}\right) \tag{18}$$

where X obeys the Gaussian distribution N(µ, σ²) and x is an element of X. The 310-dimensional DE features (62 channels multiplied by five frequency bands) were thus computed, and the features were smoothed with the conventional moving average and a linear dynamic system. After the pre-processing steps, each session contains 3,394 samples for the SEED dataset and 822 samples for the SEED-IV dataset.
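For a Gaussian signal the differential entropy has the closed form ½ log(2πeσ²), so a DE feature reduces to a variance estimate. The following is a minimal sketch under that Gaussian assumption; it operates on a plain list of samples standing in for one band-filtered, 1-s window of one channel. The function name and the synthetic data are illustrative, not the authors' code.

```python
import math
import random
import statistics


def differential_entropy(samples):
    """DE of a signal assumed to be Gaussian: 0.5 * log(2 * pi * e * variance)."""
    variance = statistics.pvariance(samples)  # population variance of the window
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for one 1-s window at 200 Hz (hypothetical data).
    window = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print(differential_entropy(window))
    # Doubling the amplitude multiplies the variance by 4, which adds log(2) to the DE.
    print(differential_entropy([2.0 * v for v in window]))
```

Repeating this for 62 channels and five bands yields the 310-dimensional feature vector described above.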

Implementation details
In the proposed MISNet, the common encoder consists of three fully connected layers with 310-256-128-64 nodes, which extract the low-level features of the source and target domains. Each private encoder and the shared encoder consist of one fully connected layer, 64 (input)-32 (output), followed by a LeakyReLU activation. Besides, a single fully connected layer is chosen for each private classifier and for the shared classifier, mapping the 32-dimensional hidden feature to the number of emotion categories. Note that there is no batch normalization layer, since we use the gradient penalty guidelines in Equations (9) and (10). The LeakyReLU activation function with a negative slope of 0.01 is used in all hidden layers. In addition, we normalize the data of the source and target domains with the electrode-wise method of Chen et al. (2021a) to enhance performance.
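As a rough sanity check on the layer sizes above, the number of trainable weights can be tallied by hand. The sketch below assumes every fully connected layer has a bias term, 14 private branches plus one shared branch (leave-one-subject-out on the 15 SEED subjects) and three emotion classes. These are assumptions for illustration, so the result is only an order-of-magnitude estimate and need not equal the parameter count reported for the actual implementation.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_params(common=(310, 256, 128, 64), branch=(64, 32),
                 n_classes=3, n_sources=14) -> int:
    """Approximate parameter count under the assumptions stated in the lead-in."""
    common_total = sum(linear_params(a, b) for a, b in zip(common, common[1:]))
    encoder = linear_params(branch[0], branch[1])   # one private or shared encoder
    classifier = linear_params(branch[1], n_classes)
    n_branches = n_sources + 1                      # private branches + shared branch
    return common_total + n_branches * (encoder + classifier)


if __name__ == "__main__":
    print(count_params())              # SEED-like setting: 3 classes
    print(count_params(n_classes=4))   # SEED-IV-like setting: 4 classes
```

The common encoder dominates the total, which is why adding further private branches is comparatively cheap.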
For the hyper-parameters in Equation (16), we consider the trade-off among the primary losses L cl and L mmd as well as the auxiliary effects of L was−gp and L diff −gp, and set α = where nos means number of samples and β = γ = α/100. In our network, we set the learning rate to 0.01 and the batch size to 64. In addition, the

Experiments and results
In this section, we first test different values of the loss weights β and γ to evaluate the effectiveness of the hyper-parameter settings. Next, ablation experiments are conducted in terms of loss functions, dynamic convergence of the network and visualization of the mapped features. Then, we compare the proposed network with other competing methods by using the leave-one-subject-out (LOSO) strategy on the SEED and SEED-IV datasets. Finally, noise-addition experiments are conducted to demonstrate the robustness of the proposed network.
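The LOSO protocol used throughout these experiments can be sketched as follows: each subject in turn serves as the unlabeled target domain, and the remaining subjects form the labeled source domains. The generator name `loso_splits` is hypothetical, and the subject identifiers 1 to 15 are an assumption matching the 15 participants of SEED.

```python
def loso_splits(subject_ids):
    """Yield (source_subjects, target_subject) pairs, one fold per subject."""
    subject_ids = list(subject_ids)
    for target in subject_ids:
        sources = [s for s in subject_ids if s != target]
        yield sources, target


if __name__ == "__main__":
    for sources, target in loso_splits(range(1, 16)):
        print(f"target subject {target:2d}, {len(sources)} source subjects")
```

The mean accuracy and standard deviation are then taken over the folds, one fold per target subject.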

Hyper-parameter evaluation
In this section, we evaluate the hyper-parameters β and γ in Equation (16) within a certain range, chosen based on previous experience, to explore the impact of different settings on accuracy on the SEED dataset. The corresponding results are illustrated in Figure 4.

Ablation study
To demonstrate the effectiveness of loss functions in MISNet, we evaluate the performance of the ablated network on the SEED and SEED-IV datasets, as shown in Table 1.
The subject independent recognition performance is evaluated with the metrics of mean accuracy (Mean) and standard deviation (Std.). Table 1 indicates that all loss functions improve recognition performance, yielding mean accuracies of 88.80 and 74.60% on the SEED and SEED-IV datasets, respectively. In addition, the proposed MISNet achieves standard deviations of 6.24 and 9.30% on the SEED and SEED-IV datasets, respectively, showing better inter-subject stability. Discarding L mmd from our framework leads to a significant performance degradation compared with removing the other loss functions, which confirms its importance in domain adaptation. Thus, a higher weight should be assigned to L mmd than to L was−gp and L diff −gp. Furthermore, removing L was−gp and L diff −gp simultaneously damages the domain adaptability more than removing either of them individually, since they jointly control the relationship between private features and shared features. In addition, the gradient penalty of the two auxiliary loss functions L was−gp and L diff −gp allows stable convergence of the private and shared domains, which is reflected in the improvement of average accuracy and the decrease of standard deviation.
One of the primary goals of the proposed network is to align the distributions of the private and shared domains to alleviate the negative transfer in domain adaptation caused by individual differences in EEG signals. Next, we evaluate the convergence process of the proposed MISNet and visualize the domain mapping of the private and shared domains in two dimensions by using t-distributed stochastic neighbor embedding (t-SNE). The t-SNE method is employed to evaluate the similarity between feature representations during the training process, meaning that closer points have a higher similarity in the real space. The similarity between source and target domains affects the domain adaptability of the network, while the distinction among the emotion categories reflects its emotion discrimination. The dynamic convergence process of the proposed MISNet is shown in Figure 5.
It can be seen from Figure 5A that the original distributions of the DE feature projections of the different source domains are chaotic and irregular, due to the individual differences of EEG signals. After the designed subject-dependent pre-training strategy, the different emotion categories of the source shared domain, represented by different gray shapes, have been distinguished to some extent, which means that the shared encoder has a fundamental emotion classification ability, as illustrated in Figure 5B. Figures 5C-F reveal that, as the convergence process progresses, the private domain of each source domain (represented by different colors) is clustered according to emotion category (represented by different shapes), indicating a gradual improvement of emotional discrimination. In addition, the cluster center of each domain is distributed near the middle of the figure, indicating that all cluster centers are aligned among the private domains. As the convergence process progresses, the space of the private domains gradually shrinks, indicating that the model is eliminating the interference of spacing among private domains. Furthermore, Figure 5F shows that, when the model has converged, the source private domains exhibit different emotional distribution patterns. This is because L was−gp aligns the domain centers rather than the distributions of the private features. Additionally, the target shared features (represented in red) are almost always within the source shared features of the corresponding emotion categories (represented in gray), indicating that the shared encoder can effectively capture the shared emotion information. After the final optimization, the center of the shared domains roughly coincides with that of the private domains, and the boundary of the shared domains is close to that of the maximum private domain. It can be concluded that the convergence process of MISNet is consistent with what was anticipated and that the network has a strong domain adaptability for subject independent EEG emotion recognition.

Comparisons with competing methods
In this section, we compare the proposed MISNet with several competing methods on the SEED and SEED-IV datasets. Table 2 shows the comparison results in terms of the mean classification accuracy and standard deviation. Here the results of MS-MDA (Chen et al., 2021a) were obtained by using the LOSO strategy with the open source code.
It can be seen from Table 2 that the domain adaptation-based methods significantly improve the recognition performance compared with directly using SVM in the subject independent experiments. Furthermore, most multi-source domain adaptation methods (Chen et al., 2021a,b; Luo and Lu, 2021; Gong et al., 2022; Zhu et al., 2022) attain better performance than those without multiple sources (Li H. et al., 2018; Luo et al., 2018; Ma et al., 2019), which indicates the importance of considering individual differences inside the source domains. Specifically, the proposed MISNet outperforms most of the competing methods, achieving mean accuracies of 88.8 and 74.6% on the SEED and SEED-IV datasets, respectively. This is attributed to the designed loss functions L was−gp and L diff −gp, which enhance the domain adaptability of the network. The evaluation index of standard deviation reflects the stability of the networks across subjects in the dataset. Compared with the typical existing methods on the two benchmark datasets SEED and SEED-IV, the proposed MISNet demonstrates a competitive ability to overcome the individual differences. Although a small gap exists compared to wMADA-β (Luo and Lu, 2021), the proposed method has a simpler parameter tuning process during the training phase.
To examine the generality of the proposed model, we compare it with the competing method MS-MDA (Chen et al., 2021a) by using the indicators of F1 score, sensitivity and specificity in Table 3. In the experiment, the performance over all folds was used to estimate the F1 score, sensitivity and specificity under the LOSO strategy. It can be seen from Table 3 that the proposed MISNet has a higher generalization ability than MS-MDA (Chen et al., 2021a) on all three indicators and performs stably across all subjects.
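For a multi-class problem, F1 score, sensitivity and specificity are commonly computed per class in a one-vs-rest manner from the confusion matrix and then averaged. The sketch below follows that convention; the macro averaging and the toy confusion matrix are assumptions for illustration, since the exact averaging scheme used for Table 3 is not stated here.

```python
def one_vs_rest_metrics(cm):
    """Per-class sensitivity, specificity and F1 from a square confusion matrix.

    cm[r][c] is the number of samples of true class r predicted as class c.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k predicted as another class
        fp = sum(cm[r][k] for r in range(n)) - tp     # other classes predicted as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        results.append({"sensitivity": sensitivity,
                        "specificity": specificity,
                        "f1": f1})
    return results


def macro_average(per_class, key):
    """Unweighted mean of one metric over the classes."""
    return sum(m[key] for m in per_class) / len(per_class)


if __name__ == "__main__":
    toy_cm = [[8, 2], [1, 9]]  # hypothetical two-class example
    per_class = one_vs_rest_metrics(toy_cm)
    for name in ("sensitivity", "specificity", "f1"):
        print(name, macro_average(per_class, name))
```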
The bold font in the table represents the top three in the results.
To further demonstrate the robustness of the proposed MISNet network, we evaluate the performance of the network on the SEED dataset by adding Gaussian noise to the test data, as shown in Equation (19):

$$\tilde{X}_T = X_T + K \cdot N \tag{19}$$

where X T are the target matrices, N denotes the Gaussian noise obeying the distribution N(0, 1) and K is the noise coefficient. In the experiment, we verify the robustness of the proposed network to noise by gradually increasing the noise coefficient, as shown in Figure 6. It can be seen from Figure 6 that, as the noise coefficient K increases from 0 to 0.3, that is, as the signal-to-noise ratio gradually decreases, the recognition accuracy of the model inevitably decreases and the recognition variance increases. When the signal-to-noise ratio is relatively large, that is, when K is equal to 0.1 and 0.2, the mean accuracy of the proposed model still remains above 75%, indicating that it has a strong tolerance for noise.
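The noise experiment can be sketched as below, assuming the noise is additive, i.e., each element of the target matrix is perturbed by K times a standard normal draw. The function name `add_gaussian_noise`, the fixed seed and the toy matrix are illustrative assumptions.

```python
import random


def add_gaussian_noise(x_t, k, seed=0):
    """Return a copy of the target matrix with K * N(0, 1) added to every element."""
    rng = random.Random(seed)  # fixed seed so the perturbation is reproducible
    return [[value + k * rng.gauss(0.0, 1.0) for value in row] for row in x_t]


if __name__ == "__main__":
    x_t = [[0.5, -1.2, 0.3], [1.1, 0.0, -0.7]]  # hypothetical normalized DE features
    for k in (0.0, 0.1, 0.2, 0.3):
        print(k, add_gaussian_noise(x_t, k))
```

With K = 0 the data are returned unchanged, which corresponds to the noise-free baseline in the experiment.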
To show the recognition ability of the proposed MISNet across different emotion categories, Figure 7 shows the confusion matrices on the SEED and SEED-IV datasets. It can be seen from Figure 7 that, on the SEED dataset, MISNet achieves recognition accuracies of 78.77, 93.02, and 95.03% on the negative, neutral and positive emotion categories, respectively, demonstrating strong discriminative capability across emotions. The results on the SEED-IV dataset indicate that our framework has decent accuracies for the neutral, sad and happy emotion categories. For the fear category, however, the proposed MISNet confuses it with sad, because these two emotions are relatively similar in EEG signals.

Conclusion
Although EEG signals have the advantage of being spontaneous and non-subjective in the emotion recognition field, they still have several limitations, including individual differences and noisy labeling. In this paper, the main challenge we aim to address is the domain shift problem caused by the non-stationarity of EEG signals and the individual differences among users.
To alleviate the domain shift problem, we propose to consider the individual differences and group commonalities simultaneously, improving the domain adaptation ability of the model. In the proposed MISNet, a decoupling network structure is designed to extract the private domain features and shared domain features of each domain. To constrain the overall optimization direction, the classification loss function and domain adaptation loss functions are adopted. In addition, we analyze the convergence process of the network to design the auxiliary loss functions L was−gp and L diff −gp, which align the different domain centers. A pre-training strategy is also used to enhance model stability and ensure that the initial mapping of the shared encoder contains sufficient emotional information. Furthermore, the convergence process of the proposed network is dynamically displayed through t-SNE mapping. The results on the SEED and SEED-IV datasets demonstrate the effectiveness of the proposed MISNet framework.
Since the proposed MISNet needs the unlabeled data of the target domain to obtain domain information, it is applicable to offline scenarios in real life, where it achieves high-quality emotional awareness by decoupling individual and common emotional characteristics. Our future work will focus on disentangling the domain information from EEG data with a reasonable explanation, thereby constructing a more robust network.

FIGURE Overall framework of the proposed MISNet network.

FIGURE The optimization process of L was−gp and L diff −gp. S , S , S , and S represent the source private domains, respectively, and S is the shared domain. L denotes the center distance between source private domains, and D is the center distance between each source private domain and shared domain. The red circle symbolizes the boundary of maximum private domain, and the green circle represents the fluctuation boundary of the shared domain. (A) the initial states of source private domains, (B) the objective of optimization, and (C) the circumstance of optimal convergence.

FIGURE The proposed training strategy in MISNet. (A) The proposed two-stream structure, (B) loop training strategy.
Adam optimizer is used as the optimizing function, the total number of epochs is set to 200, and a cosine annealing schedule is used to determine the learning rate for each epoch. The proposed framework is implemented in PyTorch 1.11 on an NVIDIA RTX 1080Ti GPU. The computation cost and the number of model parameters are 9.88M FLOPs and 154.4K, respectively.
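A cosine annealing schedule lowers the learning rate from its initial value along a half cosine over the training run. The sketch below uses the standard closed form with the initial learning rate of 0.01 and 200 epochs stated in the text; the minimum learning rate of 0 and the absence of restarts are assumptions, as the text does not specify them.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=200, base_lr=0.01, min_lr=0.0):
    """Learning rate at a given epoch under a single-cycle cosine annealing schedule."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


if __name__ == "__main__":
    for epoch in (0, 50, 100, 150, 200):
        print(epoch, cosine_annealing_lr(epoch))
```

The rate changes slowly near the start and end of training and fastest around the midpoint, where it equals half of the initial value.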

Figure 4 .
It can be seen from Figure 4 that both L was−gp and L diff −gp are affected by the selected hyper-parameters β and γ, with L diff −gp being more sensitive than L was−gp. Even with the worst recognition result of 83.05%, obtained by setting β = α/100 and γ = α/1000, the proposed network still maintains a strong ability for subject independent emotion recognition. Since setting β = γ = α/100 achieves the best recognition performance on the SEED dataset, we use this setting to evaluate the effectiveness of the network in the following experiments.

FIGURE Evaluations of different settings of β and γ in terms of accuracy on the SEED dataset.

FIGURE Training process of the proposed MISNet with t-SNE mapping. Red color denotes the target shared features and gray color represents the source shared features, while other colors symbolize the source private features. Different shapes denote the different emotion categories: •, , represent positive, neutral and negative emotions, respectively. (A) the original distributions of the DE feature projection of different source domains, (B) the initialization effect of subject-dependent pre-training, (C-F) the training process of MISNet.

FIGURE The robustness of MISNet to noise on the SEED dataset.

FIGURE Confusion matrices of predictions on SEED and SEED-IV dataset.

Training phase:
for epoch in epochs do
    for i in n do
To sum up, L cl is the classification loss function, which controls the overall optimization direction of the model. L mmd is the domain adaptation loss
TABLE Ablation study of loss functions on SEED and SEED-IV datasets.
* w/o L * denotes the ablated network trained without the loss function L * .
domain adaptability of the network. The evaluation index of standard deviation means the stability performance of networks across subjects in the dataset. Compared with the typical existing methods on two benchmark datasets of SEED and SEED-IV, the proposed MISNet demonstrates the competitive ability to overcome the individual differences. Although a small gap exists compared to wMADA-β
https://github.com/VoiceBeer/MS-MDA
TABLE Comparison results of the proposed MISNet with competing methods on the SEED and SEED-IV datasets.
TABLE Comparison generality with competing method on the SEED and SEED-IV datasets.