Using Multiple Decomposition Methods and Cluster Analysis to Find and Categorize Typical Patterns of EEG Activity in Motor Imagery Brain–Computer Interface Experiments

In this study, the sources of EEG activity in motor imagery brain–computer interface (BCI) control experiments were investigated. Sixteen linear decomposition methods for EEG source separation were compared according to different criteria. The criteria were mutual information reduction between the source activities and physiological plausibility. The latter was tested by estimating the dipolarity of the source topographic maps, i.e., the accuracy of approximating the map by potential distribution from a single current dipole, as well as by the specificity of the source activity for different motor imagery tasks. The decomposition methods were also compared according to the number of shared components found. The results indicate that most of the dipolar components are found by the Independent Component Analysis Methods AMICA and PWCICA, which also provided the highest information reduction. These two methods also found the most task-specific EEG patterns of the blind source separation algorithms used. They are outperformed only by non-blind Common Spatial Pattern methods in terms of pattern specificity. The components found by all of the methods were clustered using the Attractor Neural Network with Increasing Activity. The results of the cluster analysis revealed the most frequent patterns of electrical activity occurring in the experiments. The patterns reflect blinking, eye movements, sensorimotor rhythm suppression during the motor imagery, and activations in the precuneus, supplementary motor area, and premotor areas of both hemispheres. Overall, multi-method decomposition with subsequent clustering and task-specificity estimation is a viable and informative procedure for processing the recordings of electrophysiological experiments.

In this study, the sources of EEG activity in motor imagery brain-computer interface (BCI) control experiments were investigated. Sixteen linear decomposition methods for EEG source separation were compared according to different criteria. The criteria were mutual information reduction between the source activities and physiological plausibility. The latter was tested by estimating the dipolarity of the source topographic maps, i.e., the accuracy of approximating the map by potential distribution from a single current dipole, as well as by the specificity of the source activity for different motor imagery tasks. The decomposition methods were also compared according to the number of shared components found. The results indicate that most of the dipolar components are found by the Independent Component Analysis Methods AMICA and PWCICA, which also provided the highest information reduction. These two methods also found the most task-specific EEG patterns of the blind source separation algorithms used. They are outperformed only by non-blind Common Spatial Pattern methods in terms of pattern specificity. The components found by all of the methods were clustered using the Attractor Neural Network with Increasing Activity. The results of the cluster analysis revealed the most frequent patterns of electrical activity occurring in the experiments. The patterns reflect blinking, eye movements, sensorimotor rhythm suppression during the motor imagery, and activations in the precuneus, supplementary motor area, and premotor areas of both hemispheres. Overall, multi-method decomposition with subsequent clustering and task-specificity estimation is a viable and informative procedure for processing the recordings of electrophysiological experiments.

INTRODUCTION
A brain-computer interface (BCI) is a system for controlling a device using registered brain activity. To control the device, the BCI operator is usually required to perform different mental tasks and does not need to use any muscles. In practice, BCI systems are intended to create new means of communication between the operator and his/her environment, which is essential for the paralyzed. Recently, BCI technology has been used not to create a new communication channel but rather to enhance the resting one lost due to stroke or trauma. BCI rehabilitation applications are a growing field, with more clinical data reported every year (Ang et al., 2011(Ang et al., , 2015Ramos-Murguialday et al., 2013;Ono et al., 2014;Frolov et al., 2017b;Mane et al., 2019;Bai et al., 2020). Aside from in the clinic, these systems can be used in fundamental research. The feedback provided by BCI system operation makes its user highly concentrated on performing the tasks required to control it. The user has to produce stable and repeatable patterns of the brain activity recorded, greatly facilitating the investigation of the task-specific patterns of brain activity (Frolov et al., 2012).
Although there are numerous BCI classifiers, i.e., algorithms for brain activity pattern classification, these algorithms typically do not provide pattern features that have clear physiological interpretation. In this work, we have decided to investigate the EEG recordings of our BCI experiments with different source separation methods. Methods such as Principal and Independent Component Analysis (ICA, Hyvärinen et al., 2004), as well as other Blind Source Separation (BSS, Cichocki and Amari, 2002) techniques and non-blind Common Spatial Patterns (CSP, Bashashati et al., 2007), are widely used in EEG research both in the BCI and non-BCI fields. As will be described later, all of the methods considered decompose the signal into the sum of sources, or components, each characterized by its topographic map of weights and temporal activity.
Our experiments are mainly focused on the investigation of the performance of different motor imagery tasks while controlling the BCI (Frolov et al., 2012(Frolov et al., , 2017a. Earlier, we showed that ICA can be effectively used to find the sources of task-related sensorimotor mu-rhythm desynchronization (Frolov et al., 2012) as well as to extract the patterns of activity of three other brain areas involved in motor imagery (Frolov et al., 2017a). However, these results were obtained using a single ICA method. In this paper, we investigate the results of the application of several source separation methods for processing our records. The methods are compared in terms of how many shared components they provide as well as the task specificity of EEG patterns they find. We are mainly focused on studying sources that have clear physiological interpretations.
In Delorme et al. (2012), it was shown that physiologically meaningful components are dipolar. Results were obtained by using several ICA methods to decompose EEG recordings corresponding to visual memory tasks performed. A component being dipolar means that its topographic weight map, which can be looked upon as a scalp potential distribution, is adequately approximated by the distribution resulting from a single current dipole. Note that EEG being a result of the superposition of multiple dipolar source potentials at a time is an accepted model of the EEG origin (Niedermeyer and Da Silva, 2005). The fact that a component is dipolar implies that it results from well-localized changes of electric activity in the brain, allowing it to be attributed to a certain brain area activation or deactivation.
Although the areas the activity of which results in the appearance of dipolar components are typically rather small (about several millimeters in diameter), they significantly impact the EEG signal recorded by all of the electrodes. That is why the main task of the decomposition method is to unmix the signal components. The unmixing done by the ICA and other BSS methods is often based on reducing the mutual information between the components (Hyvärinen et al., 2004). That is why mutual information reduction (MIR) is another criterion of method comparison.
Another indication of a component's relevance is the repeatability of its occurrence among all the subjects and records. The components reflecting common or task-specific EEG phenomena are expected to be found in most of the recordings. That is why a cluster analysis was applied to the topographic maps and power spectral densities of the components found. The results allow the evaluation of the rate at which a certain phenomenon can be discovered by the methods tested.
The paper is organized as follows. The next section provides information on the dataset and participants, as well as details of the 16 blind and non-blind signal decomposition methods used in this work. The measures of both similarity between the methods and similarity of the extracted components are described. The algorithms for estimating the task-specificity of the components and for cluster analysis are given. The results of the method comparison are then represented, and clusters of the most frequently encountered components are shown together with the cluster statistics. The results obtained are discussed, and possible physiological interpretation of the extracted components is given.

METHODS
We used the data from the BCI experiments carried out at the Institute for the Higher Nervous Activity and Neurophysiology of the Russian Academy of Science. The study was approved by the Institute's ethical committee.

Participants
Twenty-three volunteers (7 females and 16 males) aged from 21 to 36 (26 ± 4) participated in the experiments. The volunteers were right-handed according to the Edinburgh Handedness Inventory and had no reported neurological or other disorders. All of them gave written informed consent.
None of the participants had an experience of controlling a BCI. The ability of the participants to perform motor imagery was not assessed prior to the experiments.

Experimental Procedure
The original EEG dataset for the work was obtained using the same procedure as described in earlier works (Frolov et al., 2012(Frolov et al., , 2017a. Each subject had to sit relaxed, looking at the center of the screen or had to imagine flexion of either his left or right hand according to visual cue presented on the screen. Each task had to be performed for 10 s, with a 2 s preparation period. There were 20 cue presentations for each hand motor image, which were split into blocks containing 2 cues for each hand. In each block, the cues were given in a random order. Each motor imagery task was preceded by the relaxation period. There were from 10 to 20 experimental days (14 ± 3) for each subject, with 1-3 day gaps possible between the sessions.
The EEG data were recorded using the NVX52 acquisition device (Medical Computer Systems, Russia) with 32 electrodes placed according to the 10-10 system. The data were band-pass filtered with a 5-30 Hz Butterworth filter as well as a 50 Hz notch filter to suppress power line interference.

EEG Classification
The EEG data were classified with a Bayesian classifier under the assumption that the signal has a multivariate Gaussian distribution with a zero mean and a different covariance matrix for each task classified: where C i is a covariance matrix of the EEG corresponding to the ith task, and x is an EEG sample. According to the Bayes rule, the probability that a given sample will correspond to the ith class is proportional to P (x|i) P(i), where P(i) is the probability of the ith task being cued. Thus, for a signal epoch logarithm of the probability of all its samples to correspond to the ith class is proportional to 1 n t − t x T t C −1 i x t − ln det C i + ln P (i) + const, where summation is performed over all the epoch samples, n t is the epoch length in samples, and the constant term is independent of both the signal and the class number. Since 1 n t − t x T t C −1 i x t can be substituted with trace cov(X)C −1 i , where cov(X) is the epoch covariance estimate under the assumption of a zero mean, the class number is We used a window of 1 s length sliding with a 100 ms shift to present the feedback during the sessions. The probabilities P (i) were estimated based on the cue durations. The classifier was updated after each block of cues, with no feedback during the first block.

Signal Decomposition
All of the blind and non-blind source separation methods used can be, in general, represented in the form of multivariate linear decomposition with an optional noise term. X = a 1 ξ 1 + . . . + a n ξ n + N = A + N, where X is the decomposed signal in matrix form, a i , i = 1, . . . , n, are columns of A that define the source (component) weights representing the source topographic maps, ξ i , i = 1, . . . , n, are row vectors of the source activities, n is the number of EEG channels, and is the optional noise term. Matrix A is called the mixing matrix, and its inverse, W, is called the unmixing matrix. The invertibility of A is based on the assumptions that there are as many components as there are signal channels and that the signal correlation matrix has full rank. These assumptions are in general not required but are common for linear source separation methods (Cichocki and Amari, 2002;Hyvärinen et al., 2004), and all of the decomposition methods used in the paper were derived under these assumptions. Usually, the signal is whitened before searching for decomposition (3) so that the covariance matrix of X becomes an identity matrix. Also, it is well-known that the component activity variances are not defined, thus requiring the application of some constraints when the decomposition is computed. That is why we shall further consider the decomposed signal to be whitened, i.e.
where V is a matrix such that Vcov (X) V T = I. If we also suppose that all ξ i , i = 1, . . . , n, have unit variance, then the mixing matrix for Z is orthogonal, Z = U , UU T = I, and the EEG mixing matrix is A equals VU.
We used 16 source separation methods to obtain the decomposition (3).
PCA, or Principle Component Analysis, is a classic example of a technique for obtaining the decomposition (3). It uses Singular Value Decomposition to diagonalize the signal covariance matrix. Usually, the components with too much or too little variance are discarded. If one rather chooses to keep all of the components under the constraints that their variances are equal to 1, this will become signal whitening. In this case, U = I, and the unmixing matrix is not orthonormal.
KURT is an ICA method based on maximizing the difference of the component distributions from normal. The difference is measured by the absolute value of the component excess kurtosis, which serves as a cost function for iterative component search (Delfosse and Loubaton, 1995;Hyvärinen et al., 2004), i.e., the columns of unmixing matrix W are estimated sequentially by maximizing the excess kurtosis for w T i Z under the assumption that W is orthogonal.
CUMUL is an ICA method using the signal non-stationarity as the criterion of statistical independence (Hyvarinen, 2001). The non-stationarity is measured by the fourth-order cumulant: The decomposition seeks to maximize the sum of the source cumulants. The proof of non-stationarity maximization being a criterion of component independence can be found in Hyvärinen et al. (2004), chapter 18. The time lag τ serves as the method parameter. In our experiment, it was set to 100 ms to extract the components with dominant alpha-band frequencies.
FastICA is an ICA method in which mutual information reduction between the components serves as the statistical independence criterion. The mutual information between the components ξ i , i = 1, . . . , n, is defined as where H(·) denotes entropy. The entropy of linear transformation is H (WZ) = H (Z) + det(W) . Under the assumption of Z being whitened and the components having unit variance, W is orthogonal, which implies that det (W) = 1 and H (WZ) does not depend on det (W). As a result, H w T i Z can be used as a cost function for sequential search of the unmixing matrix columns w i .
Since computing the entropy requires knowledge on the signal distribution, different parametric approximations are used. FastICA uses approximation in the form where g (x) is a non-linear function and v is a random normally distributed variable with unit variance. FastICA can be computed using different non-linearities (Bingham and Hyvärinen, 2000;Hyvärinen et al., 2004). In this work, we used two functions, namely, tanh(x) and x exp (−x 2 /2). The corresponding methods are denoted as FastICAT and FastICAG. RunICA, or extended infomax, is an ICA method that is similar to FastICA. The method takes into account both superand sub-Gaussian distributions by applying two different nonlinearities, g + (x) = −2 log cosh (x) and g − (x) = log cosh (x) − x 2 /2, and switching between them. Originally, the infomax algorithm was introduced by Bell and Sejnowski (1995) based on a neural network approach to maximize the entropy of the neural network output. It was later reformulated in terms of parametric non-linearities for source entropy estimation (Lee et al., 1999).
AMICA, which stands for Adaptive Mixture Independent Component Analysis, is an ICA method based on the assumption that all EEG components have generalized Gaussian mixture distributions AMICA can also segment the signal into several parts corresponding to different models (8) when the number of the models is set to higher than 1. All distribution parameters, unmixing matrix coefficients, and model scores (in the case of several models) are estimated using the expectationmaximization algorithm (Palmer et al., 2006(Palmer et al., , 2008. In our work, we compared AMICA with the default settings (2 models, 2 distributions in mixture) to AMICA with two models and Gaussian distributions with means fixed to zero (2 models, 2 distributions in mixture, ρ l i,j = 2, µ l i,j = 0, where l indicates the model, l = 1, 2, and i = 1, . . . , n, j = 1, 2), and to AMICA with a single model and Gaussian distributions with means fixed to zero (1 model, 2 distributions in mixture, ρ ij = 2, µ ij = 0, i = 1, . . . , n, j = 1, 2). The two latter cases will be denoted as AMICA1 and AMICA2, respectively.
PWCICA is an ICA method mapping the signal into a complex domain using the first derivative (rate) of the signal as its imaginary part. The algorithm is described in Ball et al. (2016). It uses the complex variant of FastICA in order to find decomposition (3) in the complex domain (Bingham and Hyvärinen, 2000). After that, the real-valued unmixing matrix is computed using the algorithm provided in Ball et al. (2016).
SOBI, or Second Order Blind Identification, is a blind source separation method that uses joint diagonalization to remove signal cross-covariance for a set of defined time lags (Belouchrani et al., 1997). Particularly, a set of time lags δ k is specified, and corresponding cross-covariance matrices are estimated: Next, the unmixing matrix W is found by joint diagonalization of the estimated cross-covariance matrices: where · off denotes the Euclidean norm of the off-diagonal part of a matrix. The norm of the unmixing matrix column is constrained to be equal to 1. There exist several algorithms for solving the problem (11). We used one proposed in Ziehe et al. (2004) to obtain the solution. CSP, or Common Spatial Patterns (Ramoser et al., 2000;Bashashati et al., 2007Bashashati et al., , 2015, is a non-blind decomposition method, in contrast to the source separation techniques described above. This method utilizes the information on signal segmentation with respect to the experimental tasks in order to find the decomposition (3). The method is used for feature extraction in two-class separation problems. The decomposition is sought that maximizes the component variance ratio for the states classified under the assumption that total component variance is fixed. This is equivalent to finding matrix W such that where C i is the i-th class covariance matrix and D 1 is a diagonal matrix. The equations (12) have an explicit solution where W and D 1 result from singular-value decomposition (SVD) of the matrix D −1/2 U T C 1 UD −1/2 with orthogonal matrix U and diagonal matrix D given by SVD of C 1 + C 2 . We used several CSP decompositions in this work, namely, CSP12 comparing relaxation and left hand motor imagery, CSP13 comparing relaxation and right hand motor imagery, CSP23 comparing left with right hand motor imagery, and CSP1X comparing relaxation and both left and right hand motor imagery. Although equations (12) are solved using signal epochs corresponding to the selected tasks only, the unmixing matrix was used to estimate the activity of the components at all other time moments.
MCSP is a generalization of the CSP method for multipleclass problems. There are several methods of generalization (Lee et al., 2005;Bashashati et al., 2015). The most frequent approaches are to either compute all pairwise CSP and combine the corresponding components into a new signal to be classified or to use classifier voting or a tree of some sort when only two classes are compared at each step (e.g., relaxation vs. motor imagery with subsequent discrimination between the hands if the motor imagery was recognized). However, these techniques would not give us any additional information in terms of sources, as we have already considered all pairwise CSP decompositions for our experimental tasks, as well as the CSP1X comparing relaxation and motor imagery. That is why we have decided to use a different approach. Problem (12) was generalized so as to find W such that The equations are written for the case of three tasks, but the generalization for the case of an arbitrary number of classes is straightforward. The problem in general has no explicit solution and can be solved numerically using any joint diagonalization technique. The joint diagonalization algorithm was the same as was used to perform the SOBI decomposition (Ziehe et al., 2004). It should be noted that unlike the two-class CSP, component variance ratio maximization (or minimization) for different classes is not guaranteed, but if any component has high (small) variance for a certain class, it will have small (high) variance for other classes.

Mutual Information Reduction
The mutual information reduction (MIR) provided by the decomposition methods is defined as where the mutual signal information I is given by equation (6). Using (6), we get Since X = A , and thus The entropy of each individual channel and component was estimated and bias-corrected as in Delorme et al. (2012).

Dipolar Component Selection
The decomposition (3) shows that the contribution of each component to the signal registered by the EEG electrodes is given by the weight vector, which can be treated as potential distribution over the scalp surface. This allows the source of electrical activity represented by the component to be localized.
As has been shown for the case of visual memory tasks, the physiologically meaningful components tend to have dipolar distributions of their weights, i.e., the distributions that could be adequately approximated by a single current dipole inside a head model (Delorme et al., 2012). We had neither anatomical magnetic resonance imagery (MRI) scans nor digitized electrode positions for most of our subjects. This is why a standardized head model based on the ICBM MNI atlas (Fonov et al., 2009(Fonov et al., , 2011 was used for estimating the dipolarity of the component weights. The dipolarity was computed as a residual variance of the best single dipole fit for the corresponding distribution. The fit was considered acceptable if the residual variance did not exceed 10%.

Shared Components
The number of the same components found by different decomposition methods can be used as a measure of similarity between the methods. On the other hand, the more methods reveal the component, the less likely it is that the component is an artifact of a certain mathematical procedure underlying the decomposition algorithm. That is why we focused our attention on the components found by different methods. Two component similarity measures were used to determine which components were shared between the methods. The first measure was the absolute value of the cosine between normalized component weight vectors given by the corresponding mixing matrix columns. The second one was the absolute value of the Pearson correlation between the component activities. Two thresholds were chosen for the measures. The components were considered the same when both measures exceeded the corresponding thresholds. We have chosen 0.9 as the threshold for the topographic map similarity and 0.8 as the threshold for activity correlation.
When the number of shared components is obtained, the similarity between the decomposition methods can be measured as sim ij = n s n i + n j − n s , where n s is the number of shared components, and n i and n j are the numbers of components obtained by the i-th and jth methods, respectively. The similarity measure varies from 0, when there are no shared components, to 1, when the methods have provided identical decomposition. Also, only the dipolar components can be considered when the similarity (17) is calculated, allowing non-dipolar components that are unlikely to have physiological meaning to be discarded. The number of methods that found a component minus one will be called the component rank. The component rank equals 0 when it is found by only one method and 15 when it is found by all of the 16 methods considered.

Classification Accuracy and the Component Specificity
Given a set of the EEG components, their activity can be used for estimating the accuracy of discriminating between the experimental tasks. This estimate can serve as a measure of components' specificity for the performed tasks, allowing the task-specific EEG patterns to be extracted and both artifacts and irrelevant EEG activity to be removed. We used a greedy algorithm to find the task-specific components as follows. Only the dipolar components were considered. At the first step, classification accuracy was estimated for each set containing one or two components. The best set was then selected. The remaining components were added to the set one by one so that the new set would provide the highest classification accuracy. The components from the set for which the classification accuracy was maximal (over all of the steps) were considered as the task-specific components. The classification was tested by 120 trials of cross-validation where 7 blocks were chosen for the training set and the 3 remaining blocks were used as the testing set.
The accuracy of task classification was measured by Cohen's kappa index, κ (Vieira et al., 2010), which was calculated from the elements of the confusion matrix g ij resulting from the cross-validation. The element g ij counts how many times the ith task was recognized by the classifier under the condition that the classified EEG epoch corresponded to the j-th cue. Given g ij , the κ index is where g 0 is the sum of all of the elements of the confusion matrix, i g ii is the sum of all diagonal elements of the confusion matrix, and g j is the sum of all of the elements of the j-th column of the confusion matrix. The index varies from −1 to 1 (perfect classification) and equals 0 in case of random classification.

Component Clustering
The decomposition (3) gives at least as many components as there are electrodes for each of the methods used. Consequently, the number of components computed for all experimental recordings is too high to check the components manually. In order to investigate EEG patterns most typical for our experimental tasks, we used cluster analysis to group the components according to their similarity. The clustering was performed using the Attractor Neural Network with Increasing Activity (ANNIA), which was originally proposed for Boolean factor analysis (Frolov et al., 2007). Adaptation of the technique for cluster analysis is described in Bobrov et al. (2014). When the ANNIA is used for clustering, the strength of synaptic connections between the neurons of the network is determined by the similarity between the corresponding elements rather than by Hebbian learning. The stopping criterion is based on a threshold that specifies the minimal average similarity between the elements of a cluster. In Bobrov et al. (2014), the ANNIA was used to group the components with respect to their topographic map similarity. In this paper, we have also accounted for the component activity similarity using the correlation coefficient between the components' activity power spectral densities estimated for each of the experimental tasks. Spectral analysis was used since the components obtained from different recordings were compared, in contrast with the

Comparison of the Decomposition Methods
The methods were compared according to the criteria described in the previous section. The percentage of dipolar components found by each method is presented in The results indicate that the PWCICA and AMICA methods provide more dipolar components than the other methods. The difference is significant for the AMICA methods and insignificant when PWCICA is compared to other BSS methods. The PCA and CSP methods extracted significantly fewer dipolar components than the other methods. Pairwise Wilcoxon comparison of the MIR values with Benjamini-Hochberg correction (initial significance level: 0.05; resulting critical value: 0.0275) shows that the ICA methods provide significantly higher mutual information reduction than the PCA and CSP methods. The difference between ICA and other methods in terms of MIR was insignificant, as was the difference between the PCA and CSP methods.
The method similarity was estimated by the fraction of shared components according to (17), providing a method similarity  The values indicating insignificant difference are in bold.
FIGURE 1 | The results of multidimensional scaling according to method similarity. The mapping on the left panel is obtained for the case where all the components were considered. The mapping on the right panel is obtained for the case where only the dipolar components were considered. The PCA point is omitted from the left panel due to its remoteness from all other points.
matrix. The similarity matrices for the cases when either all or only dipolar components were considered were used for mapping the method onto a 2D plane with a multidimensional scaling technique. The results of the mapping are shown in Figure 1. Apparently, the methods tend to group according to the decomposition criteria, one group containing the ICA methods except PWCICA, and another group containing the CSP methods. SOBI and PWCICA are distant from both groups, while the PCA is as an outlier.
The results for classification accuracy, obtained after searching for the task-specific components, are shown in Figure 2. Mean κ values are shown for each subject and each method. The methods are sorted according to the κ value averaged over all the subjects. The subjects are sorted according to the κ value averaged over all the methods. The lower diagonal part of Table 2 contains p-values resulting from the pairwise comparison of the κ values for different methods. The values were obtained using Wilcoxon test of the values pooled from all of the participants and sessions with Benjamini-Hochberg correction (initial significance level: 0.05; resulting critical value: 0.320). The CSP methods provide patterns that are significantly better classified than those found by the other methods. The PCA patterns are significantly worse-classified than the patterns found by the other methods. The taskspecific component search yielded accuracy values significantly exceeding those obtained during on-line BCI operation, which is likely due to discarding irrelevant, artifact, and noisy components.

Component Clustering
The clusters obtained using the ANNIA were sorted according to their component occurrence, i.e., the percentage of the experimental session in which the components of the cluster were found. Figure 3 shows topographic maps and activity power spectral densities (PSD) for the components of the first 12 clusters. The maps and PSDs were obtained by averaging over all of the cluster components regardless of the subject or session. Table 3 presents the occurrence, average dipolarity, average rank, and specificity of the components of each cluster. As stated above, the component dipolarity was measured by the residual variance of the topographic map fit with a potential distribution resulting from a single current dipole, and the component rank is the number of methods that found the component minus one. The specificity was calculated as the percentage of cases when the component was included in the best set of components (the set search is described in the Methods section) among the cases when the component was found.
It should be mentioned that AMICA restricted to only Gaussian distributions in mixtures performed as well as AMICA with the generalized super-Gaussian mixtures, suggesting that dipolar components obtained from band-pass filtered EEG signal have activity the distribution of which can be adequately approximated with a mixture of two Gaussians.

DISCUSSION
In the EEG\MEG field, and, particularly, in BCI research, ICA methods are mostly used for artifact removal during signal  preprocessing or epoch classification. They are applied for eliminating ocular (Höller et al., 2013;Dharmaprani et al., 2016;Sarin et al., 2020), motion (Zhou et al., 2016;Kobler et al., 2019), and muscle (Höller et al., 2013;Dharmaprani et al., 2016) artifacts. The most popular method used for artifact suppression is RunICA (extended infomax), likely due to its incorporation into EEGLAB (Delorme and Makeig, 2004) and BrainStorm (Tadel et al., 2011) software. The ICA methods are also believed to provide effective spatial filters for BCI feature selection (Kachenoura et al., 2007;Lotte et al., 2018;Xiaopei et al., 2019). The main limitation to using the methods is that they require a substantial amount of data to converge to meaningful decomposition, and many of the algorithms have high computational costs to implement on-line. These issues, as well as the nature of the algorithm used to automatically select the most specific components, limit the on-line implementation of the procedure proposed in this paper. The procedure is rather intended for assessing the overall task-specificity of the EEG signal and adequately arranging experimental recordings and subjects with respect to the BCI control efficiency. Performing the cluster analysis allows the most frequent and typical EEG patterns in the processed recordings to be extracted with no need to choose a single decomposition method. The best method may differ when different criteria are applied, such as the number of the dipolar components found, the mutual information reduction, or the accuracy of classification of the extracted patterns. The method comparison shows that ICA methods provide results similar to each other despite the difference in the independence criteria underlying the algorithms. These methods find significantly more dipolar components and provide significantly higher MIR than CSP or PCA, with the best results being for AMICA and PWCICA. PCA also has the lowest number of components shared with the other methods, as indicated by the results of multidimensional scaling. The maximal similarity between the PCA and other methods, estimated according to (17), was five times less than the minimal similarity between the other methods when all of the components were considered and two times less when only the dipolar components were considered. On the other hand, our results indicate that nonblind CSP methods find more of the separable activity patterns, much like in Xiaopei et al. (2019), where three ICA methods were compared to CSP in terms of providing the best spatial filters for a BCI classifier. However, unlike our case, no significant difference between RunICA and CSP was observed in Xiaopei et al. (2019). CSP outperformed the ICA methods, likely due to accounting for the signal segmentation with respect to the experimental tasks.
Thus, CSP decomposition is likely to find the dipolar components with task-specific activity, while it is poor at extracting other task-irrelevant dipolar sources. However, finding irrelevant but physiologically plausible sources may be useful for proving that BCI works based on the activity of the targeted brain areas. E.g., it may be important to establish that motor imagery BCI used for post-stroke rehabilitation worked because of the activity of the motor areas and not just due to eye movements or differences in concentration level.
The results of cluster analysis agree with those obtained earlier for fewer healthy subjects and patients with subcortical lesions (Frolov et al., 2017a). In previous work, the components relevant to controlling a hand exoskeleton via BCI were found to correspond to the sources of sensorimotor mu-rhythm located at the bottom of the central sulci of the left (SIL) and right (SIR) hemispheres, alpha-rhythmic activity in the precuneus (Prc), and alpha-and beta-activity in both the supplementary motor area (SMA) and in the premotor cortex of the left hemisphere (PrmL). These sources correspond to clusters 5 (SIL), 6 (SIR), 3 (Prc), 11 (SMA), and 8 (PrmL). We have also found clusters corresponding to blinking (1) and eye movements (7), which sometimes exhibited task-relevant activity, e.g., the subjects often blinked more frequently during relaxation. Clusters 2 and 3 correspond to the occipital alpha rhythm, which has low taskspecificity. Interestingly, the source symmetrical to PrmR was more frequent compared to the earlier results (Frolov et al., 2017a). Source 12 exhibited activity similar to the mu-rhythm, but unfortunately, the subjects for whom it was observed had no individual anatomical MRI scans, so the source could not be reliably localized. However, the SIL source was found for more than half of the recordings for which source 12 was extracted. Also, sources SIL and 12 were found together by the same method in 35% cases. These findings suggest that source 12 might have localization different from that of the SIL sources.
The average PSDs of the clustered component activities reflect the specificity presented in Table 2. The most specific components of the 5-th, 6-th, and 12-th clusters exhibit prominent rhythm desynchronization during the motor imagery. Notably, there is no evident lateralization in the average level of desynchronization between the hemispheres. There are different results on the mu-rhythm behavior in the ipsilateral hemisphere during the hand motor imagery. Our results agree with those obtained in Vasilyev et al. (2016), where ipsilateral rhythm suppression was observed. However, ipsilateral rhythm synchronization was observed in Nam et al. (2011). We suppose that the reason for the result disagreement lies in the BCI protocol. It seems that active motor imagery, when maintained for a long time (from 10 s in these experiments to 20 s in our earlier NIRS study (Bobrov et al., 2016), involves activation of both hemispheres, either due to high concentration and the complexity of the task or due to the person's intention to "check" whether the requested hand movement has been imagined. The first explanation is supported by the results of Nam et al. (2011), where the ipsilateral rhythm synchronization was observed mainly in the end phase of brief motor imagery (1 s) trials, suggesting that brief and prolonged motor imagery may require different strategies. The latter explanation comes from some of the participants reporting that they had sometimes switched attention to the other hand to "make sure it isn't moving". Note, that both phenomena (different strategy and attention switch) may be present during prolonged motor imagery and underlie the absence of the asymmetry of rhythm suppression.
Unlike the SIL, SIR, and 12th sources, other sources do not exhibit such prominent rhythm desynchronization or synchronization during the motor imagery. However, the Prm, SMA, and Prc sources are often marked as being task-specific. These areas are often reported to be relevant to motor planning, execution, and imagery. The Prm sources are likely to reflect mirror neuron system activation when the image of the motion to be executed or imagined is generated to be compared to the incoming sensory information (Rizzolatti et al., 2014). The SMA source is likely to correspond to the supplementary motor area activation. This area is considered to be involved in the motion timing, motion sequence generation, or motion suppression (Rizzolatti et al., 2014). It was reported to be more active during the motor imagery than during real motion execution, suggesting that its main role is to suppress the actual movement when motor imagery is performed (Guillot et al., 2014). The occurrence of the Prm and SMA sources is not as high as that of the Prc and SI sources. Nevertheless, we believe that these areas are activated in all of the participants but that the activations are not often prominent in the electroencephalogram and thus are harder to find, even when using advanced techniques. This supposition is supported by the results of numerous fMRI studies (Hétu et al., 2013) that report up to 34 areas being active during motor imagery.
In conclusion, we suggest the use of multi-method decomposition with subsequent component specificity estimation and clustering, focusing on the shared components. The primary methods to be used are PWCICA, AMICA, and MCSP, but utilizing other algorithms to support the findings seems to be reasonable.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available upon request by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics committee of Institute of Higher Nervous Activity and Neurophysiology of Russian Academy of Sciences. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
MI, YK, DB, and AL performed the experiments and collected the data. DB and AL designed the experimental software. PB, MI, and YK analyzed EEG data. AF and PB contributed to the interpretation of the data. AF, PB, and EB contributed to writing the manuscript. All authors approved the final version of the manuscript for submission.

FUNDING
This work was supported by the Russian Ministry of Education and Science, grant RFMEFI60519X0184.