Evolutionary Algorithm Based Feature Optimization for Multi-Channel EEG Classification

The most BCI systems that rely on EEG signals employ Fourier based methods for time-frequency decomposition for feature extraction. The band-limited multiple Fourier linear combiner is well-suited for such band-limited signals due to its real-time applicability. Despite the improved performance of these techniques in two channel settings, its application in multiple-channel EEG is not straightforward and challenging. As more channels are available, a spatial filter will be required to eliminate the noise and preserve the required useful information. Moreover, multiple-channel EEG also adds the high dimensionality to the frequency feature space. Feature selection will be required to stabilize the performance of the classifier. In this paper, we develop a new method based on Evolutionary Algorithm (EA) to solve these two problems simultaneously. The real-valued EA encodes both the spatial filter estimates and the feature selection into its solution and optimizes it with respect to the classification error. Three Fourier based designs are tested in this paper. Our results show that the combination of Fourier based method with covariance matrix adaptation evolution strategy (CMA-ES) has the best overall performance.


COMMON SPATIAL FILTER (CSP)
The common spatial filter (CSP) that proposed in (Samek et al., 2012;Lotte and Guan, 2011) is an optimization process that tries to find the coefficients m that maximize the following cost function. In this work, the Tikhonov regularized CSP (TRCSP) is employed. The cost function of TRCSP is defined by: where C i is the variance of EEG signal in ith condition, αm T Im is the penalty term and α is selected such that the performance of a LDA classifier is maximized with a 10-fold cross validation. The optimization of equation (S1) is done by solving the following Lagrange equation The solution of equation (S2) is equivalent to the eigenvalue problem that is given by (C 2 + αI) −1 C 1 m = λm. The spatial filter m that maximizes the equation (S2) are the eigenvectors of (C 2 + αI) −1 C 1 .

BMFLC-KF
The BMFLC is a signal model that can be used to estimate the unknown amplitude in a pre-defined frequency band. The informative motor modulation of EEG signal, i.e. ERD, can be observed in α band. The previously developed BMFLC divides a pre-defined frequency band [ω 1 , · · · , ω n ] into n equally distributed divisions with frequency spacing ∆ f , and estimates the amplitude of each frequency component by using least-mean-square algorithm(LMS) (Veluvolu et al., 2012;Wang et al., 2012a). Although the estimation accuracy is high, the LMS algorithm can not guarantee an accurate and fast tracking of amplitude variation. In order to improve the tracking performance of the existing BMFLC, we later developed Kalman filter based BMFLC (BMFLC-KF) (Wang et al., 2012b) was later developed.
The state-space equation of BMFLC is given as: where x k and w k are defined as: With the assumptions that v k and η k are independent Gaussian process with 0 mean and covariance of R and Q respectively. The Kalman filter iteration is given as denotes the mathematical expectation of w at time instant k with respect to previous observation y at k − 1, P k is the estimated state error covariance and K k is the Kalman gain. The Kalman filter starts with the initial conditionŵ 0 = 0 and P 0 = I. The weight vectors of BMFLC represents the Fourier coefficients of the band-limited signal.

GLOBAL AND LOCAL REAL-CODED GENETIC ALGORITHMS
As the function that needs to be optimized is unknown, the performance of GA depends on the ability of exploring the function space and then fine tuning the solution in a selected promising area. GLGA uses several operators to handle real-valued solution vector and also balances the global and local searching (García-Martínez et al., 2008).
For handling real-valued solution vector, the parent-centric BLX-α (PBX-α) crossover operator is used. The PBX-α takes two real-valued solution vectors denoted as X 1 = (x 1 , . . . , x n ) and Y 1 = (y 1 , . . . , y n ), . . , n. The offspring Z = (z 1 , . . . , z n ) is generated by the following rule: where The offspring z i is randomly picked up within the range of l i and u i . The parameter α ∈ [0.5, 1] is used to control the spreading of generated solution.
In the above stated crossover operation, we denote X as female parent and Y as the male parent. As the generated offspring will be near to the female parent, it is therefore considered as an anchor point in the searching process. The diversity of generated solution is governed by the selection of male parent. As in (Deb et al., 2002), this separation of solution also has the function of mutation operator. The next task is to select proper parent.
For choosing the female parent, the uniform fertility selection (UFS) is employed. To apply UFS, first we construct female parent pool by taking N f number elite solutions and then track number of times each solution is selected from the pool. The less used solution in the female parent pool is chosen as the female parent. UFS focuses on the diversity of the solution generation. As each possible female parent is located at the promising areas, UFS confirms to search these areas throughly.
The negative associative mating (NAS) is used to choose the male parent. The male parent pool consists N m number of solutions. First, the roulette wheel method is applied to the selected 5 candidate male parents. The Euclidean distance between the selected female parent and each of the possible male parent is calculated. The one with the highest distance is selected as the male parent. NAS also tries to increase the diversity in the offspring.
For balancing the global and local search of GLGA, the female and male differentiation (FMD) process is employed. FMD determines the sexuality of the solution in the current population. This procedure is conducted before UFS and NAS. FMD determines the number of solution in each parent pool, e.g N f and N m with N f ≤ N and N m ≤ N , with either one of them equal to the total number of solutions in the current population N . By choosing different configuration of N f and N m , the GLGA focus on global or local search. As suggested in (García-Martínez et al., 2008), N m = N and N f = N/2 is used for global searching and N m = 100 and N f = 5 is used for local searching. The rational of this parameter setting is clear from the construction of the algorithm. In the global search phase, the more number of female parents are used to ensure the algorithm covers as much area as possible and the high number of male parents ensure the exploitation. After entering into the local search, the decreased number of female parents would help the algorithm to focus more on the promising area. GLGA is not operated by generation, rather it uses function evaluations as the criterion to shift from global to local search. GLGA-n% indicates that n% of function evaluations are used for global search and rest are used for local search.
To keep the number of solutions in the population consistent, the "replace the worst strategy" is adopted. Each time when an offspring is generated, the fitness value of the offspring is compared to the solution in the current population. If it is better than any of the solution in the current population, the worst one is replaced.

CMA-ES
The (µ, λ) CMA-ES requires λ number of solutions in each generation in order to select the best µ number of solutions for updating the value in next step. The CMA-ES algorithm updates the solution by sampling from a Gaussian distribution with mean m, step-size δ and covariance estimation C as follows: where g and k are indices for generation and solution respectively, s k is the kth solution in gth generation. To find the solution in the next generation, CMA-ES estimates the mean, step-size and covariance by using the solution in the current generation and also considers the history path of the solution evolution. The mean m (g+1) update is given by: After each generation, the solutions are ranked based on their fitness value. The update of mean is carried out by a weighted summation of the µ best solutions in the current generation. Each weight is user defined and subject to the condition µ k=1 w k = 1. The step-size δ is the trade-off between exploration and exploitation of the CMA-ES. Its update requires the evolution history in solution space to calculate evolution path p δ . The update rule is given by: where p δ is the length of evolution path calculated up to current generation, E N (0, I) is the length of a normal distribution with the same dimensionality of the solution vector and β is a tunable parameter. The expectation is that the solution evolution path denoted by p δ should distribute similar to a random walk that is characterized by a normal distribution. If the solutions in the consecutive generation tend to move to a similar direction, a bigger δ is required to explore more area in the solution space. Whereas, if the solutions tend to be in contrary direction, a smaller δ is used to search in a finer grain.
Finally, the covariance matrix C needs to be updated. The value of C is determined by the covariance of the estimated evolution path denoted by p c and the covariance estimated from the µ number of solutions in the current generation. The update rule is given as C (g+1) = (1 − γ 1 − γ 2 )C (g) + γ 1 Cov(p c ) + γ 2 Cov(s 1:µ ) (S14) where Cov(•) is the covariance estimation of corresponding variable, γ 1 and γ 2 are tunable parameters.
In (Hansen et al., 2009), Cov(p c ) and Cov(s 1:µ ) are called rank-one and rank-µ update respectively. The former term captures the historical information of the solution evolution whereas the latter term contains the solution distribution in the most recent generation. The CMA-ES starts by setting initial value to s (0) , δ (0) , C (0) and updating the solution according to Eq.(S11) to Eq.(S14). The algorithm terminates when certain stopping criterion has been met.