Kurtosis-Based Blind Source Extraction of Complex Non-Circular Signals with Application in EEG Artifact Removal in Real-Time

A new class of complex domain blind source extraction algorithms suitable for the extraction of both circular and non-circular complex signals is proposed. This is achieved through sequential extraction based on the degree of kurtosis and in the presence of non-circular measurement noise. The existence and uniqueness analysis of the solution is followed by a study of fast converging variants of the algorithm. The performance is first assessed through simulations on well understood benchmark signals, followed by a case study on real-time artifact removal from EEG signals, verified using both qualitative and quantitative metrics. The results illustrate the power of the proposed approach in real-time blind extraction of general complex-valued sources.


IntroductIon
The aim of blind source separation (BSS) is to reconstruct the original sources by identifying the inverse of the mixing system, without having explicit knowledge of the mixing parameters or sources (Cichocki and Amari, 2002), and has found application in diverse areas including biomedical engineering, communications, sonar, and radar (Cichocki and Amari, 2002;Anemüller et al., 2003). Standard BSS methods use cost functions based on second-and higher-order statistics, together with the maximization of likelihood and entropy (Bell and Sejnowski, 1995;Amari et al., 1997;Hyvärinen et al., 2001). In addition, to facilitate the modeling of real-world systems, noisy environments, and post-non-linear mixtures have been recently studied in real domain algorithms (Särelä and Valpola, 2005;. Within the BSS methodology, the latent sources are separated in a random order through either a deflationary or symmetric orthogonalization procedure, that is, one by one or simultaneously. A class of BSS algorithms, termed blind source extraction (BSE), aims to retrieve the sources one by one, based on a fundamental signal property (non-linearity, sparsity), effectively inducing an order into the separation process. The benefit of BSE becomes apparent in large-scale problems where only a small subset of the sources are of interest, making it possible to extract such sources at a dramatically reduced computational complexity and in real-time. This also relaxes the requirement for pre-or post-processing of the mixture or separated sources, that may be necessary if parallel BSS techniques were employed.
Real domain algorithms performing BSE based on the temporal structure (predictability) of signals are well established (Barros and Cichocki, 2001;Cichocki and Amari, 2002;Mandic and Cichocki, Kurtosis-based blind source extraction of complex non-circular signals with application in EEG artifact removal in real-time applications in fMRI modeling (Novey and Adalı, 2008) and communications (Douglas et al., 2006;Ollila and Koivunen, 2009b). In comparison to standard complex BSS methodology (Bingham and Hyvärinen, 2000;Anemüller et al., 2003), which assumes complex circular sources, these algorithms have been shown to exhibit enhanced performance for non-circular sources and similar performance for circular sources. In the same spirit, the feasibility of BSE of complex sources based on the temporal structure of the latent sources was studied in (Javidi et al., 2009), exploring a widely linear predictor to extract both proper and improper sources. A class of linear predictability based algorithms for blind extraction from noisy complexvalued mixtures has also been recently proposed (Javidi et al., 2010).
In this paper, we introduce an online BSE algorithm suitable for the generality of complex-valued signals, both circular and noncircular. This is achieved based on higher-order statistics of latent sources, and using the deflation approach. Further, the cost function based on an extension of the methodology presented in (Liu and Mandic, 2006) is designed so as to be robust to both circular and non-circular second-order additive noise. The analysis is supported by benchmark simulations in both noise-free and noisy scenarios, followed by studies of conditioning of EEG signals for the automatic removal of biological and power line artifacts.
The paper is organized as follows. Section 2 provides an overview of complex statistics, complex-valued noise, and CR calculus. The cost function for both noise-free and noisy cases is then presented, together with the derivation and convergence analysis of a real-time adaptive BSE algorithm. In Section 3, after analyzing the performance in blind extraction of synthetic sources, EEG signal conditioning for brain computer interfacing is studied. Conclusions are presented in Section 5.

coMplex statIstIcs: second-order cIrcularIty
Second-order circularity is a property of probability density functions (pdf), whereby the distribution of a complex random variable z and its rotation e w z are equal for any angle w (Picinbono, 1994). Within the domain of second-order statistics, to account for complex random variables with non-circular pdf, we need to use both the covariance C zz and pseudo-covariance P zz matrices (Picinbono and Bondon, 1997) C P zz zz For second-order circular (proper) random variables, the pseudo-covariance matrix vanishes, that is P zz = 0, whereas for second-order non-circular (improper) random variables the In this case, the power spectrum is flat across all frequencies, while the pseudo-spectrum is non-zero. As the noise becomes more noncircular (r → 1), the pseudo-spectrum approaches its upper-bound defined in (5), where for highly non-circular noise (r ≈ 1), the magnitudes of the pPSD and PSD are similar. For a scalar complex white noise signal v(k), the relations between the correlation and pseudo-correlation and the respective spectra are given by Examples of circular white Gaussian and Laplacian noise with unit variance are illustrated in the left hand column of Figure 1A, whereas the right hand column demonstrates two examples of non-circular white noise, with the top-right plot showing a noncircular Gaussian noise signal with circularity measure r = 0.81 with unit variance and pseudo-variance t v 2 0 38 0 71 = − + .
. ,  and the bottom-right plot illustrating the scatter plot of non-circular Laplacian noise with circularity measure r = 0.81 with unit variance and pseudo-variance of 0.45 − 0.66. Also note that in Figure 1A the value of the kurtosis is approximately zero for both the circular and non-circular Gaussian noise signals, whereas the kurtosis values for the circular and non-circular super-Gaussian noise signals follow the real-valued convention. Figure 1B depicts the PSD and pPSD of circular (r = 0) white and non-circular doubly white Gaussian noise for the respective circularity measures r = {0.8,1}. Observe that the pseudo-spectrum is zero for the circular noise, while it has a magnitude of 0.64 for the noise with r = 0.8, and reaches it upper-bound of 1 in the third realization where the noise is highly non-circular (r = 1). For the Gaussian noise, the spectrum S(ω) = 1 and the pseudo-spectrum across all frequencies, thus indicating that by increasing the eccentricity of the ellipse (degree of non-circularity), the magnitude of the pPSD approaches its maximum value of 1.

CR calculus: BrIef overvIew
The CR calculus 1 (Kreutz-Delgado, 2006) allows for the analysis of functions that do not normally satisfy the stringent Cauchy-Riemann conditions of analyticity, such as real-valued cost functions of complex variables. Consider a typical cost function F(z): C N |→ R, a real function of complex variables, which does not satisfy the Cauchy-Riemann properties, required for gradient calculations. However, using the CR calculus framework, it is possible to calculate the gradients of such functions directly in C, and without the need to obtain derivatives of the real and imaginary components separately.
In the framework of CR calculus, F is taken as a function of the complex input vector z and its conjugate z * , collectively termed the conjugate coordinates, that is Note that although z and z * are not statistically independent, this does not affect the calculation of derivatives, defined as (Kreutz-Delgado, 2006) The first term in (6) is the normalized fourth order moment, the second term is the square of the circularity coefficient (Ollila, 2008), whereas kurt c (z) in (7) is the real-valued kurtosis of the complex random variable z. Similar to the kurtosis of a real-valued Gaussian random variable, the value of K c is zero for both circular and noncircular complex Gaussian random variables. Furthermore, for continuity, this measure makes kurtosis values of a sub-Gaussian complex random variable negative and that of a super-Gaussian complex random variable positive, irrespective of the degree of circularity/non-circularity (Douglas, 2007).

coMplex-valued noIse
The degree of non-circularity can be quantified by the circularity measure r, defined in (Ollila, 2008) as the magnitude of the circularity quotient r t s measures the degree of non-circularity in the complex signal, with t z 2 the pseudo-variance of the signal and the circularity angle u = arg(r(z)) indicating orientation of the distribution. Note that for a purely circular signal, r = 0, with u not providing additional information about the distribution. This circularity measure can also be graphically interpreted using an ellipse (centered in the complex plane) of eccentricity e and orientation a, such that r = e 2 and u = 2a (Ollila, 2008, Theorem 1). For e = 0, the shape becomes a circle, which also indicates a circular signal with r = 0, while for the extreme case of e = 1, corresponding to a highly non-circular signal with r = 1, the ellipse becomes elongated with a maximal major axis and minor axis of length zero. Note that the pseudo-variance of a general complex Gaussian distribution is then related to the elliptic shape by τ 2 = e 2 e 2u (Ollila and Koivunen, 2009b).
It is important to notice that the treatment of a noise vector v(k) in C is different to that in the real domain (Picinbono and Bondon, 1997). While in R only the variance s v 2 of the noise signal is of concern, in C it is necessary to also consider the pseudo-variance t v 2 , in order to completely model the noise. We therefore differentiate between the following cases of white noise.
1. Circular white noise, is considered white in terms of its diagonal covariance matrix, whereas the pseudo-covariance matrix vanishes, that is where I denotes the identity matrix.
In the frequency domain, the covariance spectrum S v (ω) (also power spectrum, or PSD) of the circular white noise is flat, while the pseudo-covariance spectrum  (Brandwood, 1983;Kreutz-Delgado, 2006). The operators R{·} and I{·}, where used, denote the real and imaginary part of a complex quantity, while 〈·,·〉 is the inner product operator.
Furthermore, in calculating derivatives of analytic functions, as defined, the R * -derivative vanishes and the derivative is equivalent to the Cauchy-Riemann derivative, demonstrating the flexibility of the framework. This can be illustrated through a simple example. Consider the non-analytic squared error cost function G= = || || . Then, ∂G/∂z = z * and ∂G/∂z * = z. In contrast, for the analytic function H(z) = z 2 , ∂H/∂z = 2z and ∂H/∂z * = 0. For further insight into CR calculus, we refer to the material in (Kreutz-Delgado, 2006;Mandic and Goh, 2009).

Bse of coMplex noIsy MIxtures
The diagram in Figure 2 shows the complex BSE architecture, where at time instant k, the observed signal x(k) ∈ C N is given by a linear mixture ∈C is the vector of latent sources, A ∈ C N × N , is the mixing matrix 2 , and v(k) ∈ C N is the vector of additive doubly white Gaussian noise (non-circular). The model (11) has been widely used in EEG signal processing, for instance see (Cichocki and Amari, 2002;Sanei and Chambers, 2007). The sources are assumed to be with zero mean and distinct kurtosis values, while no assumptions are made about the circularity. The number of mixtures is assumed to be equal to that of the sources, however, in the case of noisy mixtures, an overdetermined mixture is necessary so as to estimate the second-order statistics of noise parameters.
The adaptive gradient descent algorithm at the extraction stage adapts the parameters of the demixing vector w such that the source signal with the largest (smallest) kurtosis, is first extracted. The variance of y(k) can be written in an expanded form as Also note that the direction of steepest descent is given by the derivative with respect to z * , the R * -derivative. This can be shown by using the first order Taylor series expansion (TSE) of F (van den Bos, 1994); the magnitude of a small change in the function F is given by and the Cauchy-Schwarz Inequality shows that  If a mixing process is considered for the noise given by B, the vector w H in the subsequent equations can be replaced by υ H = w H B. This does not affect our algorithm, for the normalized kurtosis of Gaussian noise is unconditionally zero.

Frontiers in Neuroscience | Neuroprosthetics
October 2011 | Volume 5 | Article 105 | 4 In the cost function (17), the parameter b dictates the order of extraction where for 1. b = 1, the order of extraction is from the high to low degree of non-Gaussianity (super-Gaussian sources are extracted first), 2. b = −1, the order of extraction is from low to high degree of non-Gaussianity (sub-Gaussian sources are extracted first).
The optimization of J with respect to w can thus be stated as where the norm of the demixing vector is constrained to unity to avoid very small coefficient values.
Rewriting and simplifying (17) in terms of (13) and (16) results in ≤ and is equal to unity only if one of the components in the vector u is non-zero. Given the constraint on |ú|, the solution to the optimization of (19) is a vector ú opt of unit norm such that u opt has a single non-zero component at a position corresponding to the diagonal element in K c (s) having the largest magnitude. For this to be valid, a demixing vector assumes the form w opt = A H #u opt , where the symbol (·)# denotes the matrix pseudo-inverse operator (Liu and Mandic, 2006).

adaptIve algorIthM for extractIon
Optimization of (17) is performed using an adaptive gradient descent algorithm which updates the values of w so as to maximize the modified normalized kurtosis and thus minimize the cost function J(w). Based on Section 2.4, the gradient 3 is thus expressed as where a ∈ [0,1] is the forgetting factor.
where the difference in C ss (0) are absorbed into the mixing matrix A and the noise covariance matrix C vv I ( ) 0 2 = s v (due to the whiteness assumption).
In the same spirit, the normalized kurtosis of the extracted signal y(k) can be written as thus having zero value for Gaussian noise. In a vectorized form, this is equivalent to The next stage within the proposed BSE scheme is the deflation process which aims to remove the extracted source y(k) from the mixture x(k), such that where the deflation weight coefficient vector  w is updated using an adaptive gradient descent algorithm detailed later in this section. In principle, for y(k) being an estimate of one of the original sources, say s n (k), the ideal deflation weight vector should be equal to the nth column of the mixing matrix A, such that the effect of this particular source is removed from the mixture. Finally, a threshold can be set on the deflation process, so that extraction is continued until some or all the required sources have been successfully extracted (Thawonmas et al., 1998).

cost functIon
The cost function we employed for the extraction of general complex sources from noisy mixtures is given by Note that J ∈ R, represents a modified version of the normalized kurtosis defined in (6) and is a generalization of the methodology presented in (Liu and Mandic, 2006). As illustrated in (13), the variance of y(k) contains the noise variance s v 2 , thus allowing us to remove the effect of noise from (17) such that only contributions from the latent sources are accounted for. Also note that while the noise variance s v 2 is present in the cost function (17), its pseudo-covariance t v 2 is not present, suggesting that the complex domain BSE based on kurtosis is unaffected by the pseudo-spectral effects of the additive noise; this is further elaborated in Section 3. 3 Since the normalized kurtosis J is real-valued, the conjugate gradient ∂ ∂ J w * corresponds to the maximum change of the gradient.

www.frontiersin.org
October 2011 | Volume 5 | Article 105 | 5 where e(k) is an adaptive regularization parameter. The gradient adaptive regularization parameter is then given by The kurtosis-based BSE update algorithm (K-cBSE) for the demixing vector is thus given by and its expanded version is given in (22), where m is the small positive step-size. To preserve the unit norm property, the demixing vector is normalized at each iteration, that is Notice that in extracting circular sources, the moment p l vanishes, further simplifying the algorithm. Moreover, as mentioned earlier, the cost function and thus the gradient descent algorithm are not dependent on the pseudo-variance of the noise, t v 2 . The estimation of the noise variance can be performed using a subspace method, as described in (Hayes, 1996). It is thus essential that the number of observations is larger than the number of sources, N > N s , so as to allow for the estimation of the noise variance via eigenvalues of the observation covariance matrix C xx , that is The subspace method can be briefly summarized as follows. We can assume Rank(Υ) = N s if A is of full rank and C ss is non-singular. Then, the (N − N s ) eigenvalues of Υ are zero and hence the (N − N s ) eigenvalues of C xx are equal to s v 2 .

ModIfIcatIons to the update algorIthM
In order to enhance the performance of the online gradient descent algorithm, adaptive step-size update algorithms are considered. We consider the complex-valued Farhang-Ang type variable stepsize (VSS) algorithm (Ang and Farhang-Boroujeny, 2001) and the complex-valued generalized normalized gradient descent (GNGD) type algorithm (Mandic, 2004). At each iteration k, the VSS algorithm minimizes the cost function J in (17) with respect to m(k − 1) to provide the update of the step-size, given as and h and g are step-sizes. The GNGD-type algorithm is based on a normalized version of (22), given by where r is a step-size. The derivation of the algorithm is given in the Appendix.

adaptIve algorIthM for deflatIon
The deflation procedure insures that after each extraction stage, the estimated source is removed from all the mixture vectors, so that the next source with maximum (minimum) kurtosis can be extracted. This can be achieved based on the cost function (Thawonmas et al., 1998) which is minimized with respect to the deflation weight coefficient  w. We use x n (k) to denote the mixture at the nth extraction stage, which is given by vectors Given an invertible mixing matrix A, the vector  w is ideally equal to a column of A −1 , which corresponds to the nth extracted source y n (k). The gradient can thus be calculated as and the online algorithm for BSE then becomes with m d a step-size. The drawback of this method is that any errors in the deflation process will propagate and affect the extraction and deflation of subsequent stages. It is therefore important that the step-size parameter is set appropriately for each nth deflation stage to ensure successful removal of the extracted source y n (k).
In the design of complex adaptive algorithms, it is common to utilize a widely linear model to ensure that the algorithm is capable of processing the generality of complex signals (Mandic and Goh, 2009). In the case of the update for the deflation weight coefficient (30), however, a linear model is considered as the original BSS mixing model (11) is strictly linear and thus a widely linear deflation model is not required. For more detail on BSE based on widely linear predictability, see (Javidi et al., 2010).

results and dIscussIons
We shall consider extraction of both synthetic and real-world sources from noise-free and noisy mixtures, with various degrees of complex non-circular noise levels. The performance for the synthetic data were measured using the performance index (PI; Cichocki and Amari, 2002) given by In the first experiment, the performance of the algorithm (22) using the adaptive step-size methods was compared in the extraction of the first source with the value of m set to 0.01. It can be seen from the performance curves in Figure 4 that the best performance was achieved using the GNGD method with a PI of around −45 dB at the steady-state. The performance curve resulting from the normalized method indicates successful extraction with a PI of around −25 dB. The performance of the algorithm using the standard step-size and VSS were comparable, with a PI of around −20 dB. In the following simulations, the GNGD based K-cBSE algorithm is utilized.
In the next set of simulations, we considered the extraction of all the three sources (Figure 3). The value of m was set respectively to 0.01, 0.008, and 10 −5 for the consecutive extraction stages. As shown in Figure 5, the algorithm successfully extracted all the three sources, as indicated by a PI of less than −20 dB at the steady-state for the extraction iteration i = {1,2,3}, converging to steady-state after 2500 samples in the first extraction stage (i = 1) and around 1000 samples in the second and third extraction stage (i = {2,3}). The decreasing PI value at each consecutive extraction stage can be attributed to the unavoidable errors accumulated in the deflation. The scatter plot of the three estimated sources y 1 (k), y 2 (k), and y 3 (k) are illustrated in Figure 3. The normalized kurtosis of the estimated sources were respectively calculated as K c (y 1 ) = 11.8425, K c (y 2 ) = 1.3599, and K c (y 3 ) = −1.9956 corresponding to those of the original sources, given in Table 1; the scale and rotation ambiguities of the source estimates are also visible.

BenchMarK sIMulatIon 2: noIsy MIxture
In the next experiment, we considered the extraction of a complexvalued source from a noisy mixture. Three sources of N = 5000 samples were considered (see Table 2; Figure 6) and mixed using a randomly generated 4 × 3 mixing matrix A. The additive noise was doubly white Gaussian noise with variance s v 2 0 1 = . For each synthetic experiment, the results were produced through averaging 100 independent trials. The mixing matrix A was generated randomly as a full rank complex matrix. The values of the extraction and deflation step-size m were set empirically, and the forgetting factor a in (21) was set as 0.975. The complex additive Gaussian noise was both of circular white with circularity measure r = 0 and non-circular doubly white with r = 0.93. The real-world sources were the electroencephalogram data corrupted by power line noise and electrooculogram (EOG) artifacts.

BenchMarK sIMulatIon 1: synthetIc sources
In the first set of simulations, a noise-free mixture of three complex sources with various degrees of circularity and N = 5000 samples were generated and mixed using a 3 × 3 mixing matrix. These signals are illustrated in Figure 3 and their properties listed in Table 1. Extraction was performed in order from highest to lowest kurtosis, hence the value of b = 1 in (17).

FiGurE 3 | Scatter plot of the complex-valued sources s 1 (k), s 2 (k), and s 3 (k), with the signal properties described in Table 1 (left hand column).
Scatter plot of estimated sources y 1 (k), y 2 (k), and y 3 (k), extracted according to a decreasing order of kurtosis (b = 1) (right hand column). circularity. In addition, the maximum effective range of the algorithm in extracting sources (PI < −20 dB) can be estimated as an SNR of 1 dB.

eeg artIfact extractIon
In order to obtain useful information from EEG data in realtime, it is often necessary to perform post-processing to remove artifacts such as line noise and biological artifacts including those pertaining to eye movement, captured in the form of EOG and facial muscle activity represented as electromyogram (EMG).
Removal of the effect of such signals from the contaminated EEG and pseudo-variance t v 2 0 0924 0 0011 = + .
. ,  estimated using the subspace method described in Section 2.5. The sources were extracted in an increasing order of kurtosis (b = −1) with the stepsize m = 0.5. The scatter plot of the first estimated source with the smallest kurtosis, y 1 (k) is illustrated in Figure 6 with a calculated normalized kurtosis of K c (y 1 ) = −1.8002, which is within a 10% range of the true value, given in Table 2. The PI, shown in Figure 7, demonstrates a fast convergence to a value of around −40 dB in approximately 1000 samples, and continuing a steady convergence to −50 dB by 5000 samples.
It was shown in Section 2.5 that the performance of the algorithm (22) is not affected by the degree of circularity of the additive noise, such that doubly white noise is treated in a similar manner to circular white noise, where the pseudo-covariance vanishes. This was explored experimentally by systematically analyzing the effect of various noise levels on the BSE algorithm (22). The circularity measure r was varied from a value of r = 0 (circular) to a value of r = 0.9998 (highly non-circular), while the signal-to-noise ratio (SNR) was adjusted from a near-zero noise SNR of 50 dB to a high noise environment with SNR value of −10 dB. The initial values were generated randomly and PI was averaged over 100 trials. Figure 8 illustrates the performance curve for the different variations in the noise properties, and confirms that while the performance is dependent on the SNR value, it does not vary with changes in the degree of noise non-    Table 2.

Frontiers in Neuroscience | Neuroprosthetics
October 2011 | Volume 5 | Article 105 | 8 highly negative kurtosis value. This has been used as the main discrimination in defining classifications based on the fourth order moment.

Data acquisition and method
Our aim is to remove artifacts as independent sources extracted from the recorded EEG mixture directly in the time domain. To this end, the contaminated EEG signals were paired as the real and imaginary components of a complex signal and processed using the architecture described in Section 2.5. In this manner, the phase-amplitude relationship and the full cross-statistical information between symmetric electrode pairs are contained in the corresponding complex-valued EEG signal, and allow for the simultaneous processing of both channels. Further iterations of the extraction process can then be used to obtain the individual pure EEG signals, or even, pipelined to a further post-processing stage, which would then extract the EEG signals based on a desired fundamental property, such as predictability.
The electrodes were placed according to the 10-20 system (Figure 9), and sampled at 256 Hz for 30 s. The EEG activity was recorded from electrodes placed at positions Fp1, Fp2, C3, C4, O1, O2 with the ground placed at Cz, while the EOG activity was recorded from the vEOG and hEOG channels with electrodes placed above and on the side of the left eye socket.
Three studies were performed with the aim to remove the artifacts simultaneously. While the rejection of the power line noise artifact is feasible by passing the recorded EEG signals through a notch filter, this solution also leads to the removal of useful information around the 50-Hz range pertaining to the EEG signals, in particular those within the gamma band (25-100 Hz). It would therefore be desirable to automatically extract the line noise artifact along with the biological artifact from the corrupted EEG has been subject of study in previous years, with several methodologies introduced that attempt to accomplish this utilizing both online and offline algorithms (Vigário, 1997;Jung et al., 2000;Delorme et al., 2001Delorme et al., , 2007Barbati et al., 2004;Greco et al., 2005;Kumar et al., 2009). While offline algorithms are suitable for processing the recorded EEG data in clinical applications, it is necessary to utilize online algorithms for real-time applications such as those encountered in brain computer interface (BCI) scenarios.
In (Kumar et al., 2009) the authors propose an online algorithm whereby the recorded EEG signals are transformed to the wavelet domain and the EOG contaminants are removed using an adaptive recursive least squares (RLS) algorithm, before transforming the signal back to the time domain. Simulations demonstrate good performance from the algorithm, however, it would be advantageous to perform all the necessary processing in the time domain, as this way the signals are retained in their original form and less computation is required.
In its basic form, ICA can be applied to the contaminated EEG recording and the artifacts removed through visual inspection. As detailed in (Vigário, 1997), an ICA algorithm separates the recorded EEG mixture into its original sources as independent components (ICs), with artifact sources identified and removed. In semi-automatic (Greco et al., 2005) and automatic (Delorme et al., 2001) artifact removal methodologies, several classifications (markers) based on the statistical characteristics of the ICs are considered that allow for the detection of artifacts in the contaminated EEG, which are then compared against thresholds that determine the rejection of particular components. In these methods, both the kurtosis and entropy of ICs have been utilized to identify and remove the artifacts. While the EEG mixtures typically have near-zero kurtosis values, artifacts such as EOG exhibit peaky distributions with highly positive kurtosis values (Delorme et al., 2001), while periodic power line noise has a

www.frontiersin.org
Transform, the H-H transform results in much more detailed spectrogram for a given resolution. The intrinsic mode functions (IMFs) required by the H-H transform were obtained using a multivariate empirical mode decomposition (MEMD) algorithm (Rehman and Mandic, 2010), where the real and imaginary component of the complex-valued signals were taken as a single multivariate signal and processed simultaneously. It was observed that this resulted in a spectrogram with better resolution than those obtained through the separate processing of the individual components using the standard EMD algorithm. (b) Power Spectral Distribution: The power and pseudo-power spectra of the complex-valued extracted artifacts were compared to those belonging to the complex-valued recorded artifact.
In addition, the pseudo-spectrum demonstrates the quality of the proposed method in extracting non-circular sources. It is also possible to consider the cross-spectrum of the recorded and extracted sources (Palmer et al., 2009).

Case Study 1 -EOG extraction
The "Eyeblink" dataset contained the EEG recordings contaminated with eye blink artifact as well as line noise. The recorded EEG and EOG signals are plotted in Figure 10A, where the effect of the EOG activity is pronounced in the frontal lobe (Fp1 and Fp2 channels), with the effect diminishing with an increase in the distance of the electrodes to the eyes. The effect of the line noise is also visible on the occipital O1 and O2 channels. The H-H T-F spectrogram ( Figure 10B) describes the frequency changes of the ensemble average of the six EEG channels over the recording period. In correspondence with the time plot, the EOG artifacts are visible (with a duration of around 1 s); constant frequency components are seen around the 50-Hz range due to the line noise. Note that due to the low sampling rate of the recording device, the 50-Hz frequency component is not well defined in the T-F analysis and results in scattering of frequency components between 40 and 60 Hz. The complex EEG signals formed using (32) were processed using the K-cBSE algorithm with the value of m = {5,0.09} and b = {−1,1} for the consecutive iterations and a = 0.975. The choice of value for b ensures that the line noise is initially extracted, followed by the EOG components in the second iteration. The normalized kurtosis values of the original real-valued EEG signals and the extracted EEG signals are given in Tables  3 and 4.
The order of the extracted complex signals were as expected, with the first extracted source y 1 (k) (line noise) being sub-Gaussian and y 2 (k) (EOG) super-Gaussian. The imaginary component of y 1 (k) had the smallest kurtosis, and was automatically chosen as the extracted line noise source, while the near-zero kurtosis of the real component R{y 1 (k)} indicates an EEG source. Also, both components of the second extracted source, having a high kurtosis value, were considered as the extracted EOG sources. Figure 10C shows the T-F plots of the imaginary components of the first extracted signal y 1 (k) where the presence of the power line artifact is seen, while in Figure 10D the T-F plot of the real and imaginary components of y 2 (k) is shown where the frequency components of the EOG artifacts are seen. signals. In the first study we consider the removal of EOG artifacts ("Eyeblink" set), the second study focused on eye muscle artifacts from rolling the eyes ("Eyeroll" set), whereas the third study addressed the removal of muscle activity from raising the eyebrow ("Eyebrow" set).
In all the studies, the temporal signals from each channel pair were combined to form three complex EEG channels, given by This construction of the complex EEG signals allows for the simultaneous processing of the amplitude and phase information using the K-cBSE algorithm (22). Note that the EOG channels were not part of the mixtures considered. They are only used to assess the performance of the proposed BSE algorithm in the extraction of the EOG artifacts.

Performance measures
As we have no knowledge of the mixing process, the PI (31) is not applicable for this case and thus several alternative quantitative and qualitative measures were used for the evaluation of the algorithm performance. These are briefly discussed below.

Quantitative metrics (a) Kurtosis:
The kurtosis values K c of the complex extracted signals indicate the success of the algorithm in extracting super-Gaussian or sub-Gaussian artifact in a specified order. In addition, the magnitude of the kurtosis K R of the real and imaginary components of the extracted sources are used to automatically select desired components. (b) Power spectra Correlation: In a similar manner to (Barbati et al., 2004), the correlation coefficient between the power spectra of the complex-valued recorded artifact (e.g., EOG) and extracted sources, and likewise, the correlation coefficient between the pseudo-power spectra of the complex-valued recorded artifact and the extracted sources is calculated. This measure indicates the degree of similarity between the extracted and originally recorded artifact, and can be used to automatically select the extracted source pertaining to the biological artifact, while also quantifying the degree of performance of the extraction algorithm. 2. Qualitative metrics (a) Hilbert-Huang Time-Frequency Analysis: By employing timefrequency (T-F) analysis using the Hilbert-Huang (H-H) transform (Huang et al., 1998;Huang and Shen, 2005), we can qualitatively assess the extraction performance through comparison of the frequency components of the mixture and extracted source during the recording session. Also, the T-F analysis of the extracted artifacts will demonstrate the corresponding frequency components and their changes over time, making it possible to assess the quality of the extraction procedure over the recording time. In comparison to Fourier transform based T-F analysis, such as the Short-Time Fourier between the power and pseudo-power spectra demonstrates the effectiveness of the methodology in extracting artifacts in the complex domain.

Case Study 2 -Eye muscle artifact extraction
The "Eyeroll" dataset had contained artifacts from round movement of the eye during the recording session with EOG activity from eye blinks, shown in Figure 11A and kurtosis values given in Table 3.
The resultant electrical activity from the artifacts were recorded using the vEOG and hEOG channels, with EOG activity seen on the vEOG channel at time instants 5, 13, 17, 23, 25, and 29 s, and eye muscle activity present more clearly on the hEOG channel with a duration of around 2 s. The eye muscle artifact was present on all six EEG channels, while the EOG artifact is strong on the Frontal lobe electrodes and the effect of the power line noise is seen more strongly on the central and occipital lobe electrodes. The H-H T-F analysis of Figure 11B illustrates the presence of frequency components up to 10 Hz, as well as scattered frequencies belonging to the 50-Hz power line noise.
In the extraction procedure, the step-size of the K-cBSE algorithm was m = {5,0.2} and b = {−1,1}, while a = 0.975. The T-F analysis of the extraction are illustrated in Figures 11C,D, and the kurtosis values of the complex-valued extracted signals and their real and imaginary components given in Table 4.
The real component of the first extracted source, R{y 1 (k)}, having the smallest kurtosis of K c (R{y 1 }) = −1.1958 contained the power line noise artifact. The eye muscle activity and EOG artifacts were collectively extracted using the real and imaginary components of the second extracted source y 2 (k). The five instances of the eye muscle activity and the EOG can be detected in Figure 11D, while the lack of power line noise frequency components in the 50-Hz range is visible.
These results were also confirmed based on the power spectra of the recorded artifacts and the extracted sources, given in Figure 11E. While the PSD and pPSD of the complex-valued y 1 (k) contained the 50-Hz components, these were suppressed to −40 dB in the spectra of y 2 (k). The frequency components of the mixture channels and extracted artifacts in the 50-Hz range also showed that the line noise artifact was successfully removed (see Figure 11F). Conversely, the spectral components pertaining to the eye muscle and EOG artifacts are present in the PSD and pPSD of y 2 (k) corresponding to the (0-10) Hz range of the PSD and pPSD of the complex-valued EOG. The correlation coefficient between the PSD spectra of the complex-valued recorded EOG channel and extracted source y 2 (k) is 0.8244, while the correlation between the pPSD spectra was 0.8222; these values were respectively 0.0792 and 0.1844 for y 1 (k).

Case Study 3 -EMG extraction
In the "Eyebrow" set, the EEG mixture was heavily contaminated with EMG artifacts from raising the eyebrows, and are shown in Figure 12A with kurtosis values given in Table 3. The EMG signals were recorded using the vEOG and hEOG electrodes, with the effect more prominent on the vEOG recording. All EEG channels were affected by the artifact, though this is not clearly visible in the We next concentrate on the power spectrum and pseudo-power spectrum of the complex EOG signal, constructed in a similar manner to that in (32); the extracted sources y 1 (k) and y 2 (k) are depicted in Figure 10E. Notice that the distribution of power S EOG and pseudo-power  S EOG is concentrated respectively in the frequency range (0-5) Hz and 50 Hz. The spectrum S y 1 and pseudospectrum S y 2 of the first extracted source can be seen to contain around 0 dB of power for a frequency of 50 Hz, while having an average power of −40 dB in the (0-5) Hz frequency range. These results can also be seen by comparing the frequency components of the recorded EEG mixture and extracted artifactual sources around the 50-Hz range, shown in Figure 10G. While the presence of the power line artifact is evident in all recorded channels, after the extraction procedure the 50-Hz frequency component is only present in I{y 1 (k)}. Likewise, the spectra of y 2 (k) illustrate the diminished effect of the line noise source with a power of −20 dB, while retaining the frequency components of the EOG in the low frequency range. To quantify the observed results, the correlation coefficient between the recorded EOG's PSD and pPSD and those of the extracted sources were calculated (Barbati et al., 2004) and presented in Table 4. For the extracted source y 1 (k) these values were respectively 0.2313 and 0.2847, whereas for the source y 2 (k) they were 0.9698 and 0.9822. The correspondence of the results

(E)
The power spectra (S) and pseudo-spectra (pS) of the recorded EMG, and the extracted signals y 1 (k) and y 2 (k). (F) Frequency components of the recorded EEG signals and the extracted artifacts around the 50-Hz frequency range. After extraction, the power line noise is contained in I{y 1 }.

Frontiers in Neuroscience | Neuroprosthetics
October 2011 | Volume 5 | Article 105 | 14 occipital lobe channels due to the strong presence of power line noise. In the T-F domain ( Figure 12B) the EMG frequency range had a large span containing both low and high frequency components, present in the duration of the raising of the eyebrows and lasting for around 2 s. In addition, the 50-Hz frequency component cloud reflecting the power line noise can also be seen. The extraction of the artifacts was performed using the K-cBSE algorithm (22) with step-size m = {2,0.2}, b = {−1,1} and a = 0.975.
As shown in Figures 12C,D, the algorithm successfully extracted the power line noise as the imaginary component of the first extracted signal y 1 (k) and the EMG signal as the real and imaginary components of the second extracted signal y 2 (k). From the T-F plot of y 2 (k) in Figure 12D, the complete EMG frequency component range was successfully extracted, with power line noise frequency components not present.
Considering the power spectra S EMG and pseudo-power spectra  S EMG in Figure 12E, the spectral distribution of the power and pseudo-power spectral density were strong in the (0-10) Hz range with an amplitude of around −10 dB and in the (20-40) Hz range, though having a much lower value. In addition, a single spike at 50 Hz of amplitude −10 dB indicates the presence of power line noise. After the extraction, the power line noise was contained in the spectra of the y 1 (k) while the (0-10) Hz and (20-40) Hz frequency components were present in the PSD and pPSD of y 2 (k). For the "Eyebrow" set, the spectra correlation coefficients between S EMG and  S EMG and those of y 1 (k) and y 2 (k) were respectively {0.1287,0.1078} and {0.7593,0.7906}. Also, the 50-Hz frequency range for the contaminated mixture and the extracted artifacts are shown in Figure 12F. It can be seen that after the extraction procedure, the 50-Hz component is contained in I{y 1 (k)}, while in comparison to the EOG and eye muscle extracted components from the "Eyeblink" and "Eyeroll" studies (see Figures 10G and 11F), components R{y 2 (k)} and I{y 2 (k)} had a higher power level in this range, reflecting the wider frequency range of the EMG artifact. Figure 13 shows the EEG waveforms after the extraction of "Eyeblink," "EyeRoll," "EyeBrow," and 50 Hz noise artifacts. The top two plots compare the proposed method with the widely linear prediction based one in (Javidi et al., 2010). Notice for the first two EEG electrodes Fp1 and Fp2, the predictor-based technique in (Javidi et al., 2010) performed well, with the successful removal of the Eyeblink artifact. However, it performed poorly in terms of the 50-Hz noise removal, which caused the "Eyeblink" artifact to be present (but attenuated) in the remaining EEG electrodes. Comparing Figure 13C with Figure 11A, it is clear that the "EyeRoll" artifact was either heavily attenuated or removed; whereas comparing Figure 13D with Figure 12A demonstrates that it is quite challenging to remove completely the "EyeBrow" artifact; however, the 50-Hz noise has been removed almost perfectly, as illustrated by comparing the bottom plots of Figures 12A and 13D.

dIscussIon
Both qualitative and quantitative metrics have showed that the kurtosis-based extraction method yields enhanced results for realtime extraction of artifacts. Excellent results were obtained for the removal of eye blink, eye roll, and power line artifacts. Although artifacts arising from eye rolling and raising the eye brow might seem similar to that of an eye blink, it is much more challenging to perform their complete removal in the context of real-time EEG processing, as they involve longer firing of larger groups of muscles. These are critical cases, as the EMG source goes into saturation; and to our knowledge, these artifacts have not been considered before in the literature. These results are promising, as our technique operates real-time, in contrast to methods such as in (Vigário, 1997;Jung et al., 2000;Delorme et al., 2001Delorme et al., , 2007Barbati et al., 2004;Greco et al., 2005;Kumar et al., 2009). The advantage of the proposed kurtosis-based method as compared to our previous method (Javidi et al., 2010) is also in that the proposed method allows us to select a particular artifact to be extracted. For instance, if we wish only the EOG artifact such as eye blink to be removed, the parameter b in (22) can be set to unity; whereas in (Javidi et al., 2010), we do not have full control over which artifact is going to be extracted.

conclusIon
Blind source extraction of the generality of complex-valued signals based on the degree of non-Gaussianity and from noisy mixtures has been addressed. A cost function based on the normalized kurtosis has been utilized to perform blind extraction, and the corresponding online algorithm 2 (K-cBSE) has been derived. The existence and uniqueness of the solutions have been discussed and VSS variants of to the algorithm have been addressed. It has been shown that the algorithm is robust to the degree of non-circularity of the additive noise and the success of the algorithm over increasing noise levels has been demonstrated. Simulations in noise-free and noisy environments illustrate the successful performance of the algorithm in the extraction of both circular and non-circular signals, while the extraction of EOG and EMG artifacts from recorded EEG signals in real-time demonstrate a practical application for the proposed methodology. 2 This work was supported by the EPSRC grant EP/H026266.