Spatio Temporal EEG Source Imaging with the Hierarchical Bayesian Elastic Net and Elitist Lasso Models

The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. It is an ill-posed problem, due to the non-uniqueness of the solution, and regularization or prior information is needed to undertake Electrophysiological Source Imaging. Structured sparsity priors can be obtained through combinations of L1-norm-based and L2-norm-based constraints, such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solution. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, usually leading to non-optimal and computationally expensive solutions. The existing Bayesian formulation of the ENET allows hyperparameter learning, but relies on computationally intensive Monte Carlo/Expectation-Maximization methods, which makes its application to the EEG IP impractical, while the ELASSO has not previously been considered in a Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for the ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm that combines Empirical Bayes with iterative coordinate-descent procedures to estimate both the parameters and the hyperparameters. Using realistic simulations, and avoiding the inverse crime, we illustrate that our methods recover complicated source setups more accurately, with a more robust estimation of the hyperparameters and more stable behavior under different sparsity scenarios, than the classical LORETA, ENET, and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods.
The Matlab codes used in this work, including Simulations, Methods, Quality Measures, and Visualization Routines, are freely available on a public website.

Using this together with [2-13], taking into consideration the symmetry of the argument with respect to the origin, and rearranging the Dirac delta function, we obtain the stated density, where the normalization constant c involves the determinant of the corresponding variance matrix. ∎

c) The joint pdf of the parameters and the new hyperparameters factorizes as the product of the conditional parameter densities and the hyperparameter prior:

p(j_{•t}, γ_{•t} | λ) = ∏_i p(j_{it} | γ_{it}, λ) · p(γ_{•t} | λ).

Substituting the final results of items a) and b), marginalizing over the auxiliary variables, and then marginalizing over γ_{•t} via a change of variable (recalling the definition of the constant c above), inserting the decomposition of the exponent, and rearranging terms, we arrive at the stated hierarchical form of the prior.
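The scale-mixture machinery used in items a)-c) rests on the classical identity representing the Laplace density as an exponential mixture of zero-mean Gaussians (a standard textbook result, stated here with generic symbols x, γ, λ rather than the derivation's own):

```latex
\frac{\lambda}{2}\, e^{-\lambda |x|}
  \;=\; \int_{0}^{\infty}
        \frac{1}{\sqrt{2\pi\gamma}}\, e^{-x^{2}/(2\gamma)}
        \cdot \frac{\lambda^{2}}{2}\, e^{-\lambda^{2}\gamma/2}\, d\gamma .
```

Marginalizing the Gaussian conditional over such exponentially distributed variances is what turns the hierarchical (conditionally Gaussian) representation into an L1-type prior; combining it with an additional Gaussian factor yields ENET-type priors.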

D. Update equations for ENET and ELASSO models
Derivation of each update equation. a) Parameters. In both the ENET and ELASSO models, we differentiate the cost function with respect to the parameters and, equating the derivative to zero, obtain the parameter update equation.
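For a Gaussian likelihood and a conditionally Gaussian prior with variance matrix Λ, this stationarity condition takes the standard ridge-like form below (a generic reconstruction using the lead field K, data v_{•t}, and noise variance σ²; the paper's exact equation may differ):

```latex
\nabla_{\mathbf{j}_{\bullet t}}
  \left[
    \tfrac{1}{\sigma^{2}}\,
    \lVert \mathbf{v}_{\bullet t} - \mathbf{K}\mathbf{j}_{\bullet t} \rVert_{2}^{2}
    + \mathbf{j}_{\bullet t}^{\top}\boldsymbol{\Lambda}^{-1}\mathbf{j}_{\bullet t}
  \right] = \mathbf{0}
  \;\Longrightarrow\;
  \hat{\mathbf{j}}_{\bullet t}
  = \left(\mathbf{K}^{\top}\mathbf{K} + \sigma^{2}\boldsymbol{\Lambda}^{-1}\right)^{-1}
    \mathbf{K}^{\top}\mathbf{v}_{\bullet t}.
```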

E. Inference strategy and implementation details
The high computational cost of obtaining Σ̄ by means of a matrix inversion can be avoided by using the economical singular value decomposition (SVD) of the lead field, K = U D Vᵀ, together with the Woodbury identity (Magnus and Neudecker, 2007), leading to:

Σ̄ = Λ − Λ Kᵀ (σ² I + K Λ Kᵀ)⁻¹ K Λ.   [2-26]

The update formulas in Proposition 2.3.2 a) and b) are consistent with the sparsity constraint in both the ENET and ELASSO models, since the elements of the effective prior variance matrix Λ (or equivalently Λ̄) select which elements of the solution become zero. When Λ_{it} → 0, the i-th row and i-th column of the matrix Σ̄ tend to zero vectors, from which ĵ_{it} → 0.
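This inversion-avoidance trick can be sketched in NumPy (a generic illustration with assumed names `K`, `lam_diag`, `sigma2`, not the paper's Matlab code): the Woodbury identity replaces the large source-space inverse by the solution of a small sensor-space system, and a vanishing prior variance visibly zeroes the corresponding row and column of the posterior covariance.

```python
import numpy as np

def posterior_cov_woodbury(K, lam_diag, sigma2=1.0):
    """Posterior covariance (K^T K / sigma2 + Lambda^{-1})^{-1} via the
    Woodbury identity: only an (n_sensors x n_sensors) system is solved,
    instead of inverting an (n_sources x n_sources) matrix."""
    Lam = np.diag(lam_diag)                               # diagonal prior variances
    G = sigma2 * np.eye(K.shape[0]) + K @ Lam @ K.T       # small sensor-space matrix
    return Lam - Lam @ K.T @ np.linalg.solve(G, K @ Lam)

rng = np.random.default_rng(0)
K = rng.standard_normal((5, 40))                          # 5 sensors, 40 sources
lam = rng.uniform(0.1, 1.0, size=40)

S_wood = posterior_cov_woodbury(K, lam)
S_direct = np.linalg.inv(K.T @ K + np.diag(1.0 / lam))    # direct source-space inverse
assert np.allclose(S_wood, S_direct)                      # both routes agree

lam[0] = 1e-12                                            # prior variance -> 0 ...
S0 = posterior_cov_woodbury(K, lam)
assert np.abs(S0[0]).max() < 1e-8                         # ... first row/column vanish
```

For realistic lead fields (hundreds of sensors, thousands of sources) the saving is substantial, since the solved system scales with the number of sensors only.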
In the same way, if some parameters are very small in a previous iteration (ĵ_{it} ≈ 0, with the i-th diagonal element Σ̄_{ii} ≈ 0), they lead to Λ_{it} → 0 in the next iteration (equations [D2] and [D8]). In some algorithms this property means that once an activation is set to zero (e.g., removed from the active set) in an iteration, it never reappears in the solution. In our case, however, we do not prune the small coefficients to zero; therefore, although unlikely, a "zeroed" activation might be re-estimated in a future iteration and contribute to the solution. These terms decrease strictly with respect to their arguments, leading to smaller values of F for higher values of the regularization parameters, which in both cases is equivalent to having more zero elements in Λ̄. The measurement variance in [D6] and [D12] is generally considered superfluous in the learning process, because it only acts as a scale factor for the parameters and usually slows the algorithm's convergence (Babacan et al., 2010). In our case, we fix σ² = 1 for all time points. We also use fixed values for the shape and scale parameters of the Gamma hyperprior in the ENET model. In particular, the shape is chosen so as to preserve the monotonicity of [D5] (in the sense that it has a single zero), and the scale so that the hyperparameter has a flexible prior, with mean and variance controlled by the regularization parameter. Obviously, the value that optimizes ℒ will depend on (a, b), since these influence the intercept of [D5]. In order to keep an adequate balance, we impose an identical prior on both regularization parameters, which gives our learning the flexibility to reach different degrees of sparsity. Although optimal values for (a, b) might also be estimated within the Empirical Bayes framework (by setting their corresponding priors), we only consider here an exploratory study in which they are fixed for the ENET model.
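The resulting alternation between parameter and variance updates, with shrinkage but no pruning, can be sketched with a toy NumPy loop (a hypothetical illustration with assumed names, not the paper's Matlab implementation, using the standard EM-style variance update for Gaussian scale mixtures): variances of irrelevant coefficients shrink toward zero but are only floored, never removed, so a "zeroed" activation could in principle re-enter the solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n_e, n_g = 8, 20
K = rng.standard_normal((n_e, n_g))          # toy lead field
j_true = np.zeros(n_g)
j_true[[3, 11]] = [1.5, -2.0]                # sparse ground-truth sources
v = K @ j_true + 0.05 * rng.standard_normal(n_e)

sigma2 = 1.0                                 # measurement variance fixed to 1, as in the text
lam = np.ones(n_g)                           # prior variances (diagonal of Lambda)
for _ in range(50):
    # Gaussian posterior for the current variances
    S = np.linalg.inv(K.T @ K / sigma2 + np.diag(1.0 / lam))
    j_hat = S @ K.T @ v / sigma2
    # EM-style variance update: small values shrink toward zero,
    # but are floored rather than pruned, so they may re-enter later
    lam = np.maximum(j_hat**2 + np.diag(S), 1e-12)
```

After the loop, the variances of coefficients unsupported by the data have collapsed toward the floor, while the data are still well explained by the surviving activations.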

G. Mathematical notation
Symbol          Description
v(s, t)         Continuous function representing the scalp voltage, dependent on the scalp coordinates (s) and time (t).
s               Scalp coordinates.
V               N_e x N_t spatio-temporal matrix representing the scalp voltage (data); rows correspond to sensors and columns to time points.
N_e             Number of scalp sensors.
Ξ               N_e x N_t spatio-temporal matrix representing the sensors' noise.
j(r, t)         Continuous function representing the PCD, dependent on the sources' space coordinates (r) and time (t).
r               Sources' space coordinates.
dr              Volumetric differential element in the sources' space.
J               N_g x N_t spatio-temporal matrix representing the PCD (parameters); rows correspond to points within the discretized sources' space and columns to time points.
N_g             Number of points in the discretized sources' space.
i, k            Indexes representing points within the discretized sources' space.
j_{•t}          t-th column vector of the spatio-temporal parameter matrix (PCD).
j_{-i,t}        (N_g - 1)-dimensional column vector obtained from j_{•t} by removing its i-th element.
dj_{-i,t}       (N_g - 1)-dimensional volumetric differential element of the j_{-i,t} column vector.
k(s, r)         Continuous function representing the Lead Field, dependent on the scalp coordinates (s) and the source-space coordinates (r).
K               N_e x N_g Lead Field matrix.
L               N_g x N_g matrix representing the Laplacian operator.
t               Continuous/discrete time index.
N_t             Number of time points.
P(•)            Function representing the constraints or penalties.
λ               Regularization parameter.
Γ               N_g x N_t matrix of the new hyperparameters derived from the scaled Gaussian mixtures procedure.
Ω               N_g x N_t matrix of the new hyperparameters derived from the Elitist Lasso hierarchization.
C               Normalization constant of the hyperparameters' probability density function.
Λ               N_g x N_t matrix of the parameters' variances in the hierarchical ENET and ELASSO models.
Λ̄               N_g x N_t matrix proportional to the parameters' variances in the hierarchical Elastic Net and Elitist Lasso models.
a, b            Scale and shape parameters of the hyperparameters' Gamma prior.
x               General variable used to represent Markov Random Fields and some integrals.
x_ℋ, x_ℋ̄        Sub-vectors of x containing the elements indexed by the set ℋ and by its complement, respectively.
φ               Potentials of the Elitist Lasso spatial Markov Random Field.
ℐ               General variable representing a set of indexes.
1_{m x n}       m x n matrix of ones.
I_{n x n}       n x n identity matrix.
1_{ℛ+}(•)       Indicator function of the set of points with non-negative coordinates in the n-dimensional real space.
Θ, Θ_t          Variables that embrace all hyperparameters of the model, and all hyperparameters of the model for a single time point, respectively.
μ               Mean of the parameters' Gaussian posterior distribution.