Minimizing the distortions in electrophysiological source imaging of cortical oscillatory activity via Spectral Structured Sparse Bayesian Learning

Oscillatory processes at all spatial scales and on all frequencies underpin brain function. Electrophysiological Source Imaging (ESI) is the data-driven brain imaging modality that provides the inverse solutions to the source processes of the EEG, MEG, or ECoG data. This study aimed to carry out an ESI of the source cross-spectrum while controlling common distortions of the estimates. As with all ESI-related problems under realistic settings, the main obstacle we faced is a severely ill-conditioned and high-dimensional inverse problem. Therefore, we opted for Bayesian inverse solutions that posited a priori probabilities on the source process. Indeed, rigorously specifying both the likelihoods and a priori probabilities of the problem leads to the proper Bayesian inverse problem of cross-spectral matrices. These inverse solutions are our formal definition for cross-spectral ESI (cESI), which requires a priori of the source cross-spectrum to counter the severe ill-condition and high-dimensionality of matrices. However, inverse solutions for this problem were NP-hard to tackle or approximated within iterations with bad-conditioned matrices in the standard ESI setup. We introduce cESI with a joint a priori probability upon the source cross-spectrum to avoid these problems. cESI inverse solutions are low-dimensional ones for the set of random vector instances and not random matrices. We achieved cESI inverse solutions through the variational approximations via our Spectral Structured Sparse Bayesian Learning (ssSBL) algorithm https://github.com/CCC-members/Spectral-Structured-Sparse-Bayesian-Learning. We compared low-density EEG (10–20 system) ssSBL inverse solutions with reference cESIs for two experiments: (a) high-density MEG that were used to simulate EEG and (b) high-density macaque ECoG that were recorded simultaneously with EEG. The ssSBL resulted in two orders of magnitude with less distortion than the state-of-the-art ESI methods. Our cESI toolbox, including the ssSBL method, is available at https://github.com/CCC-members/BC-VARETA_Toolbox.

Oscillatory processes at all spatial scales and on all frequencies underpin brain function. Electrophysiological Source Imaging (ESI) is the data-driven brain imaging modality that provides the inverse solutions to the source processes of the EEG, MEG, or ECoG data. This study aimed to carry out an ESI of the source cross-spectrum while controlling common distortions of the estimates. As with all ESI-related problems under realistic settings, the main obstacle we faced is a severely ill-conditioned and high-dimensional inverse problem. Therefore, we opted for Bayesian inverse solutions that posited a priori probabilities on the source process. Indeed, rigorously specifying both the likelihoods and a priori probabilities of the problem leads to the proper Bayesian inverse problem of cross-spectral matrices. These inverse solutions are our formal definition for cross-spectral ESI (cESI), which requires a priori of the source cross-spectrum to counter the severe ill-condition and high-dimensionality of matrices. However, inverse solutions for this problem were NP-hard to tackle or approximated within iterations with bad-conditioned matrices in the standard ESI setup. We introduce cESI with a joint a priori probability upon the source cross-spectrum to avoid these problems. cESI inverse solutions are low-dimensional ones for the set of random vector instances and not random matrices. We achieved cESI inverse solutions through the variational approximations via our Spectral Structured Sparse Bayesian Learning (ssSBL) algorithm https://github.com/CCC-members/Spectral-Structured-Sparse-Bayesian-Learning. We compared low-density EEG (system) ssSBL inverse solutions with reference cESIs for two experiments: (a) high-density MEG that were used to simulate EEG and (b) high-density macaque ECoG that were recorded simultaneously with EEG. The ssSBL resulted in two orders of magnitude with less distortion than the state-of-the-art ESI methods.

Introduction
Brain function is embodied in large-scale dynamic networks underlying all behavior and cognition. The natural modes of these networks are oscillations at functionally specific frequencies (Engel et al., 2001;Varela et al., 2001). Direct (invasive) observations of brain oscillations are increasingly more fine-grained and informative (Buzsáki et al., 2013;Frauscher et al., 2018). However, they cannot be feasibly examined in the whole brain in vivo human studies. Hence, considerable effort is dedicated to developing indirect (non-invasive) imaging methods to reveal the brain's oscillatory architecture with fine-grained time resolution. Notably, such an imaging modality should reveal the two different but interrelated aspects of networks: (a) the activity of each dynamical unit (node) and (b) their connectivity.
In addition, ESI has been used in attempts to estimate these currents, also known as source activity, from their electrophysiological observables. For a neural mass, the current is directly proportional to the mean field activity (Valdes-Sosa et al., 2009;Rosa et al., 2010). In turn, each neural mass is the collection of neurons to the extent of a few millimeters such that a mean-field observable (Freeman, 1975;Vinck and Perrenoud, 2019) is the "source" of the observations. In this study, we attempted the most general formulation of the problem as a reference for future work while providing concrete examples.
In principle, source activity in a network is modeled as a strictly dynamic random field ι (̺ , ς) over a continuous spatiotemporal manifold with ̺ ∈ R 3 (the parts of the brain that generate observations) and continuous time ς ∈ R. In practice, the random field ι (̺ , ς ) must be discretized at spatial points ̺ g ; g = 1, · · · , G, and at discrete time points ς t = t △ ς ; t = −T, · · · , T. In this scenario △ς is the sampling period. Thus, we focused on the vector time series ι (t), defined as the vector function with entries ι g , t = ι ̺ g , ς t . The (multi-channel) electro-physiological data is the vector time series v (t) with entries v (e , t) for each sensor, e = 1 · · · E that arises from the discretization of v (̺ , ς ), where the electromagnetic field was produced by ι (̺ , ς ).
The data v (t) from its latent source activity ι (t) is presented in the following forward model (Eq. 1).
Where L vι is the real-valued lead field matrix or forward operator that projects sources ι (t) to forward the data v (t), and ξ (t) is the time series of instrumental noise assumed to be independent of the source activity ι (t). The forward operator (lead field) L vι is linear and stationary by definition, derived from the discretization of a quasi-static electromagnetic forward model (Hämäläinen et al., 1993;Riera and Fuentes, 1998;Hallez et al., 2007;Lei et al., 2011;Piastra et al., 2020). For operators, we used suffixes that indicate the operator's codomain and domain.
Optimizing W ιv from the data solves an inverse problem, Eq. (1), that is not only ill-posed in the sense of Hadamard (Hadamard and Morse, 1953) with degeneracy in a (G − S)-dimensional space but also severely ill-conditioned and high dimensional (with G ≫ E).
Overcoming these challenges to obtain acceptable inverse solutions has been the subject of much research in specific optimization methods for W ιv which was well-summarized in the study of Knösche and Haueisen (2022).
Although we have defined ESI in terms of time domain signals, it is well-known that brain activity is oscillatory at all scales, from the local field potentials at the neuronal level (Freeman, 1975;Vinck and Perrenoud, 2019) to the EEG, MEG, or ECoG (Niedermeyer and da Silva, 2005;Le Van Quyen and Bragin, 2007;Frauscher et al., 2018). In tailoring ESI for oscillatory activity, a natural framework is that of the frequency domain ι f , a random vector representing the (discrete) Fourier transform of vector time series ι (t), and comprising complex-valued entries ι g , f for each source g = 1, · · · , G and frequency f = −T, · · · , T. Considering that we may compute the physical frequency as ν f = f △ ν for a spectral period △ν = 1 ((2 T + 1) △ ς) where (2 T + 1)△ς is the Nyquist frequency, the corresponding frequency domain data term is v f . Then, the equivalent expressions to the previous Eqs.
Our concern, then, is with frequency domain ESI in Eq. (3), our primary target being the source cross-spectral density matrix or source cross-spectrum ιι f = ι f ι † f , with the expected value over the sample space of the discrete Fourier transform ι f (Valdés et al., 1992;Engel et al., 2001;Varela et al., 2001;Nunez and Srinivasan, 2006;Hipp et al., 2012;Vidaurre et al., 2018b). The corresponding data cross-spectrum is vv f = v f v † f . In practice, this later quantity must be substituted by its estimator vv f is denoted with a different hat type since the expectation is for a finite number of instances M with an index m = 1 · · · M. Toward this estimation, we followed the standard practice of segmenting the data time series v (t) into segments (v m (t) ; ∀ m). Thus, we worked with instances v m f ; ∀ m of the discrete Fourier transform applied to realizations or observations for these segments.
The frequency domain inverse problem and the inverse solution for the source cross-spectrum ιι f may be stated as Eq. (5) (He et al., 2019), valid under the condition of independence between the source process ι (t) and the noise process ξ (t) in the previous forward model (Eq. 1).
The pursuit of the cross-spectrum ιι f is rewarding since it completely specifies the multivariate linear properties of any stochastically driven system, be it linear or non-linear (Brillinger, 2001(Brillinger, , 2012, though the latter requires additional higher-order kernels for a complete description (Brillinger, 1965;Brillinger and Rosenblatt, 1967). We used the asymptotic stochastic properties of the discrete Fourier transform to introduce these developments (Section Asymptotic probability theory of the Fourier transform and cross-spectrum).
Cross-spectral diagonal elements ιι g , g , f have intuitive interpretations: the variances σ 2 ιι g , f (σ 2 ιι f = diag ιι f ) are the spectra of source activity, the Cortical Spectral Topography (CST). Off-diagonal elements ιι g , g ′ , f are the cross-spectra reflecting functional connectivity. Optimal inverse operatorsŴ ιv for the cross-spectrum ιι f Eq. (5) is a novel form of electrophysiological source imaging: cross-spectral ESI (cESI).
From expressions (Eqs. 1-5), the time and frequency domain variants of the inverse problem, it is evident that cESI, the inverse solutionˆ ιι f of the source cross-spectrum ιι f , may be obtained from three different types of primary information: v (t), or v f , or vv f . Initially, it might seem that an inverse solution ιι f from any of these data types would be equivalent. However, as shown further into the study, each data type requires optimizing its specific inverse operatorŴ ιv , thus defining different estimation "routes" to cross-spectrum ιι f .
We describe in detail the theory, benefits, and problems of each route, in terms of the forward operator L vι (Section Forward projection by the lead field linear and stationary operator), and the inverse operator W ιv (Section Bayesian (MAP) inverse operators. Quasilinear (F-invariant) approximation).
In order to explore these routes in the following sections (Sections Bayesian MAP1 inverse operators. MNE, eLORETA, and LCMV as particular cases and Data used for the validation), we need to select an inverse solution framework, of which there are many potentially valuable approaches (Knösche and Haueisen, 2022). We adopted the Bayesian maximum a posteriori (MAP) probability approach (MacKay, 2003). The MAPs (for each route) are derived from finding the latent variables X that maximize the posterior probability q (X |Y)(Eq. 6).
Where p (Y|X ) is the likelihood of the data Y conditional on the latent variable X . In our context, Y can corresponding to be v (t), or v f , or vv f , and X can be corresponding to ι (t), or ι f , or ιι f (Grave de Peralta Menendez et al., 2004;Friston et al., 2006;Mattout et al., 2006;Wipf and Nagarajan, 2009). The term p (X ) is the respective a priori probability upon the latent variable ι (t), or ι f , or ιι f .
As can be observed in the following section (Section Data used for the validation), the third route inverse solution (Eq. 5) has desirable properties for cESI. However, if not impossible in a realistic cESI setup, such inverse solutions are N-P hard and therefore require Approximated Bayesian Computation (ABC) (Csilléry et al., 2010). This issue has been discussed in detail in the Bayesian literature (Dempster et al., 1977;Liu and Rubin, 1994;Daunizeau and Friston, 2007;Friston et al., 2007;Nummenmaa et al., 2007;Paz-Linares et al., 2018).
In this study, we took a more practical path. Rather than attempting to use complicated ABCs to obtain more general solutions, the cESI rationale we focused on is to restrict Bayesian inverse operators W ιv in Eqs. (1-5) within the "quasilinear" class W ιv ∼ = T ιv . Quasilinear inverse operators are also known in a more general context as the linear proximal operators or as linear back projectors that solve non-linear optimization or inverse problems (Kaplan and Tichatschke, 1998;Piotrowski and Yamada, 2008;Gramfort et al., 2012;Tirer and Giryes, 2020).
Here, quasilinear inverse operator T vι , which holds the same linearity attribute as the forward operator L vι , produced cESI that preserved the amplitude and phase information in the frequency domain (Mantini et al., 2007;Marzetti et al., 2008;Brookes et al., 2011b;Hipp et al., 2012;Lopes da Silva, 2013;He et al., 2019;Nolte et al., 2020). That is, we ensured that T vι possesses the attribute of what we call "F-invariance" for essential properties (Brillinger, 2001(Brillinger, , 2012). An immediate consequence of taking W ιv ∼ = T ιv , a quasilinear approximation of the inverse operator in the third cESI route (Eq. 5), is the following representation of source cross-spectrum ιι f ∼ = ιι f (Eq. 7). Here, we considered ιι f to be calculated from the set of random instances ι m f ; ∀ m of the latent vector process ι f .
Once ιι f ∼ = ιι f was assumed, the ABCs were simplified dramatically since a MAP for the matrix ιι f (Eq. 7) turns into a joint-MAP (Section Validation rationale) for the random vectors ι m f ; ∀ m that are implicit in ιι f (Hsiao et al., 1998(Hsiao et al., , 2002Yeredor, 2000;Davis et al., 2001;Auranen et al., 2005;Wipf and Nagarajan, 2009;Chen et al., 2011). We implemenedt the joint-MAP via Spectral Structured Sparse Bayesian Learning (ssSBL), the type of ABC developed in Section Measures of distortion. As shown under realistic inverse problem settings (Section Results), the third cESI route (Eq. 5) implemented via the ssSBL approach leads to less distorted estimates than the traditional methods for the first and second cESI routes (Eqs. 1-3). To judge distortions, we employed the well-known ESI methods as a baseline: Exact Low-Resolution Electromagnetic Tomographic Analysis (ELORETA) (Pascual-Marqui et al., 2006) and Linearly Constrained Minimum Variance (LCMV) (Van Veen et al., 1997).
As we will show in Section Discussion, the topographic distortions of the inverse solutions for a random vector ι (t) or ι f can reach unacceptable levels for second-order statistics, such as the sample estimator for the cross-spectrum ιι f calculated from an inverse solutionι (t) orι f . Minimizing CST distortions (for the estimator of the spectrum σ 2 ιι f ) benefits the overall cross-spectral estimation (for ιι f ). Our results suggest that that ssSBL significantly reduces these distortions.

Standard cESI theory
Asymptotic probability theory of the Fourier transform and cross-spectrum The material in this section (with minor differences in notation) is described in greater detail in Brillinger (2012). For an exhaustive definition of terms, refer to Supplementary material (SD, Section Introduction). Our primary interest will be in frequency domain quantities. Let x (t) be an R-dimensional vector time series. We worked with the following definitions: • The vector stochastic process x f (Eq. 8) is defined in the discrete frequency domain ν f = f △ ν with f = −T, · · · , T, as the discrete Fourier transform of the vector time series x (t ).
• The cross-spectral density matrix or cross-spectrum xx (ν) (Eq. 9) was defined in the frequency domain ν as the Fourier transform of the auto-covariance matrix xx (τ ).
Here, the auto-covariance xx (τ ) = x (t) x † (t + τ ) depends on the time-lag τ and does not vary with time t, thus being second-order stationary. We will furthermore assume (for technical reasons) that x (t) is a strictly stationary vector time series where all moments are also translation invariants. We also assumed that the strong mixing condition holds. This condition was due to the rapid decrease in the magnitude of the autocovariance xx (τ ), and all higher-order moments as the time-lag τ increases.
A fundamental result on which we based our work is Theorem 4.1.1 of Brillinger (2001), which can be understood as the equivalent for Fourier coefficients of the central limit theorem under stationarity and strong mixing conditions (Rosenblatt, 1956). In our notation, this theorem states: "Assume that the number of time points in the discrete Fourier transform (Eq. 8), goes to the infinity (T → +∞), and let the sampling period go to zero (△ς → 0+) so that the spectral resolution △ν = 1 ((2 T + 1) △ ς ) = constant holds constant. Then, it holds that x f is asymptotically independent for all f and converges in probability to the circularly symmetric multivariate complex-valued Gaussian probability density (Eq. 10). Where the Hermitian covariance matrix xx f is the cross-spectrum (Eq. 9) at the frequencies ν f = f △ ν with f = − T, · · · , T." We emphasize that this Gaussian distribution asymptotic distribution not only holds for x f but also for time-varying . /fnins. . estimators such as the time-windowed discrete Fourier transform or the Hilbert transform x t , f (Bruns, 2004). The latter has been widely used in the literature (Faes and Nollo, 2011;Friston et al., 2012;Nolte et al., 2020). It is important to note that Gaussianity might not be valid for the original time series x (t) or even one of its band-filtered versions, as many assume in the literature (Grave de Peralta Menendez et al., 2004;Friston et al., 2006;Mattout et al., 2006;Wipf and Nagarajan, 2009;Paz-Linares et al., 2017). In fact, even in the case of non-Gaussian, non-linear time series Brillinger's theorem is valid as long as stationarity and strong mixing hold. This validity does not imply that the spectral density matrix xx f completely characterizes the non-linear or non-Gaussian system. Cumulant information of orders higher than two may be necessary for a complete system description (Brillinger, 1965;Brillinger and Rosenblatt, 1967).
Brillinger's theorem leads to the probability density of any sampled estimator of the cross-spectrum. In particular, it applies to xx f = x m f x † m f ; M the sampled estimator for Fourier transform instances x m f ; ∀ m with sample size M. Here, these instances x m f ; ∀ m represent the discrete Fourier transform applied to sample realizations (x m (t) ; ∀ m) obtained from time segments of the observations. Then, it follows that a Hermitian Wishart W C (Eq. 11), with R-dimensional scale matrix xx f (cross-spectrum), and degree of freedom M (with M R), is the probability density of the estimator xx f .
Since we based our further developments on the assumption of Gaussianity (Eqs. 10, 11), we carried out the statistical test for the distribution of the discrete Fourier transform x f (Eq. 10) with two examples of resting state sensor data: MEG data from the Human Connectome Project (HCP) (Van Essen et al., 2013) and Macaque ECoG data from the Neurotycho project (Nagasaka et al., 2011). The outcome of this test for data did not allow us to reject the hypothesis of Gaussianity, which then was also plausible for the sources.

Forward projection by the lead field linear and stationary operator
We emphasize that linearity and stationarity are essential "conservative" attributes of the operators for our target (cESI). We note, this is the reason why one can state all the forward "routes" in terms of the same operator L vι (Eq. 12). Preserving the Gaussianity of the Fourier transform ι f is only possible under the linear forward operator L vι and subsequent linear or quasilinear inverse operator T ιv (Marzetti et al., 2008;He et al., 2019;Nolte et al., 2020).
Furthermore, both attributes are crucial to avoid non-linear warping and delays of the amplitude and phase information in the frequency domain for cESI (Reid et al., 2019). Therefore, we define operators with these properties for the frequency domain as "Finvariant." Though we restricted our attention later to F-invariant operators T ιv , non-linear operators W ιv have been useful in other contexts Lopes da Silva et al., 1991;Clark et al., 1994;Makeig et al., 1999Makeig et al., , 2004Makeig, 2002;Eichele et al., 2005;Harrison et al., 2008;Vega-Hernández et al., 2008;Maurer and Dierks, 2012).
Noteworthily, distinguishing forward "routes" leads to variants of the inverse problems or inverse operators W ιv for estimating the specific latent variable ι (t), or ι f , or ιι f , as discussed in the next section (Section Bayesian (MAP) inverse operators. Quasilinear (F-invariant) approximation). Estimation of the crossspectrum ιι f , involves the challenging inverse problem of the matrix equation (route 3) detailed in section (Section Bayesian MAP1 inverse operators. MNE, eLORETA, and LCMV as particular cases).
We currently illustrate the effects of the forward operator with the topographic maps: Cortical Spectral Topography (CST), a map of the cortical spectrum σ 2 ιι f = diag ιι f , and Sensor Spectral Topography (SST), a map of the sensor spectrum σ 2 vv f = diag vv f ( Figure 1). On the left is a hypothetical CST, and on the right is the corresponding SST. The inverse problem consists in estimating the latent CST from observed SST.

Bayesian (MAP) inverse operators, quasilinear (F-invariant) approximation
From the theory of inverse problems (Tarantola, 2005), one may seek inverse solutions for each of the latent variables in the routes corresponding to ι (t), or ι f , or ιι f (Eq. 13) and Figure 2. In each route, the theoretical inverse operatorŴ ιv represents the symbolic projection to the source space and these inverse solutions. In turn, inverse operators are determined (analytically or numerically) by data-driven methods that depend on the routes from the data variable: v (t), or v f , or vv f , and the forward operator L vι .
Let any of the data variables be represented generically by Y, and likewise, X represent all types of latent variables in Eq. (13). A maximum a posteriori (MAP) Bayesian inverse operator W X Y (Eq. 14) achieves the maximum of a posteriori probability q (X |Y). We define the a posteriori q (X |Y) as the conditional probability determined from the likelihood p (Y|X ) and a priori p (X ). Depending on q (X |Y) an inverse operatorŴ X Y can be non-linear or linear, computed numerically or analytically, also intractable or tractable.
The essential role of the a priori p (X ) is to overcome the ill condition of the likelihood p (Y|X ), in other words, to provide a . /fnins. .

FIGURE
An illustration of ιι (f) the source cross-spectrum of human cortical alpha activity and its EEG cross-spectrum vv (f). For this illustrations we calculate the sampled estimates ιι (f) and vv (f) of which we represent the topographic projections: (a) Cortical Spectral Topography (CST)

FIGURE
Inverse operators from (a ) the MEG/EEG/ECoG sensor signal v t (t), defined as D tensor with every trial and time-point observation, to (b ) the source cross-spectrum at every frequency ιι (f), via the di erent cross-spectral Electrophysiological Source Imaging (cESI) routes. Route (a → b → b → b ) via an inverse operator W ιv in the time-domain first determines (b ) the source processes ι (t) and then compute (b ) their Fourier transform ι (f). Route (a → a → b → b ) first computes (a ) the data Fourier transform v m (f) and then via an inverse operator W ιv in the frequency domain determines (b ) their source Fourier transform ι m (f). Route (a → a → a → b ) first computes (a ) the data Fourier transform v (f) and then determines (a ) the data cross-spectrum vv (f). We revindicate Route , which is via an inverse operator W ιv in the spectral-domain with a priori probabilities upon the source cross-spectrum.
Frontiers in Neuroscience frontiersin.org . /fnins. . unique solution to the inverse problem for the source variable X .
Here we formulate this a priori p (X ) as a Gibbs probability density (Eq. 15).
Where α is the Gibbs temperature parameter. The Gibbs energy function H (X ) is commonly defined using the norms of vectors or matrices (Petersen and Pedersen, 2008;Golub and Van Loan, 2013), usually used to reflect empirical criteria about the structure and density of the source variable X defined over the spatial, time, or frequency domains. Assimilating these source qualities requires, in addition, the empirically determined (data-driven) scale parameter α, by some method from the data variable Y.
We emphasize that this Bayesian MAP formalism is particularly helpful in developing the cESI routes and our optimal inverse solution in the following sections (Sections Bayesian MAP1 inverse operators. MNE, eLORETA, and LCMV as particular cases and Data used for the validation). In another context, Eqs. (14, 15) may be completely equivalent to the classical Tikhonov regularization (Tikhonov and Arsenin, 1977), taking a posteriori p (X |Y) under the − log transformation. However, the Bayesian formalism is more general than the Tikhonov regularization since the Gibbs energy H (X ) (Eq. 15) describes the statistical properties of any physical system (Landau and Lifshitz, 1980).
As stated before, we constrained all routes (Eq. 13) to the class of quasilinear inverse operators T ιv (Eq. 16) (Baillet et al., 2001;Grech et al., 2008). Moreover, the quasilinear class leads to the type of F-invariant inverse solutions for the properties in Eqs. (10, 11), and particularly the tractable cESI route 3.
The term "quasilinear" T ιv is applies to inverse operators defined as non-linear matrix functions of the vector argument v (t) or v f , or the matrix argument vv f . A matrix function comprises entries T ιv g , s in the G × E cartesian product of sources g = 1 · · · G and sensors e = 1 · · · E. These entries are non-linear functions of the vector arguments T ιv g , s , v (t) , or T ιv g , s , v f , or the matrix argument T ιv g , s , vv f .

Bayesian MAP inverse operators. MNE, eLORETA, and LCMV as particular cases
Inverse solutions for routes 1 and 2 (Eq. 13) are the traditional activation Electrophysiological Source Imaging (aESI). We obtained these inverse solutions as a first-type MAP (MAP1) (Eq. 17), which estimates the source variable ι (t) in the time domain (Hämäläinen and Ilmoniemi, 1994), or the source variable ι f in the frequency domain (Salmelin and Hämäläinen, 1995).
The a posteriori probabilities require definitions of the likelihood, and the a priori is given below in Eq. (18). For the data v (t) a real-valued Gaussian is commonly assumed with mean L vι ι (t) and noise covariance ξξ (t). This assumption might not be valid for all types of time domain data. In contrast, and in virtue of Brillinger's theorem cited above, data v f obtained with the discrete Fourier transform will almost certainly have a complexvalued Gaussian likelihood with mean L vι ι f and noise crossspectrum ξξ f and thus the one we adopted. The a priori probabilities in Eq. (18) are placed upon the activation Gibbs energy in the time domain H (ι (t)) or the frequency domain H ι f (Eq. 18) as expressed by the vector p-norm (frequently p-norm p ) upon real-valued ι (t) or complex-valued ι f (Riesz, 1910;Rudin, 1970;Dunford and Schwartz, 1988;Bourbaki, 2013).
Henceforth, we ruled out route 1 since it is based on the adhoc Gaussian assumption for p (v (t)|ι (t)), which, as mentioned before, is not always tenable. Furthermore, our interest was in the frequency domain, which concentrates our attention on route 2, and which bases the likelihood of the Fourier transform p v f ι f (Eq. 18) on the complex-valued Gaussian probability (Eq. 10).
Toward cESI, selecting a priori p ι f (Eq. 18) follows the standard rationale that oscillatory brain networks and their activity are characterized by a large-scale and dense (or non-sparse) distribution in conditions of resting-state or task in a block design (Mantini et al., 2007;Brookes et al., 2011b;Hipp et al., 2012;Lopes da Silva, 2013). Such activity is the stochastic and stationary process ι (t) that is composited by oscillations in the frequency domain, as described by the Fourier transform ι f (Engel et al., 2001;Varela et al., 2001;Vidaurre et al., 2018a,b;Tewarie et al., 2019).
A general smooth a priori model posits the following Gibbs energy H 2,A(f ) ι f (Eq. 19) which, in addition, specifies A f as some positive definite and symmetric, or Hermitian, matrix.
Specifying the matrix A f may follow some types of goodness criteria of the inverse solution, and data-driven methods, which lead to the most common cases of quasilinear (F-invariant) inverse operators (Baillet et al., 2001;Hauk, 2004;Grech et al., 2008;Marzetti et al., 2008;Henson et al., 2011;Hindriks, 2020).
Representing the most common quasilinear cases the following inverse operatorT ιv A f , B f , in brief notationT ιv f (Eq.

20)
, the constrained generalized inverse of the forward operator L ιv that incorporates the regularization matrices A f and B f .
WhereT ιv f arises from the route 2 MAP1 (Eq. 17) that incorporates the Gibbs energy H 2,A(f ) ι f (Eq. 19), and B f any approximation to the noise cross-spectrum ξξ f in probabilities (Eq. 18); and q ι f v f is the Gaussian a posteriori N C , with meanT ιv f v f and covariance matrixˆ ιι f , that follows from the conjugated relation between the likelihood and a priori probabilities in the route 2 MAP1.
The cESI estimator is then ιι f (Eq. 21) for any set of inverse solution instances ι m f ; ∀ m , or more compactly from an estimate of the data cross-spectrum vv f (Eq. 22).
Important examples of cESI that follow route 1 are MNE, eLORETA and LCMV: • The Minimum Norm Estimate (MNE) (Hämäläinen and Ilmoniemi, 1994). The basic smooth model of the source variables, that could either disregard the weight matrix A f = I or consider it ad-hoc A f = A ac based on anatomical information. • The Linearly Constrained Minimum Variance (LCMV) (Van Veen et al., 1997). The beamformer method that estimates a diagonal weight matrix A f = diag (a). This estimation produces an ideal filter (inverse operator) for each source variable, suppressing the interference from the other source variables and performing ideally under focalized distribution around one or a few sources.

• The Exact Low-Resolution Electromagnetic Tomographic
Analysis (ELORETA) (Pascual-Marqui et al., 2006). A regression method that estimates A f so that the localization of the maximum for the estimated source variables corresponds exactly to the true maximum. This estimation performs ideally under a unimodal and smooth distribution of source variables.
The variants of these techniques are the one optimized for route 2, ESI in the frequency domain: the Spectral eLORETA (seLORETA) (Nolte et al., 2020) and the Spectral LCMV (sLCMV) (Larson-Prior et al., 2013). An additional solution used in this study as a reference is the spectral MNE (sMNE).

Novel cESI theory leading to the sSSBL approximation
Having described the theory of state-of-the-art cESI methods, we now focus on more sophisticated MAP theory and the sSSBL approximation that allows their practical implementation.

Bayesian MAP inverse operators
It is important to stress that posing ad-hoc priors and their hyperparameters is always necessary due to the uncertainties involved, by definition of the MAPs (Eqs. 14, 15) [54], [126]. In route 2, the prior was placed upon the Fourier transform ι f (Eq. 19). An alternative is to leverage the asymptotic distribution of the Fourier transform (Eq. 10) and place the a priori upon ιι f . This approach brings us to our main contribution, the theoretically promising cESI, which leads to an optimal inverse operatorŴ ιv based on the second-type MAP (MAP2) (Eq. 23).
This MAP2 above posits the a priori probability p ιι f upon the source cross-spectrum ιι f , considered a random hyper-parameter matrix. Incorporating an a priori to match the likelihood upon the sampled estimator vv f leads to a posteriori q ιι f vv f . Such a likelihood is the Complex Wishart probability density W C (Eq. 11) now defined for vv f (Eq. 24) with scale matrix vv f and M degrees of freedom. The specific scale matrix vv f in this likelihood posits the relation to the matrix equation of source cross-spectrum ιι f (Eq. 12). An a priori probability is then upon some cross-spectral Gibbs energy P ιι f defined by norms such as the vectorized (entry-wise) pnorm (Ding et al., 2006) and Schatten p-norm (Fan, 1951;Schatten, 2013) upon the matrix ιι f .
Depending on the likelihood and a priori probability (Eq. 24), the optimal inverse operatorŴ ιv (Eq. 23) is often nonlinear and numerically intractable. In this case, which we followed in this article, optimizingŴ ιv requires Approximated Bayesian Computation (ABC) (Csilléry et al., 2010). The ABC we describe here employedŴ (k + 1) ιv , a non-convex but numerically tractable successive approximation toŴ ιv . In turn, the tractableŴ (k + 1) ιv followed from q (k) ιι f vv f , the non-convex relaxation of the a posteriori q ιι f vv f in Expectation-Maximization (EM) iterations. Obtaining the a posteriori relaxation is via the Variational Bayes (VB) treatment (Dempster et al., 1977;Liu and Rubin, 1994;Daunizeau and Friston, 2007;Friston et al., 2007;Nummenmaa et al., 2007;. Such a VB treatment could target the separable model for q ιι f vv f , or the separable Hierarchical Bayesian (HB) model for the a priori p ιι f .
We introduce this joint-MAP (Eq. 25) substituting in the general MAP definition (Eqs. 14, 15) the data Y and source variable X by the instances Y = v m f ; ∀ m and X = ι m f ; ∀ m .
In our case, we considered the above p v m f ; ∀m ι m f ; ∀m the factorizable joint likelihood (Eq. 26) whose factors are present throughout the complex-valued Gaussian probability density (likelihood for route 2) (Eq. 18). In addition, one may assume ξξ = diag β f a univariate (in many cases homogenous) noise model as expressed in terms of the noise spectrum β f .
We now introduce the joint a priori p ι m f ; ∀ m (Eq. 27) upon the Gibbs energy H ι m f ; ∀ m redefining the previous (Eq. 15) over instances ι m f ; ∀ m . Toward cESI, this function may be read as cross-spectral Gibbs energy H ιι f .
Our purpose now is to define the type of cross-spectral Gibbs energy that must deal with a severely ill-conditioned and high dimensional cESI setup (G ≫ S). We shall deal with these problems employing the vector or matrix norms, such as the vector structured p, q-norm (Kowalski and Torrésani, 2009a,b;Gramfort et al., 2012Gramfort et al., , 2013 and the Schatten matrix p-norm (Fan, 1951;Schatten, 2013).
To recap, our approach toward cESI is then with the joint-MAP inverse operatorŴ ιv (Eqs. 25-27) [instead of the MAP2 inverse operator (Eqs. 23, 24)]. Toward our target (cESI), we must leverage the class of joint a priori probabilities expressed by the cross-spectral Gibbs energy H ιι f , not the more general case of Gibbs energy defined upon the instances H ι m f ; ∀ m .
We leverage H 1 (Eq. 28) the structured p = 1, q = 2-norm (square) that performs sparse regularization of the cross-spectrum topographic projection or spectrum tr 1 2 ιι f . H 1 is the wellknown Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996) for vectors ι f extended or structured over the second dimension index m for the set of instances ι m f ; ∀ m . Such a structured norm is a matrix quasinorm that does not fulfill the triangle equality over the cross-spectrum ιι f .
In addition, we introduce H 2 (Eq. 29) the structured p = 2, q = 2-norm (square), Shatten p = 1-norm or nuclear norm tr ιι f , to compensate sparse bias of H 1 (Eq. 28) and regularize eigenvalues for the cross-spectrum ιι f . H 2 is the well-known Ridge operator (Hoerl and Kennard, 1970) for a vector ι f structured over the second dimension index m for the set of instances ι m f ; ∀ m .
ESI practice, which is based on either the sparse or the smooth models, may be insufficient, though, in many scenarios where brain activity is patch-wise or not wholly sparse, Elastic Net, the linear combination of sparse/smooth models, may be ideal (Zou and Hastie, 2005;Vega-Hernández et al., 2008). Such a regularization style is a spatially structured sparsity due to the linear combination of the sparse LASSO and smooth Ridge operators upon the vector ι f . Hence, for the vector instances ι m f ; ∀ m , the linear combination of the quasinorm H 1 (Eq. 28) and the nuclear norm H 2 (Eq. 29) leads to the following joint a priori probability p ι m f ; ∀ m (Eq. 30).
Motivated by ESI practice, our definition of the Elastic Net (Eq. 30) (nuclear quasinorm) diverges from that of previous studies. From these works, the Elastic Net nuclear norm combines the nuclear norm tr ιι f with the square Schatten p = 2-norm (Frobenius norm) tr 2 ιι f (Sun and Zhang, 2012;Chen et al., 2013;Kim et al., 2015;Zhang et al., 2017). This Elastic Net nuclear norm resolves convexity problems of the nuclear norm in the context of matrix completion (Candes and Recht, 2012; Hu et al., 2012) due to a non-convex problem with the sole nuclear norm. Although it was not our purpose to investigate convexity here, this property holds for our Elastic Net nuclear quasinorm, assuming cross-spectrum ιι f upon vector basis (Zou and Hastie, 2005). In addition, we did not consider the vectorized (entry-wise) p-norm (Ding et al., 2006). The latter, which might be necessary to regularize offdiagonal entries in some cases (Paz-Linares et al., 2018), failed to ameliorate ill-condition or distortions.
The type of ABC introduced here is known as "gamma-MAP, " the standard VB treatment applied to a priori probabilities in the joint-MAP. In turn, the gamma-MAP leads to the quasilinear successive approximationsT (k) ιv . Solving the gamma-MAP under cross-spectral Gibbs energy H ιι f defined by the Elastic Net nuclear quasinorm is the Structured Sparse Bayesian learning algorithm (ssSBL) described in the next section.

Gamma-MAP and implementation of SSBL
Similarly to the MAP2 (Eq. 23), the joint-MAP inverse operator W ιv (Eq. 25) could be non-linear by nature, depending on the joint a priori probability p ι m f ; ∀ m . We now introduce the gamma-MAP that achieves the quasilinear joint-MAP version leveraging the idea of mean-field approximation (Kadanoff, 2009). Such a mean-field approximation is the main idea behind the . /fnins. .
VB treatment applied to the latent variables of neuroimaging data (MacKay, 1998;Roweis and Ghahramani, 1999;Beal, 2000, 2001;Eyink et al., 2004;Friston et al., 2007;Nummenmaa et al., 2007;Babacan et al., 2009). By assuming the definition for this joint a priori p ι m f ; ∀ m upon Gibbs energy H ι m f ; ∀ m (Eq. 27), the mean-field approximation is then to search for the selfconsistent energy formH ι m f ; ∀ m (Eq. 31) as represented by separable (in most cases also indistinguishable) energy terms H g ι m g , f ; ∀ m for each source. A self-consistent energy form H g ι m g , f ; ∀ m also summarizes the interaction field g ↔ ∀g ′ , as a property of the original Gibbs energy Obtaining the self-consistent form Eq. (31) was always plausible under the Gibbs energy H ι m f ; ∀ m that fulfills pair-wise separability in identical functions H ι m g , f , ι m g ′ , f ; ∀ m (Eq. 32). Note that this property is known for ensuring the convergence of solutions for the self-consistency field equations in the literature of magnetism (Weiss, 1907(Weiss, , 2001Le Boudec et al., 2007;Kadanoff, 2009;Zheng et al., 2015). The cross-spectral Gibbs energy H ιι f , as defined by the norms considered here (Elastic nuclear quasinorm (Eq. 30) or others), fulfills such a property.
Then, if the joint a priori probability p ι m f ; ∀ m (Eq. 27) is written in terms of the separable Gibbs energy H ιι f (Eq. 32) it turns out the Markov random field property (Eq. 33) that is summarized by p ι m g , f ; ∀m ι m g ′ , f ; ∀m the pair-wise and identical probability factors (Kindermann et al., 1980;Lafferty et al., 2001). Note that such probability factors may approximate but do not strictly represent conditional probabilities.
Hence, we target VB-the factorizable (separable) approximation h q ι m f ; ∀ m (Eq. 34) of the joint a priori p ι m f ; ∀ m (Eq. 33). Essential to obtain h q g ι m g , f ; ∀ m is the Hierarchical Bayes (HB) or probability mixture bellow, identical a priori h and a posteriori q probabilities that are upon some type of "variational" hyper-parameters γ g , f . Such an a posteriori q is also known as the proxy for belief propagation or message passing, denominated iterated conditional mode in the general literature of Markov random fields (Pearl, 1988(Pearl, , 2022Weiss, 2001;Yedidia et al., 2003;Zheng et al., 2015).
Where the hyper-parameters γ g , f condense field interactions H ι g , f ι † g ′ , f ; M (Eq. 33) by the definition of an a posteriori probability q upon ι m f ; ∀ m . After specifying an a priori h, the VB treatment is applied to search for estimators of the a posterioriq (Eq. 35). That is to minimize the Kullback-Leibler divergence D KL p , h q of the approximation h q regarding to the joint a priori p (Eq. 34).
Minimizing the D KL p , h q , although theoretically achievable, turned out to be a difficult variational calculus problem for nonparametric HB (mixture) models (Blei and Jordan, 2006;Bryant and Sudderth, 2012;Nguyen and Bonilla, 2013;Duvenaud et al., 2016). Thus, the D KL approach is common with parametric solutions constraining models (a priori h and a posteriori q) to the exponential family of probabilities (Andersen, 1970;Casella and Berger, 2021).
A simplifying assumption that could bypass the Kullback-Leibler divergence D KL p , h q (Eq. 35) is to regard the a priori h and a posteriori q (34) within a family of conjugate HB models of the likelihood. Furthermore, we considered a priori h factorizable over the set of instances since the pair-wise interactions are additive by definition of ι g , f ι † g ′ , f ; M the cross-spectrum entries in Eq. (7). Henceforth, a complex-valued Gaussian probability N C defines the a priori h ι m f γ f , and a probability in the Gamma family Ŵ defines the a posteriori q γ f ι m f ; ∀m which we incorporate into the following HB model (Eq. 36).
The previous HB model is the Generalized Gaussian Scale Mixture Model (GGSMM) prescribed by the Andrews and Mallows lemma (Andrews, 1974), where the particular and "Gamma" probability Ŵ depends on joint a priori to be approximated via D KL p , h q (Eq. 35) and falls within the class of scale mixture Gamma models (McLachlan and Basford, 1988;Lindsay, 1995;Böhning and Seidel, 2003;Hancock and Samuelsen, 2007).
In such a mixture model, the hyper-parameter vector γ f may then be interpreted as the "variational spectrum, " which specifies a Complex Gaussian a priori probability N C (Eq. 36). The a posteriori for the variational spectrum entries γ g , f is a form of probability that belongs to the Gamma Ŵ family, with a parameterization δ g , ι m f ; ∀ m that condenses field These iterated MAP1s are for an equivalent a posteriori q (k) ι m f v m f (Eq. 38), specified by the mean T (k) ιv f v f and the covariance (k) ιι f , which are functions of the variational spectrum diag γ (k) f and the noise spectrum diag β (k) f . These spectrums specified the type of univariate approximations for the source cross-spectrum ιι , and the noise cross-spectrum ξξ in the quasilinear MAP1 of the Fourier transform (Eq. 20). Henceforth, we place the emphasis in the variational spectrum iterations γ (k) f and defer this noise spectrum β (k) f which is not essential for our main exposition.
Targeting the variational vector γ f was under the marginal or hyper-parametrized likelihood for data p v m f γ f (Eq. 39). This likelihood was due to integration (expectation) of p v m f ι m f under the a priori probability for parameters h ι m f γ f , which is upon the variational hyper-parameters (spectrum) γ f (Eq. 30) (MacKay, 1999).
However, the actual marginal likelihood p v m f γ f (Eq. 39) was intractable, with approximations via the expectation of the log joint probability p v m f , ι m f γ f (Dempster et al., 1977;Liu and Rubin, 1994) under the iterated a posteriori q (k) ι m f v m f (Eq. 40). Then an approximated marginal An inverse operator W ιv of the variational spectrum γ f is theoretically equivalent to the previous joint-MAP (Eq. 25), which is inverse operator of the samples ι m f ; ∀ m . This is known from the literature as "gamma-MAP" (Hsiao et al., 1998(Hsiao et al., , 2002Wipf and Nagarajan, 2009 Here we implemented the Bayesian learning algorithm that effectuates the gamma-MAP γ (k + 1) f (41) and computes the quasilinearT (k) ιv (Eq. 38), for a joint a priori based on the Elastic Net nuclear quasinorm (Eq. 30). For this joint a priori, the solution to the HB (mixture) model (Eq. 36) is exact, following an extension of the Andrews and Mallows lemma to the structured sparsity norms which are here upon the spectrum σ 2 ιι f . Henceforth, we refer to this algorithm as Spectral Structured Sparse Bayesian Learning (ssSBL). The full derivation of the gamma-MAP and the ssSBL algorithm are developed in Supplementary material (Section Standard cESI theory).
Comparison of the distortions produced by seLORETA, sLCMV, and sSSBL Data used for the validation We used two different sets of EEG data to calculate estimated CST, each with their corresponding gold standard CST: (1) Realistically simulated low-density EEG obtained considering as sources a CST obtained from a MEG recording with a very high sensor density (simulated-EEGvsMEG). The gold standard here is the sMNE CST of the MEG recording due to the well-known advantages of MEG over EEG and the very high sensor density.
(2) Real low-density macaque EEG (EEGvsECoG). The gold standard here is the sMNE CST of the simultaneously recorded EcoG.
Simulated-EEG vs. MEG Figure 3 was based on a high-quality resting state MEG recording for subject 175,237 from the HCP database. The MEG signal selected for this purpose (Figure 3a2) was the 246-channel preprocessed data file. The electrical and magnetic lead fields were calculated with the subject's cortical and head geometry. With the magnetic lead field, Spectral MNE (sMNE) was used to calculate the MEG sources, which were taken as sources for the EEG. These sources were passed through the electrical Lead Field (Figure 3a3) to simulate a low-density 19-channel EEG recording (Figure 3a1). The simulation design is standard, essentially the same as those based on more straightforward configurations, where dipoles or patches are taken as the "ground truth" (Haufe and Ewald, 2019). Here, we used a much .

FIGURE
Simulated-EEG vs. MEG experiment that illustrates our validation strategy. We show distortion measures of the estimated Cortical Spectral Topographies (CST) resting-state MEG alpha activity. The same strategy is followed for all spectral bands. It is also applied to the study of distortion based on the EEG vs. ECoG experiment. We start from the cross-spectrum data shown in hot colormaps of D space topographies ( , based on the reference method sMNE. Employing these inverse operators, we obtain estimators for (b ) σ EEG M the EEG tested CST given by method "M" and (b ) σ the MEG (ECoG) sMNE reference CST. Incongruence between (b ) the EEG tested CST and (b ) the MEG (ECoG) reference CST is measured through (b ) e M the Earth Movers' Distance (EMD) and c M the correlation distance ( -CORR). Using (a ) the EEG cross-spectrum data and (b ) Lead Field for EEG, we obtain the resolution operator R M . Leakage in (b ) the EEG tested CST, which is based in (c ) the st quartile point of (b ) the MEG (ECoG) reference CST, is measured through (c ) r M , the Generalized Point Spread Function (GPSF) and b M the Blurring for the GPSF. more realistic set of sources, determined from the MEG. Code availability: https://github.com/CCC-members/MEGvsSimulated-EEG.  acquired at 19 sensors simultaneously with the ECoG in the resting state (Nagasaka et al., 2011). This macaque preparation allowed the realistic measurement of distortions in resting-state connectivity estimated from low-density EEG using different aESI solutions (Wang et al., 2019). A sensor array placed surgically on the left macaque's cortical surface (Figure 4a1) allowed dense recordings of . /fnins. . Figure 4a2). The lead fields for EEG and ECoG (Figure 4a3) were obtained using the macaque's cortical and head geometry. The ECoG lead field was used to generate a reference or ground truth CST using sMNE. It was unnecessary to simulate data since the concurrent low-density EEG was available. Code availability: https://github.com/CCC-members/ECoGvsEEG.

Validation rationale
We evaluated the low-density EEG recordings using ssSBL, sELORETA, and sLCMV to produce the CSTs σ sSSBL , σ sELOR , and σ sLCMV , respectively. We denote these CST generically as σ M , with Mspecifying the method. These calculations were carried out for the previously mentioned experiments: simulated EEG vs. MEG and EEG vs. ECOG. These CSTs were compared with reference CSTs (σ 0 ) considered the "gold standards." This reference was varied according to the type of measurement and will be detailed in the context below. Finally, each method's distortion of its reference was assessed using the measures described next.

Measures of distortion
Measures of distortion to compare σ M with σ 0 fall into two groups, leakage and incongruence indices that we illustrate in the Simulated-EEG vs. MEG experiment (Figure 3). The measures of the macaque EEG vs. ECoG experiment in figure elements (Figures 4b1-b3, c1, c2) are analogous to those of the Simulated-EEG vs. MEG experiment.
Leakage (Palva et al., 2018;Van de Steen et al., 2019), or spread, is quantified using the Generalized Point Spread Function and a blurring measure. These are based on the concept of the resolution operator R M = T ιv,M L vι . Here T ιv,M is the inverse operator for the method M, and L vι the lead field. Consider a point source u o indexed by g 0 , any column R M :, g 0 is then the voltages v 0 produced by this source.
• The Generalized Point Spread Function (Grova et al., 2006b;Haufe et al., 2008) (GPSF) r M in Figure 3c1, depicted with a hot colormap, represents the leakage (spread) of the lowdensity EEG CST estimators for a given set of cortical points. These cortical points, shown as blue dots (Figure 3c2), are selected as the most active 25% of the reference σ 0 and are the set G 0 . The GPSF, for any point g ∈ G in the set of cortical points, is then: Where |G 0 | denotes the number of elements in that set.
• The blurring measure for images (BLUR) b M is defined as the Spatial Dispersion (SD) of the GPSF (Grova et al., 2006a;Haufe et al., 2008). It is worth clarifying that for a perfect cESI solution, with zero b M = 0 in Figure 3c1, there would be an exact coincidence between the GPSF r M non-zero values in the colormap and the blue dots in Figure 3c2. Formally, b M is defined as: Where ϑ 2 M g 0 is the spatial dispersion around the reference point g 0 and Where d 2 gg 0 is the square of the geodesic distance between those points.
(2) The correlation distance (1-CORR) c M which measures deformations from the expected collinearity between pairs of spatially distributed CSTs.
These incongruence measures (EMD) e M correlation distance (1-CORR) c M combine sensitivity to leakage and localization error.

Simulated-EEG vs. MEG inverse solutions
Both sELORETA and sLCMV CST estimators were seriously affected by leakage ( Figure 5), judging by the mismatch between their estimated GPSF r sELOR and r sLCMV compared to the reference points (blue dots). They were centered at incorrect sites (opposite the blue dots) and with a much larger spread. For ssSBL, the set of local maxima for r ssSBL was correctly centered around reference points and did not extend significantly beyond these. In other words, as can be appreciated qualitatively, r ssSBL minimized leakage compared to sELORETA and LCMV.
Initially, these results differed from those reported by other authors for r sELOR and r sLCMV (Haufe and Ewald, 2019). The explanation may be due to their using idealized EEG simulations (point generators or discrete patches) that generate fewer distortions, thus reducing the apparent differences in the performance of different aESI methods. We verified this in recent aESI studies (Paz-Linares et al., 2017), concluding that simple SSBL (not the cESI ssSBL implemented in this paper) outperforms eLORETA and other methods only by a narrow margin.
An aESI solution computed with LCMV was usually sparser than an ELORETA solution, which was expected due to the thresholding strategy implemented in the original LCMV (not sLCMV) (Van Veen et al., 1997). However, this did not lead to any improvement in terms of the leakage observed in the GPSFs r sELOR and r sLCMV . The "zero localization error" of eLORETA has .
/fnins. . been claimed to be the unique advantage in its favor (Pascual-Marqui et al., 2006). However, this property has been theoretically demonstrated only for the peak (maximum) activity and not for the numerous local maxima of secondary activations common in real-life scenarios. Therefore, unsurprisingly, eLORETA can produce much better results in idealized simulations that use activity modeled as a focalized concentrated unimodal distribution (e.g., Gaussian) or only a few simple point sources (Kobayashi et al., 2003;Grova et al., 2008;Schoffelen and Gross, 2009;Haufe et al., 2013;Burle et al., 2015;Colclough et al., 2015;Bradley et al., 2016;Mahjoory et al., 2017;Stokes and Purdon, 2017;Palva et al., 2018;Haufe and Ewald, 2019;Marinazzo et al., 2019). A striking observation was that despite σ 0 being highly frequency dependent, the GPSF pattern was r seLORETA and r sLCMV was relatively invariant for all frequencies across the spectrum. This invariance suggests that the pattern was mainly due to Lead Field properties rather than physiologically (Lopes da Silva et al., 1974;Niedermeyer and da Silva, 2005;Lopes da Silva, 2013). In contrast, the GPSF r ssSBL pattern was closely tied to the physiological fluctuations across frequencies: from activity in the slow delta band to the faster alpha band. These fluctuations were inherited from the reference MEG data.
The quantitative analysis of the leakage measure b M (BLUR) (Figure 6, left-column) confirms our qualitative impressions based on the GPSF r M . The radar graphs showed a decrease in leakage of ssSBL compared to sELORETA and sLCMV (b sELOR > b ssSBL , b sLCMV > b ssSBL ). This improvement was valid for all spectral bands (delta, theta, alpha, gamma 1, and gamma 2).
EMD values e M are intuitively the amount of work to deform the EEG-based CST estimator to the reference CST estimator (σ M → σ 0 ). They were one to two orders larger for sELORETA and sLCMV than SSBL (e sELOR >> e ssSBL , and e sLCMV >> e ssSBL ).
A linear model adjusted to every source of the EEG and MEG CSTs estimated data pairs shows a clear linear tendency (Figure 7), with correlation distances larger for sELORETA and sLCMV. Some correlations are even negative correlations. By contrast, c sELOR was in the range of around 0.7, for all frequency bands. This behavior was congruent with that observed in the colormaps for GPSF (r sELOR and r sLCMV ) ( Figure 5), where the maximum values appear in opposite areas to the blue dots (reference estimation).
These results suggest that commonly used ESI validation procedures, limited to idealized simulations of local neural currents, may not accurately assess the actual distortions (Haufe et al., 2013;Haufe and Ewald, 2019). Our simulation of EEG that incorporates realistic local neural currents, estimated from highdensity MEG, shows that the leakage and localization error in ESI applied to actual data might be much more severe than expected (Palva et al., 2018;He et al., 2019;Van de Steen et al., 2019). Therefore, future efforts should consider validation benchmarks based on realistic simulations like those described here. When using this benchmark, the ssSBL approach appears to considerably control the effect of distortions with respect to other techniques.

EEG vs. ECoG inverse solutions
The GPSF colormaps (Figure 8) for each method showed a consistent behavior as those of the Simulated-EEG vs. MEG experiment ( Figure 5). As evident in the GPSF maps r ssSBL and measured in b ssSBL (Figure 6, right-column), the performance of ssSBL in reducing leakage was superior compared to sELORETA (b sELOR > b ssSBL ) and sLCMV (b sLCMV > b ssSBL ).
The measurements of incongruence by the EMD e M (Figure 6, right-column) were consistent with those of the previous experiment (left-column), confirming the improvement of ssSBL regarding sELORETA (e sELOR > e ssSBL by a considerably narrow margin) and LCMV (e sLCMV >> e ssSBL by a wider margin of three orders of magnitude) for all spectral bands.
The correlation distance c M for all methods showed that linear regression described the relation of EEG CST to ECoG CST well. The correlation was positive for ssSBL but negative for sELORETA and sLCMV as seen in Figure 6 (right-column). As a consequence (c sLCMV > c LORETA> c ssSBL ) for all spectral bands. This linear tendency in Figure 9, similarly to Figure 7 confirms the feasibility of cESI, even with low-density EEG, given its close relationship to a more direct observation modality like ECoG. The results of sELORETA and sLCMV did not reveal a clear linear positive tendency. These results confirm and extend the results of previous studies Papadopoulou et al., 2019;Wang et al., 2019), suggesting that some types of ESI should be interpreted with extreme care.

FIGURE
In radar graphs, measurements of "Leakage" and "Incongruence" in the Cortical Spectral Topographies (CST) were obtained with all tested methods "M" (ssSBL, seLORETA, and sLCMV) from the EEG. Leakage, regarding "Y," the MEG or ECoG sMNE reference CST, is measured employing B M , the Blurring (BLUR), and Incongruence, employing e M the Earth Movers' Distance (EMD), and C M the Correlation Distance ( -CORR), shown for five characteristic spectral bands (delta, theta, alpha, beta, and gamma). Distortions based in the BLUR, EMD, and -CORR are proportional to their values in the radar graph. Calculations of the BLUR, EMD, and -CORR follow the procedure described in Figures , , for the MEGvsSimulated-EEG and ECoGvsEEG experiments.

FIGURE
Linear model and correlations for the Cortical Spectral Topographies (CST) obtained from the MEGvsSimulated-EEG experiment in the human. These were adjusted from the CST data pairs (σ M , σ ) for all tested methods "M" (ssSBL, seLORETA, and sLCMV), and in five characteristic spectral bands (delta, theta, alpha, beta, and gamma).
with flexible smooth/sparse a priori models (Zou and Hastie, 2005;Vega-Hernández et al., 2008;Li and Lin, 2010). Quasilinear inverse solutions of these smooth/sparse models can provide good datadriven approximations. We formalized the three possible routes toward cESI (our target) via MAP1 inverse solutions, for the vector processes or their Fourier transform and via MAP2 or joint-MAP inverse solutions for their cross-spectrum. The cross-spectral MAP2 or joint-MAP are plausible inverse solutions, which target the quantity of interest and theretofore posit the a priori upon the cross-spectrum ιι f .
In contrast, most cESI have been previously limited to route 1 for the processes or route 2 for the Fourier transform .

FIGURE
In hot colormaps, measurements of "Leakage" in the EEG-based Cortical Spectral Topographies (CST) were obtained with all tested methods "M" (ssSBL, seLORETA, and sLCMV). Leakage is measured by employing r M , the Generalized Point Spread Function (GPSF) regarding to the ECoG sMNE reference CST, shown in the cortical views Left (L), Dorsal (D), Posterior (P), and Ventral (V) and for five characteristic spectral bands (delta, theta, alpha, beta, and gamma). Leakage distortions are proportional to mismatch and spread in the GPSF values (hot colormaps) regarding reference points (blue dots). Calculations of the GPSF follow the procedure described in Figure for the ECoGvsEEG experiments.

FIGURE
Linear model and correlations for the Cortical Spectral Topographies (CST) obtained from the ECoGvsEEG experiment in the macaque. These were adjusted from the CST data pairs (σ M , σ M ) for all tested methods "M" (ssSBL, seLORETA, and sLCMV), and in five characteristic spectral bands (delta, theta, alpha, beta, and gamma).
(aESI) wherein the a priori is placed upon ι (t) or ι f . Therefore, essential statistics like cross-spectrum were incorrectly addressed as the subsequent step to aESI. As we demonstrated in simulations and real data, the aESI procedure is not suitable, with cESI amplifying the distortions previously produced during aESI. Indeed, a way to achieve statistical guarantees in cESI is through route 3 associated with the source cross-spectrum for the data. We have deferred for now to the complete implementation of the MAP2 inverse solution for the cross-spectrum. This MAP2, which targets a source matrix, was not straightforward, given a cESI setup that is severely ill-conditioned and of high dimensionality. Hence, we adopted the joint-MAP, as an approximation to the MAP2, which targets ιι f of the sampled source cross-spectrum. An essential concept introducing any MAP inverse solution was the Gibbs energy, a concept first formulated as the Gibbs free energy of any physical system (Landau and Lifshitz, 1980). From the point of view of the inverse problem theory, the Gibbs free energy describes the forward energy exchange of a system (the source variable) to the media (the data) (Ghahramani and Beal, 2001;Grave de Peralta Menendez et al., 2004;Friston et al., 2006;Mattout et al., 2006;Friston K. L. et al., 2008;Wipf and Nagarajan, 2009). Thus, this is an important generalization of the Tikhonov regularization and ESI for source cross-spectrum.
We employed this cross-spectral Gibbs energy to model the joint a priori probabilities and second-order multivariate properties of the Fourier transform. The Gibbs energy in cESI must be a function of the cross-spectrum entries. This assumption follows from the Gaussianity of Fourier transform (Brillinger, 1965;Brillinger and Rosenblatt, 1967), or statistical sufficiency of the cross-spectrum. This assumption is valid for ESI under a variety of experimental conditions, as it can be demonstrated with highquality MEG and ECoG data (Nagasaka et al., 2011;Van Essen et al., 2013).
We indicate the importance of the nuclear norm (trace) and nuclear quasinorm (square root trace) for matrices, from the context of matrix completion inverse problems (Fan, 1951;Ding et al., 2006;Sun and Zhang, 2012;Chen et al., 2013;Schatten, 2013;Kim et al., 2015;Zhang et al., 2017) which translated into a sparse/smooth spectrum model. Per the definition of cross-spectrum upon vector basis, this model indeed could be unified with the structured vector norms (Kowalski and Torrésani, 2009a,b;Gramfort et al., 2012Gramfort et al., , 2013. Furthermore, this has been very common in aESI with sparse (LASSO) (Tibshirani, 1996) and smooth (MNE) (Hoerl and Kennard, 1970) model. Our sparse-smooth model was, therefore, an Elastic Net nuclear quasinorm.
Our approach has an important connection to Bayesian learning (Tipping, 2001;Wipf et al., 2006;Park and Casella, 2008;Wipf and Nagarajan, 2009;Casella et al., 2010;Li and Lin, 2010;Paz-Linares et al., 2017) and provided the link between the joint-MAP and quasilinear inverse solutions. We introduced a type of variational approximation to the joint a priori probability upon cross-spectral Gibbs energy. This variational approximation is similar to the previous Bayesian Learning methods with extended applicability to high dimensional inverse problems that are solved in iterated conditional mode (Pearl, 1988(Pearl, , 2022Weiss, 2001;Yedidia et al., 2003;Zheng et al., 2015).
Here, we restricted ourselves to the Gaussian Mixture Model approach (Blei and Jordan, 2006;Bryant and Sudderth, 2012;Nguyen and Bonilla, 2013;Duvenaud et al., 2016) applied to the specific a priori per the definition of the Elastic Net nuclear quasinorm (Candes and Recht, 2012;Hu et al., 2012). Our Bayesian learning method (ssSBL) was a generalization of the SBL and sSBL approaches.
We demonstrated with several quality measures (Grova et al., 2006b;Haufe et al., 2008;Paz-Linares et al., 2017;Van de Steen et al., 2019;Wang et al., 2019) that cESI estimator distortions in actual data are more larger than expected for some state-of-the-art methods, such as sELORETA and LCMV. Our Simulated-EEG vs. MEG and EEG vs. ECoG validation method benchmarked the cESI distortions in the 10-20 EEG system (19 channels), which is considered the lower bound for all ESI. Therefore, we can infer that the behavior of our methods in EEG or other techniques that provide denser recordings can only improve.
The human Simulated-EEG vs. MEG validation method produces measurements of the cESI estimator distortions very similar to those of the EEG vs. ECoG simultaneously recorded in the macaque (Nagasaka et al., 2011;Wang et al., 2019). This similarity strongly supports the possibility of effectively measuring the distortions expected in many real-life scenarios with our Simulated-EEG vs. MEG design, which is easily replicable for other MEG acquisitions or different experimental conditions (not only resting state).
Our results indicate that ssSBL produces less cESI distortions than sELORETA and sLCMV, according to all leakage measures computed for the Simulated-EEG vs. MEG. These results were supported by cESI obtained from low-density EEG recordings in macaque compared to more fine-grained cESI obtained from simultaneously recorded high-density ECoG.

Conclusion
In this manuscript, we leveraged Bayesian theory to investigate the benefits and shortcomings of facing cESI with a large family of methods. We show that cESI must be taken care of in severely ill-conditioned and high-dimensional settings such as the ones dealt with here. In such settings, achieving cESI via exact Bayesian MAP2 methods is unfeasible and requires approximations. We have introduced quasilinearity, a reasonable cESI assumption, leading to the joint MAP that is feasible via the variational approximation to MAP. Our implementation, ssSBL specifies a priori to diminish CST distortions and exhibits good properties, outperforming state-of-the-art methods. The Bayesian theory and methods presented here could potentially be applied to signal processing and imaging other biological phenomena described by the cross-spectrum.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author/s.

Author contributions
DP-L and EG-M contributed in developing the theory, mathematical demonstrations, design of methods, implementation, and paper preparation. AA-G, ML, and YW prepared part of the data, code, and results used in this work. MV-H, QW, and EM-M prepared the macaque data for experimental confirmation and contributed to the theory and methods. MB-V, JB-B, and MV-S as senior authors analyzed the results. PV-S as senior author, introduced the theoretical background, CNS program of and guided this work. All authors contributed to the article and approved the submitted version.