A hierarchical Bayesian inference model for volatile multivariate exponentially distributed signals

Zhu, Changbo; Zhou, Ke; Tang, Fengzhen; Tang, Yandong; Li, Xiaoli; Si, Bailu

doi:10.3389/fncom.2025.1408836

ORIGINAL RESEARCH article

Front. Comput. Neurosci., 12 November 2025

Volume 19 - 2025 | https://doi.org/10.3389/fncom.2025.1408836


Changbo Zhu¹,²,³

Ke Zhou⁴

Fengzhen Tang¹,²,³

Yandong Tang¹,²,³

Xiaoli Li⁵

Bailu Si⁶,⁷*

¹State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
²Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China
³University of Chinese Academy of Sciences, Beijing, China
⁴Beijing Key Laboratory of Applied Experimental Psychology, School of Psychology, Beijing Normal University, Beijing, China
⁵State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
⁶School of Systems Science, Beijing Normal University, Beijing, China
⁷Chinese Institute for Brain Research, Beijing, China

Brain activities often follow an exponential family of distributions. The exponential distribution is the maximum entropy distribution of a non-negative continuous random variable with a given mean. The memoryless and peakless properties of the exponential distribution pose difficulties for data analysis methods. To estimate the rate parameter of a multivariate exponential distribution from a time series of sensory inputs (i.e., observations), we constructed a hierarchical Bayesian inference model based on a variant of the general hierarchical Brownian filter (GHBF). To account for the complex interactions among multivariate exponential random variables, the model estimates the second-order interactions of the rate parameter in logarithmic space. Using a variational Bayesian scheme, we derive a family of closed-form analytical update equations, which together constitute a complete predictive coding framework. The simulation study shows that our model is able to estimate the time-varying rate parameters and the underlying correlation structure of volatile multivariate exponentially distributed signals. The proposed hierarchical Bayesian inference model is of practical utility in analyzing high-dimensional neural activities.

1 Introduction

Decoding the states of neural systems is a critical task for many applications in neural engineering, ranging from cognitive assessment and brain–machine interfaces to deep brain stimulation (Haynes and Rees, 2006; Qi et al., 2019; Yousefi et al., 2019; Xu et al., 2021; Zhang et al., 2022; Pan et al., 2022; Li and Le, 2017). However, mental state decoding methods face several critical challenges. First, brain activities are highly non-stationary, often showing transient dynamics. Second, responses of different brain regions are correlated because of dense and complex anatomical connectivity patterns. Third, the imaging processes used to record brain activities impose additional spatiotemporal transformations on neural signals, calling for appropriate inference methods to uncover the underlying brain states. To tackle these difficulties, methods capable of tracking and inferring multi-dimensional dynamic brain signals are indispensable.

Brain activities have been shown to follow particular types of distributions that are distinct from Gaussian distributions (Roxin et al., 2011). Extracellular voltage recordings from various brain regions in different animals can be described by an exponential family of distributions, with tails falling off according to exponential distributions (Swindale et al., 2021). The distributions of electromyography and electroencephalography signals from human subjects have fatter tails than a Gaussian distribution and are fitted well by a generalized extreme value distribution (Nazmi et al., 2015). Because of these intrinsic statistics of measured neural activities, directly applying classic tracking and inference methods, such as Kalman filtering, is suboptimal (Li et al., 2009; Malik et al., 2010). It is therefore a valuable research direction to develop inference methods that closely match the characteristics of brain activities.

Exponential distributions describe empirical data in neuroscience well. Neurons in many regions, such as the middle temporal and medial superior temporal visual areas in monkeys, fire in a Poisson-like fashion, with exponentially distributed interspike intervals (Maimon and Assad, 2009; Ouyang et al., 2023). The sleep episode durations of humans and other mammals, such as cats and rats, follow exponential distributions (Lo et al., 2004). The locomotion activity of cells in vitro displays a universal exponential distribution (Czirók et al., 1998). In addition, the exponential distribution provides a good description of waiting times in the physical world, including lifespans, counts within a finite time period and so on. Therefore, researchers employ the exponential distribution as a lifetime distribution model to describe the lifetimes of manufactured products (Davis, 1952; Epstein and Sobel, 1953; Varde, 1969) and the survival or remission times in chronic diseases (Shanker et al., 2015). In physics, an exponential distribution is the best model of the times between successive flaps of a flag for a variety of wind speeds (McCaslin and Broussard, 2007). In finance, accumulating evidence suggests that financial data can be quantified by exponential distributions. A study of tax and census data shows an exponential distribution of individual income in the United States (Drăgulescu and Yakovenko, 2001). An exponential distribution also agrees well with the income of families with two earners (Drăgulescu and Yakovenko, 2001).

In this article, we aim to develop an inference model specifically designed to deal with volatility and multi-dimensionality in the data. Importantly, we assume that the data follow a multivariate exponential distribution, capturing the fat-tailed characteristics of neural signals. The proposed model can be applied to state estimation tasks in psychophysics, brain activity analysis, and other non-linear time series modeling tasks.

In probability theory, the exponential distribution is the maximum entropy distribution among continuous distributions supported on [0, ∞) with a given mean (Jaynes, 1982; Conrad, 2004; Stein et al., 2015). The exponential distribution has several interesting and important properties (Johnson et al., 2002; Ibe, 2014; Marshall and Olkin, 1967b):

• An exponential distribution is governed by a single rate parameter r, interpreted as the inverse of the average waiting time. The mean of an exponential random variable, 1/r, is equal to its standard deviation (std).

• The exponential distribution is peakless: its probability density function is monotonically decreasing on [0, ∞), so it has no interior peak. The expectation of an exponential random variable is not located at the maximum of its probability density function. This means that samples drawn from an exponential distribution are highly dispersed, resulting in a fat tail.

• An exponential random variable is memoryless, i.e.,

\begin{array}{l} P (x > t + ϵ ∣ x > ϵ) = P (x > t), \forall t, ϵ > 0 . \end{array}

In a Poisson process, this memoryless property means that the distribution of the waiting time until the next event does not depend on the time already elapsed (Kingman, 1992). All waiting times are independent and identically distributed (iid).
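The properties listed above can be checked numerically. The Python sketch below is an illustration, not part of the original method; it assumes NumPy, and the rate, sample size, and random seed are arbitrary choices. It compares the sample mean and standard deviation, uses the closed-form survival function P(x > t) = exp(−rt) to confirm memorylessness, and compares the differential entropy of an exponential distribution with those of two other non-negative distributions of the same mean (uniform and half-normal), in line with the maximum entropy property.

```python
import numpy as np

def exp_survival(t, rate):
    """P(x > t) for an exponential random variable with the given rate."""
    return np.exp(-rate * t)

def memoryless_gap(t, eps, rate):
    """|P(x > t + eps | x > eps) - P(x > t)|, which is zero for an exponential law."""
    conditional = exp_survival(t + eps, rate) / exp_survival(eps, rate)
    return abs(conditional - exp_survival(t, rate))

rate = 2.0
rng = np.random.default_rng(0)
# NumPy parameterizes the exponential by its scale, i.e. the mean 1 / rate.
samples = rng.exponential(scale=1.0 / rate, size=200_000)
print(samples.mean(), samples.std())              # both close to 1 / rate = 0.5

print(memoryless_gap(t=0.3, eps=1.7, rate=rate))  # zero up to floating-point rounding

# Closed-form differential entropies (nats) of non-negative laws with the same mean m.
m = 1.0 / rate
h_exponential = 1.0 + np.log(m)                   # Exp(rate)
h_uniform = np.log(2.0 * m)                       # Uniform[0, 2m]
h_half_normal = np.log(np.pi * m / 2.0) + 0.5     # half-normal with mean m
print(h_exponential > max(h_uniform, h_half_normal))  # True
```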

Due to these characteristics, fitting multivariate exponential distribution models is a difficult problem encountered in various disciplines. The Marshall–Olkin exponential distribution was introduced based on shock models and the constraint that residual life and age are independent (Marshall and Olkin, 1967a). An exponential distribution with exponential minimums provides a model to describe the reliability of a coherent system (Esary and Marshall, 1974). A bivariate generalized exponential distribution was also introduced to analyze two-dimensional lifetime data (Kundu and Gupta, 2009). However, these models are complex in form and are not robust for non-stationary data. More importantly, the interactions among the components of a multivariate exponential variable are not trivial to estimate. These classical studies assumed static distributions, without considering dynamic changes in the underlying distributions. Robust methods for estimating multivariate exponential distributions in volatile environments are still scarce.

“Observing the observer” is a meta-Bayesian framework (Daunizeau et al., 2010b,a) that furnishes a unified programming and modeling framework uniting perception and action based on the variational free energy principle (Beal, 2003; Friston, 2010; Mathys et al., 2011; Friston et al., 2017). Perceptual and response models are the two major parts of this framework. Inversion of the perceptual and response models maps sensory inputs (i.e., observations) onto response actions. Following this framework, the general hierarchical Brownian filter (GHBF) was proposed as a model for state estimation in dynamic multi-dimensional environments under a Gaussian assumption (Zhu et al., 2025). An important function of this model is to capture the temporal dynamics of lower-order interactions among sensory inputs (i.e., observations).

In this article, we extend the general hierarchical Brownian filter to the non-Gaussian case and develop an inference model for volatile multivariate exponentially distributed signals. The inference model incorporates a hierarchical perceptual model and a response model into the “observing the observer” framework. The model receives a series of multidimensional sensory inputs or observations and is asked to infer the rate parameter of a multivariate exponential distribution in a complex volatile environment. The perceptual model represents the rate parameter and the covariance of the logarithm of the rate parameter. The response model is a stochastic mapping that reproduces a series of sensory inputs. Compared with previous hierarchical Bayesian methods (Beal, 2003; Friston, 2010; Mathys et al., 2011; Friston et al., 2017), the proposed model is able to deal with multidimensional signals and dynamically uncover the potential correlation structure in the data.

The contribution of this article is two-fold. First, we develop a hierarchical Bayesian model to estimate the parameters of multivariate exponential distributions that are subject to dynamic changes. Through variational Bayesian learning, the model infers the rate parameters and the pairwise correlations of multivariate exponentially distributed signals simultaneously, and is therefore able to track the distribution robustly over time. The proposed model is valuable for its potential applications in estimating neural and behavioral responses. Second, the efficiency and robustness of the proposed inference model are tested in simulations with synthetic dynamic data. Compared with a simplified model with constant volatility parameters, the proposed model explains the data better, demonstrating the important role of higher-order variables, such as correlations, in estimating the parameters of the signal.

The rest of this article is structured as follows. The mathematical notations used in this study are defined in Section 2. Section 3 introduces the hierarchical Bayesian perceptual model for a multivariate exponential distribution environment. Section 4 derives a set of closed-form update equations for perceptual inference, and Section 5 describes variational Bayesian learning of the model parameters. Simulation results are given in Section 6. Finally, the article concludes with a discussion.

2 Notations

Throughout this article, we use the following conventional mathematical notations:

• A bold capital letter is a matrix while a bold lowercase letter is a vector.

• A hollow (blackboard-bold) capital letter denotes a set; sets are also written with braces {}.

• A probability density function (PDF) is denoted by q(·) or p(·).

• A multivariate Gaussian PDF of x is denoted by $N (x; μ, Σ)$ with mean μ and covariance Σ, while a multivariate Gaussian random vector is denoted by $x ~ N (μ, Σ)$ .

• A multivariate exponential PDF of x is denoted by $E (x; r)$ with rate parameter r, while a multivariate exponential random vector is denoted by $x ~ E (r)$ .

• A sequence of variables over time is denoted with a colon “:”, for example,

o_{1 : K} = o (t_{1}), o (t_{2}), \dots, o (t_{K}) .

• $E_{q(x)}(v)$ denotes the expectation of v under the distribution q(x).

• The operator ⊙ denotes the Hadamard (element-wise) product, and diag(v) transforms a vector v into a diagonal square matrix with the elements of v on the principal diagonal.

• For an m×n matrix M, vec(M) denotes its vectorization, a linear operation that yields a column vector of length mn by concatenating the columns of M consecutively from column 1 to column n. The operator ⊗ denotes the Kronecker product.

• The function lvec(L) transforms a lower triangular matrix L into a column vector by stacking its columns while omitting the (zero) entries above the principal diagonal.
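The vec, lvec, and Kronecker-product conventions above can be sketched in NumPy as follows. This is an illustrative sketch; the helper names vec and lvec simply mirror the notation. The column-wise ordering of lvec matches the order in which y₂ fills L₁ in Section 3.2, and the last check exercises the standard identity vec(AXB) = (Bᵀ ⊗ A) vec(X).

```python
import numpy as np

def vec(m):
    """Stack the columns of m into one vector (column-major order)."""
    return m.reshape(-1, order="F")

def lvec(l):
    """Stack the columns of a lower triangular matrix, dropping entries above the diagonal."""
    d = l.shape[0]
    return np.concatenate([l[j:, j] for j in range(d)])

a = np.arange(1.0, 7.0).reshape(2, 3)      # [[1, 2, 3], [4, 5, 6]]
print(vec(a))                               # [1. 4. 2. 5. 3. 6.]

lower = np.array([[1.0, 0.0, 0.0],
                  [2.0, 3.0, 0.0],
                  [4.0, 5.0, 6.0]])
print(lvec(lower))                          # [1. 2. 4. 3. 5. 6.]

# Identity linking vec and the Kronecker product: vec(A X B) = (B^T kron A) vec(X).
rng = np.random.default_rng(1)
A, X, B = rng.normal(size=(2, 3)), rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))  # True
```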

3 Hierarchical Bayesian perceptual model

3.1 Parameterization of multivariate exponential distribution

Given a multivariate exponential random variable x₀ with no cross-dimension interactions among its components, the joint density is obtained by directly multiplying all marginal exponential densities:

\begin{array}{l} E (x_{0}; r_{0}) = \prod_{i = 1}^{d_{0}} r_{0}^{(i)} exp (- r_{0}^{(i)} x_{0}^{(i)}) = exp (- r_{0}^{T} x_{0}) \prod_{i = 1}^{d_{0}} r_{0}^{(i)}, & (1) \end{array}

where $x_{0}^{(i)}$ is the i-th component (i.e., an exponential random variable) of x₀. The rate parameter $r_{0}^{(i)}$ is the inverse of the expectation of the i-th exponential random variable $x_{0}^{(i)}$, i.e., the mean of $x_{0}^{(i)}$ is $1 / r_{0}^{(i)}$. r₀ is the rate vector of the random vector x₀, and the integer d₀ is the number of dimensions of x₀. However, this independent model is incapable of capturing pairwise probabilistic correlations among the components of x₀, while introducing a non-independent exponential model with interactions among the components of x₀ would lead to high model complexity. Since the rate parameter r₀ is of primary interest, we instead learn the rate parameter while explicitly modeling pairwise interactions among the components of r₀. To enforce the positivity constraint on the rate parameter, we convert the constrained learning problem into an unconstrained one in logarithmic space. More specifically, the rate r₀ is mapped from a point x₁ in its log-space

\begin{array}{l} r_{0} (t) = exp (W_{1} x_{1} (t) + b_{1}), & (2) \end{array}

where the notation exp(·) denotes the element-wise exponential function. The coefficient matrix W₁ is a diagonal matrix with positive elements on the principal diagonal. This matrix represents the coupling strength between x₀ and x₁. The bias b₁ is a shift parameter.
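A minimal sketch of Equations 1 and 2, assuming NumPy and a diagonal W₁ stored as the vector w₁ of its diagonal entries. The numerical values of x₁, w₁, and b₁ are hypothetical and chosen only so that the result is easy to verify by hand.

```python
import numpy as np

def rate_from_log_space(x1, w1, b1):
    """Equation 2: r0 = exp(W1 x1 + b1) with W1 = diag(w1), w1 > 0."""
    return np.exp(w1 * x1 + b1)   # a diagonal W1 acts element-wise

def log_independent_exponential(x0, r0):
    """Logarithm of Equation 1: sum_i ln r0_i - r0^T x0, for x0 >= 0."""
    if np.any(x0 < 0):
        return -np.inf            # the density is zero outside the non-negative orthant
    return np.sum(np.log(r0)) - r0 @ x0

x1 = np.array([0.0, np.log(2.0)])   # hypothetical log-space state
w1 = np.ones(2)                      # hypothetical unit coupling
b1 = np.zeros(2)                     # hypothetical zero bias
r0 = rate_from_log_space(x1, w1, b1)          # rates [1, 2]; expected inputs 1 / r0 = [1, 0.5]
x0 = np.array([0.5, 0.25])
print(log_independent_exponential(x0, r0))    # ln 2 - (0.5 + 0.5) = ln 2 - 1
```

Working in log-space means any real-valued x₁ yields a valid (strictly positive) rate vector, which is what allows Gaussian beliefs over x₁ in the rest of the model.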

3.2 Perceiving tendency and volatility of the rate parameter

Volatile signals fluctuate over time, the fluctuations themselves are subject to change, and so forth. The nested nature of volatility is a hallmark of collective phenomena observed in many complex systems, such as brain networks, animal swarms, and financial markets. To quantitatively describe the volatility and pairwise correlations of multi-dimensional signals, a general hierarchical volatility model can be constructed based on nested Brownian motions (Zhu et al., 2025). The basic idea is that the variable of interest is represented by a Brownian motion, while the changes in the variable are predicted by higher-order variables that are again subject to Brownian motions. Following this framework, we develop a hierarchical perceptual model to estimate the tendency and volatility of multivariate exponentially distributed signals (Figure 1). More specifically, the logarithms of the rate parameters x₁ of the underlying multivariate exponential distribution are modeled by a general Brownian motion with diffusion matrix $Σ_{1} \in ℝ^{d_{1} \times d_{1}}$

\begin{array}{l} x_{1} = B (t; Σ_{1}) . & (3) \end{array}

Figure 1. Overview of the hierarchical perceptual model: λ sets L₂ and hence x₂(t) = B(t; Σ₂); F₂(·; W₂, b₂) maps x₂ to L₁, which sets the diffusion of x₁(t) = B(t; Σ₁); the observation o(t) follows E(·; exp(W₁x₁,ₖ + b₁)).

This Brownian motion captures the tendency of the learned parameter vector x₁. The volatility (i.e., uncertainties and pairwise correlations) in x₁ is given by $Σ_{1} \in ℝ^{d_{1} \times d_{1}}$ , which is a symmetric positive definite matrix by definition. Because Σ₁ is symmetric positive definite, it can be uniquely represented by a lower triangular matrix $L_{1} \in ℝ^{d_{1} \times d_{1}}$ with positive diagonal elements according to the Cholesky decomposition (Tanabe and Sagae, 1992; Jung and O'Leary, 2006):

\begin{array}{l} Σ_{1} = L_{1} {L_{1}}^{T} . \end{array}

To further evaluate the volatility Σ₁ in x₁, we assume that its Cholesky factor L₁ is modeled by a general Brownian motion in its parameterized space. To be exact, the elements of L₁ are parametrized by a d₂ = d₁(d₁+1)/2 dimensional vector y₂, which corresponds to the lower triangular elements of L₁ concatenated in a column-wise fashion. The element in the i-th row and j-th column of L₁ is parameterized by

\begin{array}{l} {L_{1}}^{(i, j)} = l_{1}^{(i, j)} = \begin{cases} 2 \sinh \left(y_{2}^{\left(\frac{(2 d_{1} - j + 2) (j - 1)}{2} + i - j + 1\right)}\right), & 1 \leq j < i \leq d_{1} \\ \exp \left(y_{2}^{\left(\frac{(2 d_{1} - i + 2) (i - 1)}{2} + 1\right)}\right), & j = i \end{cases} & (4) \end{array}

where sinh(·) denotes the hyperbolic sine function. Note that Equation 4 maps L₁ into an unconstrained (logarithmic) parameter space, keeping the diagonal elements strictly positive while allowing arbitrary values for the off-diagonal elements of L₁.
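For d₁ = 2 (so d₂ = 3), the index formula in Equation 4 assigns y₂^(1) to the (1,1) element, y₂^(2) to the (2,1) element, and y₂^(3) to the (2,2) element. Expanding Σ₁ = L₁L₁ᵀ gives the following worked case (our own illustration, where ρ₁ denotes the correlation between the two components of the diffusion):

```latex
L_1 = \begin{bmatrix} e^{y_2^{(1)}} & 0 \\ 2\sinh\big(y_2^{(2)}\big) & e^{y_2^{(3)}} \end{bmatrix},
\qquad
Σ_1 = L_1 L_1^{T} = \begin{bmatrix} e^{2 y_2^{(1)}} & 2 e^{y_2^{(1)}} \sinh\big(y_2^{(2)}\big) \\ 2 e^{y_2^{(1)}} \sinh\big(y_2^{(2)}\big) & 4 \sinh^{2}\big(y_2^{(2)}\big) + e^{2 y_2^{(3)}} \end{bmatrix},
\qquad
ρ_1 = \frac{2 \sinh\big(y_2^{(2)}\big)}{\sqrt{4 \sinh^{2}\big(y_2^{(2)}\big) + e^{2 y_2^{(3)}}}} .
```

The sign of ρ₁ is set by y₂^(2) alone, ρ₁ = 0 exactly when y₂^(2) = 0, and |ρ₁| < 1 for every finite y₂ because e^{2y₂^(3)} > 0. This illustrates how the parameterization keeps Σ₁ positive definite while leaving the sign and size of the correlation unconstrained.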

The vector y₂ represents the volatility of the signal in logarithmic space and therefore constitutes a parameterization of the volatility. y₂ is given by the following mapping at the second level of the model:

\begin{array}{l} y_{2} = W_{2} x_{2} + b_{2}, & (5) \end{array}

where b₂ and $x_{2} \in ℝ^{d_{2}}$ represent the trend and the time-varying fluctuation in the log-volatility of x₁, respectively. The coefficient matrix W₂ is a d₂-by-d₂ diagonal matrix representing the coupling strength from level two to level one; it is spanned by a column vector w₂ with all positive elements:

\begin{array}{l} {W_{2}}^{(i, i)} = w_{2}^{(i)} . \end{array}

We can rewrite the coupling (Equations 4, 5) as

\begin{array}{l} L_{1} = F_{2} (x_{2}; w_{2}, b_{2}) . \end{array}
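The composite mapping L₁ = F₂(x₂; w₂, b₂) (Equations 4 and 5) can be sketched as follows. This is an illustrative NumPy implementation under the stated conventions (W₂ = diag(w₂), column-wise lower-triangular ordering), not the authors' code; the function name f2 and the values of w₂ and b₂ are hypothetical. The checks confirm that Σ₁ = L₁L₁ᵀ is positive definite and that L₁ coincides with NumPy's Cholesky factor of Σ₁, as uniqueness of the Cholesky decomposition requires.

```python
import numpy as np

def f2(x2, w2, b2, d1):
    """Hypothetical sketch of L1 = F2(x2; w2, b2): Equation 5, then Equation 4.

    y2 = diag(w2) x2 + b2 has d1 (d1 + 1) / 2 entries, consumed column by column
    down the lower triangle (the same order as lvec).
    """
    y2 = w2 * x2 + b2
    if y2.size != d1 * (d1 + 1) // 2:
        raise ValueError("x2 must have d1 (d1 + 1) / 2 entries")
    l1 = np.zeros((d1, d1))
    idx = 0
    for j in range(d1):                  # columns
        for i in range(j, d1):           # rows on and below the diagonal
            if i == j:
                l1[i, j] = np.exp(y2[idx])          # strictly positive diagonal
            else:
                l1[i, j] = 2.0 * np.sinh(y2[idx])   # unconstrained off-diagonal
            idx += 1
    return l1

d1 = 3
rng = np.random.default_rng(2)
x2 = rng.normal(size=d1 * (d1 + 1) // 2)
l1 = f2(x2, w2=np.full(x2.size, 0.5), b2=np.zeros(x2.size), d1=d1)  # hypothetical w2, b2
sigma1 = l1 @ l1.T
print(np.all(np.linalg.eigvalsh(sigma1) > 0))        # True: Sigma1 is positive definite
print(np.allclose(np.linalg.cholesky(sigma1), l1))   # True: L1 is the Cholesky factor
```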

In the second level of the model, we further assume that x₂ evolves as a general Brownian motion with diffusion matrix $Σ_{2} \in ℝ^{d_{2} \times d_{2}}$

\begin{array}{l} x_{2} = B (t; Σ_{2}) . & (6) \end{array}

For simplicity, the diffusion matrix Σ₂ is chosen to be diagonal. Let $L_{2} \in ℝ^{d_{2} \times d_{2}}$ be the unique Cholesky factor of Σ₂. We simply assume that L₂ is a constant diagonal matrix spanned by the vector $λ \in ℝ^{d_{2}}$ with all elements positive, so that Σ₂ = diag(λ ⊙ λ).

Figure 1 shows an overview of the hierarchical perceptual model. With this model, a Bayesian agent receives a series of sensory inputs or observations o_{1:K}. At time t_k, the sensory input o_k to the agent is determined by a delta distribution δ(·)

\begin{array}{l} P (o_{k} ∣ x_{0, k}) = δ (o_{k} - x_{0, k}) . & (7) \end{array}

The priors on the initial states x_{1, 0} and x_{2, 0} are Gaussian distributions:

\begin{array}{l} q (x_{h, 0}) = N (x_{h, 0}; μ_{h, 0}, C_{h, 0}), \quad h = 1, 2 . & (8) \end{array}

In summary, the hierarchical perceptual model constitutes a generative model for sensory observations o(t) based on hidden representations of the tendency (x₁) and the volatility (x₂) of the observations. To simplify the notation, we introduce X to denote the set of all hidden states and ℙ to denote the set of hyperparameters and prior states of the model:

\begin{array}{l} X = \{x_{0}, x_{1}, x_{2}\}, \\ ℙ = \{w_{1}, b_{1}, w_{2}, b_{2}, λ, μ_{1, 0}, C_{1, 0}, μ_{2, 0}, C_{2, 0}\} \end{array}

where μ_{1, 0}, C_{1, 0}, μ_{2, 0}, and C_{2, 0} are the prior states of the model defined in Equation 8 and Supplementary material Section 2.

4 Perceptual inference approximated by variational approximation

The aforementioned hierarchical perceptual model is constructed from general continuous Brownian motions. It remains to derive update rules that estimate the posterior distributions of the hidden representations x₁ and x₂. To derive a family of analytical and efficient update rules, we discretize the continuous Brownian motions with the Euler method. The sampling interval (SI) ϵ_k = t_k − t_{k−1} is the time that elapses between the arrival of consecutive sensory inputs o_{k−1} and o_k.
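To make the generative model concrete, the sketch below samples synthetic observations from it with Euler–Maruyama steps, in which a Brownian motion with diffusion matrix Σ = LLᵀ advances by √ϵ L z, with z ~ N(0, I), over an interval ϵ. All settings are hypothetical illustration choices, not the values used in the article: unit sampling intervals, zero initial states, W₁ = diag(w₁), and Σ₂ = diag(λ ⊙ λ). The bias b₂ places the diagonal entries of L₁ near e^{−3} while keeping the off-diagonal entries near zero (since 2 sinh(0) = 0).

```python
import numpy as np

def lower_from_log_params(y2, d1):
    """Equation 4: build L1 column by column from y2 (exp on the diagonal, 2 sinh below)."""
    l1, idx = np.zeros((d1, d1)), 0
    for j in range(d1):
        for i in range(j, d1):
            l1[i, j] = np.exp(y2[idx]) if i == j else 2.0 * np.sinh(y2[idx])
            idx += 1
    return l1

def simulate(num_steps, d1, w1, b1, w2, b2, lam, eps=1.0, seed=0):
    """Sample o_{1:K} from the hierarchical model with Euler-Maruyama steps."""
    rng = np.random.default_rng(seed)
    d2 = d1 * (d1 + 1) // 2
    x1, x2 = np.zeros(d1), np.zeros(d2)          # hypothetical initial states
    obs, rates = [], []
    for _ in range(num_steps):
        # Level 2: x2 diffuses with Sigma2 = diag(lam)^2 (Equation 6).
        x2 = x2 + np.sqrt(eps) * lam * rng.normal(size=d2)
        # Level 1: x1 diffuses with Sigma1 = L1 L1^T, L1 = F2(x2; w2, b2) (Equations 3-5).
        l1 = lower_from_log_params(w2 * x2 + b2, d1)
        x1 = x1 + np.sqrt(eps) * l1 @ rng.normal(size=d1)
        # Level 0: rates from Equation 2, then an exponential draw per component (Equation 1).
        r0 = np.exp(w1 * x1 + b1)
        obs.append(rng.exponential(scale=1.0 / r0))
        rates.append(r0)
    return np.array(obs), np.array(rates)

obs, rates = simulate(400, d1=2, w1=np.ones(2), b1=np.zeros(2),
                      w2=np.full(3, 0.1), b2=np.array([-3.0, 0.0, -3.0]),
                      lam=np.full(3, 0.05))
print(obs.shape, rates.shape)   # (400, 2) (400, 2)
```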

We use the variational Bayesian method (Beal, 2003; Friston, 2010; Daunizeau et al., 2010b; Mathys et al., 2011) to approximate the posterior distributions of x₁(t) and x₂(t) given the sensory input o(t) (i.e., observation). To this end, we maximize the negative free energy, which is a lower bound on the log model evidence, to obtain the approximate variational posterior (cf. Supplementary material Section 1):

\begin{array}{l} q (x_{h, k}) = \frac{1}{Z_{h}} exp (V_{h} (x_{h, k})), h = 1, 2, & (9) \end{array}

where $Z_{h}$ is a normalization constant. V_h(x_{h, k}) is the variational energy given by

\begin{array}{l} V_{h} (x_{h, k}) = E_{q (X_{\setminus h, k})} [\ln p (X_{k}, o_{k} ∣ ℙ, ϵ_{k})] . & (10) \end{array}

Here we introduce the notation $X_{\setminus h, k}$ for the set X_k with x_{h, k} excluded. Under the Brownian and Gaussian assumptions, the approximate variational posterior (Zhu et al., 2025) is

\begin{array}{l} x_{h, k} ∣ o_{k}, ℙ ~ N (μ_{h, k}, C_{h, k}), \\ h = 1, 2. & (11) \end{array}

Under this approximation, inferring the posterior distributions of x_h reduces to estimating the mean μ_{h, k} and the covariance matrix C_{h, k}, or equivalently the precision matrix $P_{h, k} \equiv {(C_{h, k})}^{- 1}$ . Following Zhu et al. (2025), we derive the update rules for the posterior distributions of x₁ and x₂.

At the bottom (zeroth) level of the hierarchical perceptual model, we can directly determine the multivariate exponential distribution q(x_{0, k}), whose expectation is

\begin{array}{l} μ_{0, k} = o_{k} . & (12) \end{array}

At the first level, following Equation 10, V₁(x₁) is calculated as

\begin{array}{l} V_{1} (x_{1, k}) = E_{q (X_{\setminus 1, k})} [\ln p (X_{k}, o_{k} ∣ ℙ, ϵ_{k})] \\ = \ln p (o_{k} ∣ x_{0, k}) + E_{q (x_{0, k})} [\ln p (x_{0, k} ∣ x_{1, k})] \\ + E_{q (x_{2, k})} [\ln p (x_{1, k} ∣ x_{2, k}, W_{2}, b_{2}, ϵ_{k})] \\ \approx 1^{T} (W_{1} x_{1, k} + b_{1}) - μ_{0, k}^{T} \exp (W_{1} x_{1, k} + b_{1}) \\ - \frac{1}{2} (x_{1, k} - μ_{1, k - 1})^{T} (ϵ_{k} {\hat{Σ}}_{1, k} + C_{1, k - 1})^{- 1} (x_{1, k} - μ_{1, k - 1}) \\ + const & (13) \end{array}

where 1 is the d₀-dimensional column vector whose elements are all 1. Here we use the approximation

\begin{array}{l} {(ϵ_{k} Σ_{1, k} + C_{1, k - 1})}^{- 1} \approx {(ϵ_{k} {\hat{Σ}}_{1, k} + C_{1, k - 1})}^{- 1}, & (14) \end{array}

with ${\hat{Σ}}_{1, k}$ computed from the previous posterior mean μ_{2, k−1} of the second level:

\begin{array}{l} {\hat{Σ}}_{1, k} = {\hat{L}}_{1, k} {\hat{L}}_{1, k}^{T}, \\ {\hat{L}}_{1, k} = F_{2} (μ_{2, k - 1}; w_{2}, b_{2}) . & (15) \end{array}

The variational energy V₁(x_{1, k}) is not a quadratic function of x_{1, k}, so we approximate it by a Gaussian quadratic form (Zhu et al., 2025). To obtain this approximation, we compute the gradient and Hessian matrix of V₁(x_{1, k}):

\begin{array}{l} \nabla V_{1} (x_{1, k}) = W_{1}^{T} [1 - μ_{0, k} ⊙ \exp (W_{1} x_{1, k} + b_{1})] \\ - (ϵ_{k} {\hat{Σ}}_{1, k} + C_{1, k - 1})^{- 1} (x_{1, k} - μ_{1, k - 1}), & (16) \end{array}

and

\begin{array}{l} \nabla^{2} V_{1} (x_{1, k}) = - W_{1}^{T} diag (μ_{0, k} ⊙ \exp (W_{1} x_{1, k} + b_{1})) W_{1} \\ - (ϵ_{k} {\hat{Σ}}_{1, k} + C_{1, k - 1})^{- 1}, & (17) \end{array}
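Equations 16 and 17 can be checked against central finite differences of Equation 13. The following sketch assumes NumPy, stores W₁ = diag(w₁) as a vector, drops the additive constant in Equation 13, and uses hypothetical values for the observation, the parameters, and a randomly generated symmetric positive definite prediction precision (ϵ_k Σ̂₁,ₖ + C₁,ₖ₋₁)⁻¹.

```python
import numpy as np

def v1(x1, mu0, w1, b1, mu1_prev, pi_hat):
    """Equation 13 without its additive constant; W1 = diag(w1)."""
    z = w1 * x1 + b1
    diff = x1 - mu1_prev
    return np.sum(z) - mu0 @ np.exp(z) - 0.5 * diff @ pi_hat @ diff

def grad_v1(x1, mu0, w1, b1, mu1_prev, pi_hat):
    """Equation 16."""
    return w1 * (1.0 - mu0 * np.exp(w1 * x1 + b1)) - pi_hat @ (x1 - mu1_prev)

def hess_v1(x1, mu0, w1, b1, mu1_prev, pi_hat):
    """Equation 17."""
    return -np.diag(w1 * mu0 * np.exp(w1 * x1 + b1) * w1) - pi_hat

rng = np.random.default_rng(3)
d = 2
a = rng.normal(size=(d, d))
pi_hat = np.linalg.inv(0.1 * np.eye(d) + a @ a.T)   # hypothetical SPD prediction precision
mu0 = np.array([0.8, 0.3])                           # hypothetical observation o_k
w1, b1, mu1_prev = np.array([1.0, 0.5]), np.zeros(d), np.zeros(d)
args = (mu0, w1, b1, mu1_prev, pi_hat)

x = rng.normal(size=d)
h = 1e-5
basis = np.eye(d)
num_grad = np.array([(v1(x + h * e, *args) - v1(x - h * e, *args)) / (2 * h) for e in basis])
num_hess = np.array([(grad_v1(x + h * e, *args) - grad_v1(x - h * e, *args)) / (2 * h)
                     for e in basis])
print(np.allclose(num_grad, grad_v1(x, *args), atol=1e-6))   # True
print(np.allclose(num_hess, hess_v1(x, *args), atol=1e-6))   # True
```

Because both terms of the Hessian are negative definite for positive μ₀, V₁ is strictly concave in x₁, which is what makes a single Newton step a reasonable Gaussian approximation.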

Under the Gaussian quadratic form approximation, which is obtained by a single Newton step from μ_{1, k−1} (Zhu et al., 2025), the posterior mean of x_{1, k}, which captures the tendency of the log-rate of x_{0, k}, is updated by

\begin{array}{l} μ_{1, k} = μ_{1, k - 1} + C_{1, k} W_{1}^{T} {P E}_{0, k}, & (18) \end{array}

where PE_{0, k} is the prediction error:

\begin{array}{l} {P E}_{0, k} = 1 - μ_{0, k} ⊙ {\hat{r}}_{0, k} . & (19) \end{array}

${\hat{r}}_{0, k} \equiv {[{\hat{r}}_{0, k}^{(1)}, {\hat{r}}_{0, k}^{(2)}, \dots, {\hat{r}}_{0, k}^{(d_{0})}]}^{T}$ is the prediction given by the mapping in Equation 2:

\begin{array}{l} {\hat{r}}_{0, k} = exp (W_{1} μ_{1, k - 1} + b_{1}) . & (20) \end{array}

Writing the prediction error PE_{0, k} component-wise gives an interpretable formula:

\begin{array}{l} P E_{0, k}^{(i)} = 1 - μ_{0, k}^{(i)} {\hat{r}}_{0, k}^{(i)} = 1 - \frac{μ_{0, k}^{(i)}}{\frac{1}{{\hat{r}}_{0, k}^{(i)}}} . \end{array}

The inverse of the predicted rate $\frac{1}{{\hat{r}}_{0, k}^{(i)}}$ gives the expected sensory input, and the ratio $\frac{μ_{0, k}^{(i)}}{\frac{1}{{\hat{r}}_{0, k}^{(i)}}}$ measures the accuracy of the prediction. If the ratio is greater than 1 (i.e., the predicted expectation of the sensory input is less than the actual sensory input), the prediction error is negative, and the agent should decrease $μ_{1}^{(i)}$ , which lowers the predicted rate and thus raises the predicted expectation of the sensory input. If the ratio is less than 1, the prediction error is positive, and the agent should increase $μ_{1}^{(i)}$ , so that the predicted expectation of the sensory input decreases. Ideally, the ratio is equal to 1 and the prediction error vanishes, which means that the predicted expectation of the sensory input equals the actual sensory input.
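A one-dimensional numerical instance, with values chosen arbitrarily for illustration, makes the sign convention concrete:

```latex
\hat{r}_{0,k}^{(i)} = 2
\;\Rightarrow\; \frac{1}{\hat{r}_{0,k}^{(i)}} = 0.5, \qquad
μ_{0,k}^{(i)} = 1.0
\;\Rightarrow\; \frac{μ_{0,k}^{(i)}}{1/\hat{r}_{0,k}^{(i)}} = 2 > 1, \qquad
PE_{0,k}^{(i)} = 1 - 1.0 \times 2 = -1 .
```

In the one-dimensional case (d₀ = d₁ = 1) with w₁ > 0 and C₁,ₖ > 0, Equation 18 gives μ₁,ₖ − μ₁,ₖ₋₁ = C₁,ₖ w₁ PE₀,ₖ < 0, so the predicted rate falls and the predicted expectation of the sensory input rises toward the observed value.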

In Equation 18, the prediction error is scaled and rotated by the covariance matrix C_{1, k} of the approximate Gaussian distribution, which is converted from the precision matrix:

\begin{array}{l} \begin{array}{l} C_{1, k} \equiv {(P_{1, k})}^{- 1}, \\ P_{1, k} = {\hat{Π}}_{1, k} + W_{1}^{T} diag (μ_{0, k} ⊙ {\hat{r}}_{0, k}) W_{1} . \end{array} & (21) \end{array}

Here prediction precision ${\hat{Π}}_{1, k}$ is given by

\begin{array}{l} {\hat{Π}}_{1, k} = {(ϵ_{k} {\hat{Σ}}_{1, k} + C_{1, k - 1})}^{- 1} . & (22) \end{array}

Note that the off-diagonal elements of the inverse of the prediction precision matrix, ${\hat{Π}}_{1, k}^{- 1} = ϵ_{k} {\hat{Σ}}_{1, k} + C_{1, k - 1}$ , encode the predicted correlations.
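The first-level update (Equations 12 and 18–22) can be sketched as a single function. This NumPy sketch is an illustration under stated assumptions (W₁ = diag(w₁) stored as a vector; Σ̂₁,ₖ supplied from the second level as in Equation 15; hypothetical numerical values), not the authors' implementation. It checks that an observation equal to the predicted expectation 1/r̂₀,ₖ leaves the mean unchanged, and that the update coincides with one Newton step on V₁ from μ₁,ₖ₋₁ using the gradient and Hessian of Equations 16 and 17.

```python
import numpy as np

def level1_update(o_k, mu1_prev, c1_prev, sigma1_hat, w1, b1, eps_k):
    """One first-level step (Equations 12 and 18-22) with W1 = diag(w1)."""
    mu0 = o_k                                              # Eq. 12
    r_hat = np.exp(w1 * mu1_prev + b1)                     # Eq. 20
    pe0 = 1.0 - mu0 * r_hat                                # Eq. 19
    pi_hat = np.linalg.inv(eps_k * sigma1_hat + c1_prev)   # Eq. 22
    p1 = pi_hat + np.diag(w1 * mu0 * r_hat * w1)           # Eq. 21
    c1 = np.linalg.inv(p1)
    mu1 = mu1_prev + c1 @ (w1 * pe0)                       # Eq. 18
    return mu1, c1, pe0, pi_hat

w1, b1 = np.array([1.0, 1.0]), np.zeros(2)                 # hypothetical parameters
mu1_prev, c1_prev = np.array([0.0, np.log(2.0)]), 0.1 * np.eye(2)
sigma1_hat = np.array([[0.05, 0.02], [0.02, 0.05]])        # hypothetical Sigma1 estimate
# With these values r_hat = [1, 2], so the predicted expected input is [1, 0.5].

# Input equal to the predicted expectation: zero prediction error, mean unchanged.
mu1, _, pe0, _ = level1_update(np.array([1.0, 0.5]), mu1_prev, c1_prev, sigma1_hat, w1, b1, 1.0)
print(np.allclose(pe0, 0.0), np.allclose(mu1, mu1_prev))  # True True

# Larger-than-expected inputs: negative prediction errors, log-rates move down.
o_k = np.array([2.0, 1.0])
mu1, c1, pe0, pi_hat = level1_update(o_k, mu1_prev, c1_prev, sigma1_hat, w1, b1, 1.0)
print(np.allclose(pe0, -1.0), np.all(mu1 < mu1_prev))     # True True

# Same result as one Newton step on V1 from mu1_prev (gradient Eq. 16, Hessian Eq. 17).
r_hat = np.exp(w1 * mu1_prev + b1)
grad = w1 * (1.0 - o_k * r_hat)                  # the Pi_hat term vanishes at mu1_prev
hess = -np.diag(w1 * o_k * r_hat * w1) - pi_hat
print(np.allclose(mu1_prev - np.linalg.solve(hess, grad), mu1))  # True
```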

At the second level, the volatility, consisting of the uncertainties and pairwise correlations of the natural parameters, is inferred by a similar variational approximation method (Zhu et al., 2025). The mean is updated by

\begin{array}{l} \begin{matrix} μ_{2, k} & = μ_{2, k - 1} + ϵ_{k} C_{2, k} W_{2}^{T} {\hat{L}}_{g 1, k} ({\hat{Ω}}_{1, k} \otimes I_{d_{1}}) vec (Δ_{1, k}^{T}) . \end{matrix} & (23) \end{array}

Here Δ_{1, k} is given by

\begin{array}{l} Δ_{1, k} = [C_{1, k} + {P E}_{1, k} {P E}_{1, k}^{T}] {\hat{Π}}_{1, k} - I_{d_{1}} . & (24) \end{array}

The constant matrix I_{d₁} is the d₁-by-d₁ identity matrix. PE_{1, k} is the prediction error on the hidden state x₁:

\begin{array}{l} {P E}_{1, k} = μ_{1, k} - μ_{1, k - 1} . & (25) \end{array}

${\hat{L}}_{g 1, k}$ is given by

\begin{array}{l} \begin{matrix} {\hat{L}}_{g 1, k} = [\begin{matrix} exp ({(W_{2}^{(1)})}^{T} μ_{2, k - 1} + b_{2}^{(1)}) e_{2}^{T} (1) \\ 2 cosh ({(W_{2}^{(2)})}^{T} μ_{2, k - 1} + b_{2}^{(2)}) e_{2}^{T} (2) \\ exp ({(W_{2}^{(3)})}^{T} μ_{2, k - 1} + b_{2}^{(3)}) e_{2}^{T} (3) \\ 2 cosh ({(W_{2}^{(4)})}^{T} μ_{2, k - 1} + b_{2}^{(4)}) e_{2}^{T} (4) \\ ⋮ \\ exp ({(W_{2}^{(d_{2})})}^{T} μ_{2, k - 1} + b_{2}^{(d_{2})}) e_{2}^{T} (d_{2}) \end{matrix}] \end{matrix}, & (26) \end{array}

where the constant vector e₂(i) is a $d_{1}^{2}$ -dimensional column vector whose j-th component is 1 if j = i and 0 if j ≠ i. The column vector $W_{2}^{(i)}$ is the transpose of the i-th row of the coefficient matrix W₂. ${\hat{Ω}}_{1, k}$ is defined as

\begin{array}{l} {\hat{Ω}}_{1, k} = {\hat{L}}_{1, k}^{T} {\hat{Π}}_{1, k} . & (27) \end{array}
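Equation 24 compares the realized spread of the first-level update, C₁,ₖ + PE₁,ₖPE₁,ₖᵀ, with the predicted covariance (Π̂₁,ₖ)⁻¹ = ϵ_k Σ̂₁,ₖ + C₁,ₖ₋₁. When the two agree, Δ₁,ₖ = 0, so vec(Δ₁,ₖᵀ) = 0 and Equation 23 leaves μ₂ unchanged. The NumPy sketch below illustrates this reading of Equation 24 with hypothetical matrices; it checks the matched case and a larger-than-predicted update.

```python
import numpy as np

def volatility_error(mu1, mu1_prev, c1, pi_hat):
    """Equations 24-25: Delta = (C1 + PE1 PE1^T) Pi_hat - I."""
    pe1 = mu1 - mu1_prev                                          # Eq. 25
    return (c1 + np.outer(pe1, pe1)) @ pi_hat - np.eye(mu1.size)  # Eq. 24

pred_cov = np.array([[0.3, 0.1], [0.1, 0.2]])   # hypothetical eps_k * Sigma1_hat + C1_prev
pi_hat = np.linalg.inv(pred_cov)                 # Eq. 22
mu1_prev = np.zeros(2)

# Matched case: posterior covariance plus squared update equals the predicted covariance.
pe_small = np.array([0.2, 0.1])
c1 = pred_cov - np.outer(pe_small, pe_small)
delta_matched = volatility_error(mu1_prev + pe_small, mu1_prev, c1, pi_hat)
print(np.allclose(delta_matched, 0.0))           # True

# Larger-than-predicted update: the mismatch has a positive trace.
delta_big = volatility_error(mu1_prev + np.array([1.0, 0.0]), mu1_prev, c1, pi_hat)
print(np.trace(delta_big))                       # approximately 3.86
```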

The precision matrix is updated by

\begin{array}{l} P_{2, k} = {\hat{Π}}_{2, k} + W_{2}^{T} {\hat{L}}_{g 1, k} \Big\{ ϵ_{k}^{2} K_{d_{1} d_{1}} [{\hat{Ω}}_{1, k}^{T} \otimes [{\hat{Ω}}_{1, k} Δ_{1, k}] + [Δ_{1, k}^{T} {\hat{Ω}}_{1, k}^{T}] \otimes {\hat{Ω}}_{1, k} + {\hat{Ω}}_{1, k}^{T} \otimes {\hat{Ω}}_{1, k}] \\ + ϵ_{k}^{2} [[{\hat{L}}_{1, k}^{T} Δ_{1, k}^{T} {\hat{Ω}}_{1, k}^{T}] \otimes {\hat{Π}}_{1, k} + [{\hat{L}}_{1, k}^{T} {\hat{Ω}}_{1, k}^{T}] \otimes [{\hat{Π}}_{1, k} Δ_{1, k}] + [{\hat{L}}_{1, k}^{T} {\hat{Ω}}_{1, k}^{T}] \otimes {\hat{Π}}_{1, k}] \\ - ϵ_{k} [I_{d_{1}} \otimes [{\hat{Π}}_{1, k} Δ_{1, k}]] \Big\} {\hat{L}}_{g 1, k}^{T} W_{2} - W_{2}^{T} \mathrm{diag} (\mathrm{lvec} (δ_{1, k})) W_{2}, & (28) \end{array}

where

\begin{array}{l} δ_{1, k} = ϵ_{k} [Δ_{1, k}^{T} {\hat{Ω}}_{1, k}^{T}] ⊙ {\hat{L}}_{1, k} . \end{array}

The precision matrix of the prediction ${\hat{Π}}_{2}$ is given by

\begin{array}{l} {\hat{Π}}_{2, k} = {(ϵ_{k} Σ_{2} + C_{2, k - 1})}^{- 1} . & (29) \end{array}

The notation K_{mn} denotes the mn-by-mn commutation matrix (Magnus and Neudecker, 1979), i.e., the permutation matrix satisfying K_{mn} vec(A) = vec(Aᵀ) for any m-by-n matrix A.
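The commutation matrix can be built explicitly as a permutation matrix. This small NumPy sketch (helper names are hypothetical) constructs K_mn from its defining property K_mn vec(A) = vec(Aᵀ) and checks it, together with the fact that K_dd is its own inverse, which covers the square d₁-by-d₁ case used in Equation 28.

```python
import numpy as np

def vec(a):
    """Column-major vectorization, matching the vec(.) notation of Section 2."""
    return a.reshape(-1, order="F")

def commutation_matrix(m, n):
    """K_mn: the mn-by-mn permutation matrix with K_mn vec(A) = vec(A^T) for A of shape (m, n)."""
    k = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at index j*m + i of vec(A) and at index i*n + j of vec(A^T).
            k[i * n + j, j * m + i] = 1.0
    return k

a = np.arange(6.0).reshape(2, 3)
print(np.array_equal(commutation_matrix(2, 3) @ vec(a), vec(a.T)))   # True

k33 = commutation_matrix(3, 3)
print(np.array_equal(k33 @ k33, np.eye(9)))                          # True
```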

5 Variational Bayesian learning

A model $M$ with a set of parameters ℙ receives and encodes sensory input o(t). We can arrange all elements of ℙ into a vector ξ. Here, we introduce the following mean field approximation to fit the parameters of the model with the sensory inputs o_1:K

\begin{array}{l} \begin{array}{l} q (ℙ) = q (ξ) = q (w_{1}) q (b_{1}) q (w_{2}) q (b_{2}) q (λ) \\ \cdot q (μ_{1, 0}) q (C_{1, 0}) q (μ_{2, 0}) q (C_{2, 0}) . \end{array} & (30) \end{array}

Then, by Jensen's inequality,

\begin{array}{l} \begin{array}{l} ln p (o_{1 : K} | M) = ln \int p (o_{1 : K}, ξ | M) d ξ \\ = ln \int \frac{p (o_{1 : K}, ξ | M) q (ξ)}{q (ξ)} d ξ \\ \geq \int q (ξ) ln (\frac{p (o_{1 : K}, ξ | M)}{q (ξ)}) d ξ \\ = \int q (ξ) ln p (o_{1 : K}, ξ | M) - q (ξ) ln q (ξ) d ξ \\ ≜ F_{M} (ξ) \end{array} . & (31) \end{array}

We use the Lagrange multiplier method to work out the optimal variational posterior as follows:

\begin{array}{l} \begin{array}{l} q (ξ) = \frac{1}{Z_{ξ}} exp (V (ξ)) \\ V (ξ) = ln p (o_{1 : K}, ξ | M) . \end{array} & (32) \end{array}

We then apply a Laplace approximation to obtain a Gaussian approximation of this variational posterior (Equation 33):

\begin{array}{l} \begin{array}{l} μ_{ξ} = \underset{ξ}{argmax} V (ξ) = \underset{ξ}{argmax} \ln p (o_{1 : K}, ξ | M) \\ = \underset{ξ}{argmax} \ln [p (o_{1 : K} | ξ, M) p (ξ)] \\ = \underset{ξ}{argmax} \sum_{k = 1}^{K} \ln p (o_{k} | ξ, M) + \ln p (ξ) \\ = \underset{ξ}{argmax} \sum_{k = 1}^{K} \ln p (o_{k} | {\hat{r}}_{0, k}, ξ, M) + \ln p (ξ), \\ C_{ξ} = {\left(- \frac{\partial^{2} V (μ_{ξ})}{\partial ξ \partial ξ^{T}}\right)}^{- 1}, \end{array} & (33) \end{array}

where $ln p (o_{k} | ξ, M)$ is the logarithm of the predictive distribution $o_{k} ~ E ({\hat{r}}_{0, k})$ and is given by

\begin{array}{l} ln p (o_{k} | {\hat{r}}_{0, k}, ξ, M) = 1^{T} ln {\hat{r}}_{0, k} - o_{k}^{T} {\hat{r}}_{0, k} . & (34) \end{array}

Finally, the maximum value $F_{M} (μ_{ξ}, C_{ξ})$ of the negative free energy $F_{M} (ξ)$ is given by

\begin{array}{l} \begin{matrix} F_{M} (ξ) \leq F_{M} (μ_{ξ}, C_{ξ}) = V (μ_{ξ}) + \frac{d_{ξ}}{2} ln 2 π e + \frac{1}{2} ln det (C_{ξ}) . \end{matrix} & (35) \end{array}
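The Laplace step can be illustrated on a deliberately small problem. The sketch below is a hypothetical one-parameter example, not the model fitted in the article: a single log-rate ξ with r = exp(ξ) for every observation, the log predictive density of Equation 34 summed over time, and a Gaussian prior N(ξ; 0, 1). Newton iterations locate the mode μ_ξ, the posterior variance C_ξ is the inverse of the negative second derivative of V at the mode, and the free energy is then evaluated with the expression in Equation 35 as stated (d_ξ = 1).

```python
import numpy as np

def log_joint(xi, obs, m0=0.0, s0=1.0):
    """V(xi): Equation 34 summed over time with rate exp(xi), plus a Gaussian log prior."""
    rate = np.exp(xi)
    log_lik = np.sum(np.log(rate) - obs * rate)
    log_prior = -0.5 * np.log(2.0 * np.pi * s0**2) - 0.5 * (xi - m0) ** 2 / s0**2
    return log_lik + log_prior

def laplace_fit(obs, m0=0.0, s0=1.0, iters=50):
    """Newton search for the mode, inverse negative curvature, then Equation 35 with d = 1."""
    total, count = np.sum(obs), obs.size
    xi = np.log(count / total)                     # start from the maximum-likelihood log-rate
    for _ in range(iters):
        grad = count - total * np.exp(xi) - (xi - m0) / s0**2
        hess = -total * np.exp(xi) - 1.0 / s0**2   # strictly negative: V is concave in xi
        xi = xi - grad / hess
    c_xi = 1.0 / (total * np.exp(xi) + 1.0 / s0**2)   # (-V''(mu_xi))^{-1}
    free_energy = (log_joint(xi, obs, m0, s0)
                   + 0.5 * np.log(2.0 * np.pi * np.e) + 0.5 * np.log(c_xi))
    return xi, c_xi, free_energy

rng = np.random.default_rng(4)
obs = rng.exponential(scale=1.0 / 3.0, size=200)   # synthetic data with true rate 3
mu_xi, c_xi, f = laplace_fit(obs)
print(np.exp(mu_xi), c_xi, f)
```

With 200 observations the posterior variance of the log-rate is roughly 1/200, so the Gaussian approximation is tight; with few observations or strongly skewed posteriors the Laplace approximation would be less accurate.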

6 Simulation study

To verify the effectiveness of the proposed model, we conducted simulations on synthetic data to assess the model's ability to capture time-varying rate parameters of multivariate exponential distribution. The purpose of using simulation is to validate the model on precisely defined data, so that the results given by the model could be compared with ground truth.

6.1 An ablation model

To assess our hierarchical Bayesian model $M$, we define an ablation model $M_{a}$ as a baseline for evaluating the role of the top (volatility) level of $M$. In other words, $M_{a}$ is a simplified version of $M$ with constant volatility x₂(t) = μ₂. In this case, the variable x_{2, k} can be removed and the likelihood matrix Σ₁ is held constant. The model $M_{a}$ is defined by Equations 1–3, and Figure 2 shows its overall framework.

Figure 2

Diagram illustrating an Ablation Generative Model. It starts with a green circle labeled “Σ1,” followed by a function “x1(t) = 𝒬(t, Σ1).” An arrow leads to another function within brackets, “𝒪(x1(t); exp(W1x1,k + b1)),” ending with a gray circle labeled “o(t).” Arrows indicate the sequence of processes.

Figure 2. Overview of the ablation model.

The update equations for the ablation model are similar to Equations 12 and 18–22, with ${\hat{Σ}}_{1} = Σ_{1}$. For simplicity, we assume that Σ₁ is a diagonal matrix with positive diagonal elements, so that Σ₁ is determined by a vector σ₁ with positive elements. The prior distribution of Σ₁ is defined by

q(Σ_{1}) = q(\ln σ_{1}) = N(\ln σ_{1}; μ_{\ln σ_{1}}, C_{\ln σ_{1}}), \qquad (36)

where $μ_{\ln σ_{1}}$ and $C_{\ln σ_{1}}$ are the parameters of the prior distribution. The priors on the remaining parameters of this model are the same as in the hierarchical Bayesian model $M$ (cf. Supplementary material Section 2).
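Placing the Gaussian prior on ln σ₁ rather than on σ₁ keeps the diagonal of Σ₁ positive automatically. A minimal sketch, assuming a diagonal prior covariance C_{ln σ₁} (not stated in the text) and hypothetical prior values; the values actually used are those in Table 2:

```python
import math
import random


def sample_sigma1(mu_log, var_log, rng):
    """Draw ln(sigma_1) ~ N(mu_log, diag(var_log)) and return Sigma_1 = diag(sigma_1).

    The prior is Gaussian on the unconstrained log scale; exp() then
    guarantees strictly positive diagonal elements of Sigma_1.
    """
    log_sigma = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu_log, var_log)]
    sigma = [math.exp(s) for s in log_sigma]
    n = len(sigma)
    return [[sigma[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


rng = random.Random(0)
# Hypothetical prior mean and variance of ln(sigma_1) (assumed, not from Table 2).
Sigma1 = sample_sigma1([-2.0, -2.0], [0.5, 0.5], rng)
```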

6.2 Simulation setup

The simulations were carried out in four steps:

1. Generating synthetic sensory inputs. We randomly generated a sequence of K = 400 bivariate exponential observations $o_{1:K} = \{o(t_{1}), o(t_{2}), \ldots, o(t_{K})\}$ (Figure 3):

p(o(t)) = E(o(t); r_{0}(t)), \qquad (37)

Figure 3

Two graphs labeled a and b, each with blue scatter plots and red sinusoidal lines. The x-axis displays trial numbers from zero to four hundred. The y-axis on the left shows variable values in blue, while the y-axis on the right indicates rates in red, both ranging from zero to ten.

Figure 3. Time-varying rate parameter and sensory inputs of volatile multivariate exponentially distributed signals. Panels (A, B) show the two dimensions of the input signal. In each panel, blue dots are the sensory inputs o⁽ⁱ⁾ of the i-th dimension of the signal, red lines represent the expected rate $r_{0}^{(i)} (t)$, and black dashed lines are the expectation of the sensory input o⁽ⁱ⁾, i.e., the inverse of the expected rate $r_{0}^{(i)} (t)$. Note that the expected rates in the two dimensions fluctuate in time, synchronously before trial 200 and anti-synchronously after it.

where the time-varying rate vector r₀(t) was governed by cosine waves and was defined by

\begin{aligned} r_{0}^{(1)}(t_{k}) &= 2.5 + 2 \cos\left(\frac{7π}{K} t_{k}\right), \\ r_{0}^{(2)}(t_{k}) &= \begin{cases} 2.5 + 2 \cos\left(\frac{7π}{K} t_{k}\right), & k \leq 200, \\ 2.5 - 2 \cos\left(\frac{7π}{K} t_{k}\right), & k \geq 201. \end{cases} \end{aligned}

2. Initializing the sufficient statistics of all random parameters. We chose initial sufficient statistics for the parameter vector ξ (Table 1 for the hierarchical Bayesian model and Table 2 for the ablation model) so that the models work well on the sequence of sensory inputs, and then determined the prior distribution of ξ. All parameter configurations for the two models (Figures 1, 2) are shown in Tables 1, 2.

3. Maximizing the negative free energy. We employed optimization methods to obtain the optimal sufficient statistics (μ_ξ, C_ξ) of the prior parameter ξ. Specifically, the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) method with a line search (Nocedal and Wright, 2006) was adopted to maximize the negative free energy (Equations 31, 33, 34) (Beal, 2003; Friston, 2010).

4. Generating the optimal trajectories of all states. We used the optimal prior parameters μ_ξ to characterize each model (Figures 1, 2). The two models were compared on inference and decision-making tasks.
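Step 1 of the setup can be sketched as follows. The sketch assumes t_k = k (the trial index), which the text does not state explicitly, and draws the two components independently given r₀(t_k), in line with the factorized likelihood of Equation 34. The seed handling and function names are illustrative, not the authors' code.

```python
import math
import random

K = 400        # number of trials
SWITCH = 200   # last trial on which the two rates move together


def expected_rates(k, n_trials=K):
    """Expected rate vector r_0(t_k), assuming t_k = k (the trial index)."""
    c = 2.0 * math.cos(7.0 * math.pi / n_trials * k)
    r1 = 2.5 + c
    r2 = 2.5 + c if k <= SWITCH else 2.5 - c
    return r1, r2


def generate_signal(n_trials=K, seed=0):
    """Sample o(t_k) with conditionally independent exponential components."""
    rng = random.Random(seed)
    rates, obs = [], []
    for k in range(1, n_trials + 1):
        r = expected_rates(k, n_trials)
        rates.append(r)
        # random.expovariate takes the rate lambda, so E[o] = 1 / lambda.
        obs.append(tuple(rng.expovariate(ri) for ri in r))
    return rates, obs


rates, obs = generate_signal()
```

Both rates stay within [0.5, 4.5], so every draw is well defined; changing `seed` gives the independent replications used for model comparison.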

Table 1

Table 1. Parameters of the hierarchical Bayesian model.

Table 2

Table 2. Parameters of the ablation model.

6.3 Perceiving volatile multivariate exponentially distributed signals

The proposed hierarchical Bayesian inference model endowed with the optimal parameter μ_ξ constitutes a hierarchical Bayesian agent. We asked the hierarchical Bayesian agent to perceive volatile multivariate exponentially distributed signals as shown in Figure 3.

The dynamic tendency μ₁(t) of the log-rate vector x₁(t) is tracked online by the hierarchical Bayesian agent (Figure 4). μ₁ follows the varying trend of the expected rate in logarithmic space. The uncertainty of μ₁(t) is stable (light-red shaded area in Figure 4). The prediction error PE₁ fluctuates around a baseline (blue line in Figure 4).

Figure 4

Two line graphs labeled a and b display data over time. The x-axis represents time from 0 to 360. The y-axes have red lines (μ) ranging from negative 10 to 30, and blue lines (PE) from negative 8 to 2. The graphs show fluctuating trends with red confidence intervals highlighting variability.

Figure 4. Temporal dynamics of the tendency μ₁ of the log-rate vector x₁(t) at the first level. Panels (A, B) show the two components of the expectation μ₁. Each panel shows one component of μ₁ in red and the corresponding component of PE₁ in blue. The light-red shaded area represents the uncertainty of each component (i.e., $μ_{1}^{(i)} (t) \pm \sqrt{C_{1}^{(i, i)} (t)}, i \in \{1, 2\}$). The red markers △ and ° represent the priors on the standard deviation and the mean of each component, respectively.

Overall, the agent perceives the expected rate vector well (Figure 5). For most trials, the belief expectations $μ_{0}^{(1)}$ and $μ_{0}^{(2)}$ (solid lines in Figures 5A, C) fluctuate around the expected rates (dashed lines in Figures 5A, C). In the initial stage, the agent quickly adapts to the input signal and tracks the expected states. Because of stochasticity, the sample rate intensity in the sensory inputs deviates from the expected rate intensity, which causes the estimated belief rate intensities $μ_{0}^{(i)}, i = 1, 2$ to deviate from the expected rate intensity as well. From trial 120 to trial 165, the sample rate intensity in sensory inputs o⁽¹⁾ is larger than the expected rate intensity (Figure 3A), and the agent's belief is higher than the expected rate (Figure 5A). Likewise, from trial 116 to trial 158 (and from trial 296 to trial 308), the sample rate intensity in sensory inputs o⁽²⁾ is greater than the expected rate intensity (Figure 3B), leading the agent to believe in a higher rate intensity than the expected value.

Figure 5

Four graphs display data across 400 trials. Graph (a) shows blue and red lines with overlapping peaks and dips, representing two datasets. Graph (b) features red scattered points trending downward. Graph (c) is similar to (a), showing different data. Graph (d) mirrors (b) with red points, but shows less variation. All graphs relate to trial numbers on the x-axis.

Figure 5. Temporal dynamics of the expectation of the logarithm of volatility μ₂ in the state x₁ at the second level. Panels (A–C) represent three dimensions of the expectation μ₂. Each panel shows the evolution of one element of μ₂ in red and the corresponding element of PE₂ in blue. Light-red shaded area represents the uncertainty of each dimension (i.e., $μ_{2}^{(i)} (t) \pm \sqrt{C_{2}^{(i, i)} (t)}, i \in \{1, 2, 3\}$). The red markers △, ° represent the priors of the standard deviation and mean of each dimension.

The expectations of the log-volatilities of the log-rate vector ($μ_{2}^{(1)}$ and $μ_{2}^{(2)}$, i.e., the internal representation of the expected states) show occasional notable changes but remain stable most of the time (Figure 6). From trial 1 to trial 200, changes in rate $r_{0}^{(1)}$ are consistent with changes in rate $r_{0}^{(2)}$ (Figure 3), so in theory the two dimensions are positively correlated during this period. From trial 1 to trial 186, the prediction correlation ${\hat{ρ}}_{1}$ keeps increasing (Figure 7). From trial 187 to trial 200, asynchronous local fluctuations (or noise) lead to a decrease in ${\hat{ρ}}_{1}$. From trial 201 to trial 400, changes in rate $r_{0}^{(1)}$ are opposite to the changes in rate $r_{0}^{(2)}$ (Figure 3), so the two dimensions of the signal are negatively correlated during this period. As a result, the agent's prediction correlation ${\hat{ρ}}_{1}$ keeps decreasing from trial 201 to trial 359. From trial 359 to trial 365, the prediction errors $PE_{1}^{(1)}$ and $PE_{1}^{(2)}$ are both positive and drive ${\hat{ρ}}_{1}$ to jump to a larger value (Figure 7). The hierarchical Bayesian agent is therefore able to uncover the correlation structure of the signal dynamically.

Figure 6

Three line graphs labeled a, b, and c depict data over time. Red lines show $\mu_{2}^{(i)}$ within a shaded region, which decreases slightly from left to right. Blue lines indicate $PE^{(i)}$, with spikes at various intervals. The x-axis represents time from 0 to 375. The y-axes are labeled with corresponding values for each variable.

Figure 6. Temporal dynamics of the expectation of the logarithm of volatility μ₂ in the state x₁ at the second level. Each panel shows the evolution of one element of μ₂ in red and the corresponding element of PE₂ in blue. Light-red shaded area represents the uncertainty of each dimension (i.e., $μ_{2}^{(i)} (t) \pm \sqrt{C_{2}^{(i, i)} (t)}, i \in \{1, 2, 3\}$). The red markers △, ° represent the priors of the standard deviation and mean of each dimension.

Figure 7

Line graph showing the relationship between trial number and $ \hat{\rho}_1 $. The graph ranges from 0 to 400 on the x-axis (trial number) and from 0 to 0.15 on the y-axis ($ \hat{\rho}_1 $). The line, in red, peaks around trial 200.

Figure 7. Prediction correlation ${\hat{ρ}}_{1} (t)$ is extracted from the inverse prediction precision ${\hat{Π}}_{1} (t)$ generated by the second (log-volatility) level.

6.4 Bayesian model selection

To compare the performance of the proposed hierarchical Bayesian model $M$ and the ablation model $M_{a}$, we performed 100 independent simulations for each model using different random number generator seeds and computed a Bayes factor for each simulation. Figure 8 shows the histogram of the Bayes factors $BF(M, M_{a})$. All of them fall in the BF > 100 category, which the criteria suggested by Harold Jeffreys (cf. Supplementary material Section 4) classify as decisive evidence in favor of $M$ over $M_{a}$.

Figure 8

Bar graph showing probabilities for different Bayes factor ranges. The y-axis is labeled “Prob.” ranging from 0 to 1. The x-axis shows categories: BF<0.01, 0.01≤BF<0.1, 0.1≤BF<0.333, 0.333≤BF<1, 1≤BF<3, 3≤BF<10, 10≤BF<100, BF>100. Only the BF>100 category has a bar with a height of 1.

Figure 8. Histogram of the Bayes factors $BF(M, M_{a})$, computed with the Bayesian information criterion (BIC).
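A BIC-based Bayes factor and the binning of Figure 8 can be sketched as below, using the standard approximation ln BF(M, M_a) ≈ (BIC(M_a) − BIC(M))/2. The log-likelihoods and parameter counts in the usage line are hypothetical placeholders, not values from the study, and the exact quantities the authors fed into the BIC are not given in this section.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L_hat)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def log_bayes_factor(bic_m, bic_a):
    """ln BF(M, M_a) ~= (BIC(M_a) - BIC(M)) / 2, kept in log space to avoid overflow."""
    return 0.5 * (bic_a - bic_m)


# Upper bin edges and labels matching the categories in Figure 8.
BINS = [(0.01, "BF<0.01"), (0.1, "0.01≤BF<0.1"), (1.0 / 3.0, "0.1≤BF<0.333"),
        (1.0, "0.333≤BF<1"), (3.0, "1≤BF<3"), (10.0, "3≤BF<10"),
        (100.0, "10≤BF<100")]


def jeffreys_bin(log_bf):
    """Return the Figure 8 category for a Bayes factor given as ln BF."""
    for upper, label in BINS:
        if log_bf < math.log(upper):
            return label
    return "BF>100"


# Hypothetical log-likelihoods and parameter counts for one simulation run.
n = 400
ln_bf = log_bayes_factor(bic(-520.0, 20, n), bic(-560.0, 12, n))
print(round(ln_bf, 2), jeffreys_bin(ln_bf))
```

Working with ln BF rather than BF matters here: with hundreds of observations, BIC differences of a few hundred are plausible, and `math.exp` of such values would overflow.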

7 Discussion

7.1 Contributions of this study

In this article, we developed a hierarchical Bayesian model to infer and track online the tendency and volatility of multivariate exponential signals. The bottom level of the model learns the expected rate parameter vector of the multivariate exponential signal. At the first level, the logarithm of the rate parameter vector, x₁, is modeled as evolving by a general Brownian motion. Under the Brownian and Gaussian assumptions on x₁, the volatility in x₁ can be computed from the Cholesky decomposition of the diffusion matrix of the Brownian motion x₁. We therefore parameterize the volatility in x₁ in logarithmic space after the Cholesky decomposition of its diffusion matrix. This volatility is represented by x₂, which again evolves as a Brownian motion. At the second level, x₂ captures the low-order interactions among the components of the log-rate parameter vector and their uncertainties.
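To illustrate why a Cholesky parameterization keeps the diffusion matrix valid, and why a two-dimensional x₁ calls for a three-dimensional x₂ (matching the three panels of Figure 6), here is a hypothetical sketch. It assumes x₂ = (a, b, c) with the logarithm applied only to the diagonal of the Cholesky factor; the paper's exact mapping is not given in this section and may differ. It also shows how a correlation such as ρ̂₁ can be read off a covariance matrix.

```python
import math


def covariance_from_x2(x2):
    """Hypothetical map from x2 = (a, b, c) to a 2x2 covariance matrix.

    The Cholesky factor L = [[exp(a), 0], [b, exp(c)]] has a positive
    diagonal, so Sigma = L L^T is symmetric positive definite for any real
    a, b, c. Logs are applied only to the diagonal so that b can carry sign.
    """
    a, b, c = x2
    l11, l21, l22 = math.exp(a), b, math.exp(c)
    s11 = l11 * l11
    s12 = l11 * l21
    s22 = l21 * l21 + l22 * l22
    return [[s11, s12], [s12, s22]]


def correlation(sigma):
    """rho = Sigma_12 / sqrt(Sigma_11 * Sigma_22)."""
    return sigma[0][1] / math.sqrt(sigma[0][0] * sigma[1][1])


sigma = covariance_from_x2([-1.0, 0.3, -1.2])
rho = correlation(sigma)
```

Under this assumed mapping, ρ = b / sqrt(b² + e^{2c}), so the sign of the correlation is carried entirely by the off-diagonal element b.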

The hierarchical Bayesian model assumes that the log-rate parameter vector x₁(t) evolves as a general Brownian motion; its estimate is updated by Equation 18, in which the prediction error PE_{0, k} drives the agent to reduce the difference between its belief and the sensory input. The coefficient matrix W₁ acts as a set of scaling factors that weight the prediction error PE_{0, k}, and the covariance C_{1, k} functions as an adaptive, matrix-valued learning rate in Equation 21.
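The mechanism described above, a prediction error weighted by W₁ and scaled by a covariance acting as a learning rate, can be illustrated with a deliberately simplified scalar analogue. This sketch is not Equations 18–22. It assumes a single log-rate x with rate exp(w·x + b), a Gaussian random walk with constant step variance omega (as in the ablation model), the hypothetical prediction error 1 − o·r̂, and one Newton step on the log posterior per trial. The names `track_log_rate`, `w`, `b`, and `omega` are illustrative.

```python
import math


def track_log_rate(obs, w=1.0, b=0.0, omega=0.05, mu0=0.0, c0=1.0):
    """Hypothetical scalar tracker for o_k ~ Exponential(exp(w * x_k + b)).

    x_k follows a Gaussian random walk with constant step variance omega
    (constant volatility). Each update is a single Newton step on the log
    posterior, taken from the predicted mean.
    """
    mu, c = mu0, c0
    history = []
    for o in obs:
        c_pred = c + omega                       # prediction: variance grows by omega
        r_hat = math.exp(w * mu + b)             # predicted rate
        pe = 1.0 - o * r_hat                     # zero mean when r_hat equals the true rate
        precision = 1.0 / c_pred + w * w * o * r_hat
        c = 1.0 / precision                      # posterior variance acts as learning rate
        mu = mu + c * w * pe                     # w weights the prediction error
        history.append((mu, c, pe))
    return history


# Constant inputs o = 0.5 correspond to a rate of 2, i.e., a log-rate of ln 2.
hist = track_log_rate([0.5] * 300)
```

With a larger omega the predicted variance, and hence the learning rate, stays larger, so the tracker reacts faster but also follows noise more closely; the full model makes this trade-off adaptive by inferring the volatility at the second level.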

In principle, the proposed model could be generalized to a Bayesian framework for decision making in high-dimensional volatile environments by defining appropriate response models (Berger, 2013; Mathys et al., 2014; Zhu et al., 2022). In this article, we define a simple random response model based on the bivariate exponential distribution. For other problems of interest, it suffices to construct a compatible response model that addresses the particular optimization criteria of the question.

7.2 Limitations and strengths

The peakless and memoryless properties of the exponential distribution make prediction difficult for an online agent, since historical sensory inputs provide only weak evidence for a prediction. The proposed hierarchical Bayesian agent internally integrates historical sensory inputs and the current sensory input to infer changes in the signal. It estimates the dynamic volatility in the sensory inputs and adjusts its learning rate according to the evidence about that volatility, so that information from the signal is integrated into the internal states efficiently. In our simulations, the agent efficiently and accurately captured the characteristics of volatile multivariate exponentially distributed signals.

In the simulation, we observed that the proposed hierarchical Bayesian agent effectively suppresses small volatility, but it is also swayed by local variations of the rate intensity caused by the stochasticity of the signal. The prediction correlation is determined not only by changes in the trend of the sensory inputs but also by volatility. Large local fluctuations can cause jumps in the prediction correlation. Persistent small local fluctuations reduce the prediction correlation when they are asynchronous and increase it when they are synchronous.

In this study, we considered only simulated data, which capture the dynamic and multidimensional aspects of nonstationary multivariate exponential signals but cannot cover other important features observed in real data sets. The simulation results pave the way for further investigation of many estimation problems in neuroscience research. Possible applications of the method include firing rate estimation and functional brain connectivity estimation, among others.

8 Conclusions

We have introduced the mathematical basis of a hierarchical Bayesian model for inferring and tracking the rate intensity parameter of multivariate exponential signals and illustrated its functionality. A family of interpretable closed-form update rules was derived. In particular, we provided a full theoretical scenario that consists of inference in the perceptual model and learning of optimal hyper-parameters by inversion of the hierarchical Bayesian model. The proposed theoretical framework was validated on synthetic data, on which the hierarchical Bayesian model tracked volatile multivariate exponential signals well. This preliminary study points to the practical utility of our approach for analyzing high-dimensional neural activities, which often follow distributions in the exponential family.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

CZ: Conceptualization, Investigation, Software, Writing – original draft, Writing – review & editing, Methodology. KZ: Writing – original draft, Writing – review & editing. FT: Writing – original draft, Writing – review & editing. YT: Writing – original draft, Writing – review & editing. XL: Methodology, Writing – original draft, Writing – review & editing. BS: Conceptualization, Funding acquisition, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by STI2030-Major Projects 2022ZD0205005.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fncom.2025.1408836/full#supplementary-material

Abbreviations

GHBF, general hierarchical Brownian filter; STD, standard deviation; iid, independent and identically distributed; SI, sampling interval; PDF, probability density function.

References

Beal, M. J. (2003). Variational Algorithms for Approximate Bayesian Inference [PhD thesis]. University College London (UCL), London.

Berger, J. O. (2013). Statistical Decision Theory and Bayesian Analysis. Cham: Springer Science & Business Media.

Conrad, K. (2004). Probability distributions and maximum entropy. Entropy 6:10. Available online at: https://kconrad.math.uconn.edu/blurbs/analysis/entropypost.pdf

Czirók, A., Schlett, K., Madarász, E., and Vicsek, T. (1998). Exponential distribution of locomotion activity in cell cultures. Phys. Rev. Lett. 81, 3038–3041. doi: 10.1103/PhysRevLett.81.3038

Daunizeau, J., Den Ouden, H. E., Pessiglione, M., Kiebel, S. J., Friston, K. J., and Stephan, K. E. (2010a). Observing the observer (II): deciding when to decide. PLoS ONE 5:e15555. doi: 10.1371/journal.pone.0015555

Daunizeau, J., den Ouden, H. E. M., Pessiglione, M., Kiebel, S. J., Stephan, K. E., and Friston, K. J. (2010b). Observing the observer (I): meta-Bayesian Models of learning and decision-making. PLoS ONE 5:e15554. doi: 10.1371/journal.pone.0015554

Davis, D. J. (1952). An analysis of some failure data. J. Am. Stat. Assoc. 47, 113–150. doi: 10.1080/01621459.1952.10501160

Drăgulescu, A., and Yakovenko, V. M. (2001). Evidence for the exponential distribution of income in the USA. Eur. Phys. J. B-Condens. Matter Complex Syst. 20, 585–589. doi: 10.1007/PL00011112

Epstein, B., and Sobel, M. (1953). Life testing. J. Am. Stat. Assoc. 48, 486–502. doi: 10.1080/01621459.1953.10483488

Esary, J. D., and Marshall, A. W. (1974). Multivariate distributions with exponential minimums. Ann. Stat. 2, 84–98. doi: 10.1214/aos/1176342615

Friston, K. (2010). The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138. doi: 10.1038/nrn2787

Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., and Pezzulo, G. (2017). Active inference: a process theory. Neural Comput. 29, 1–49. doi: 10.1162/NECO_a_00912

Haynes, J.-D., and Rees, G. (2006). Decoding mental states from brain activity in humans. Nat. Rev. Neurosci. 7, 523–534. doi: 10.1038/nrn1931

Ibe, O. (2014). Fundamentals of Applied Probability and Random Processes. Cambridge, MA: Academic Press. doi: 10.1016/B978-0-12-800852-2.00012-2

Jaynes, E. T. (1982). On the rationale of maximum-entropy methods. Proc. IEEE 70, 939–952. doi: 10.1109/PROC.1982.12425

Johnson, R. A., and Wichern, D. W. (2002). Applied Multivariate Statistical Analysis. Upper Saddle River, NJ: Prentice Hall.

Jung, J. H., and O'Leary, D. P. (2006). Cholesky Decomposition and Linear Programming on a GPU. Scholarly Paper. College Park, MD: University of Maryland.

Kingman, J. F. C. (1992). Poisson Processes, Volume 3. Oxford: Clarendon Press. doi: 10.1093/oso/9780198536932.001.0001

Kundu, D., and Gupta, R. D. (2009). Bivariate generalized exponential distribution. J. Multivar. Anal. 100, 581–593. doi: 10.1016/j.jmva.2008.06.012

Li, S., and Le, W. (2017). Milestones of Parkinson's disease research: 200 years of history and beyond. Neurosci. Bull. 33, 598–602. doi: 10.1007/s12264-017-0178-2

Li, Z., O'Doherty, J. E., Hanson, T. L., Lebedev, M. A., Henriquez, C. S., Nicolelis, M. A., et al. (2009). Unscented Kalman filter for brain-machine interfaces. PLoS ONE 4:e6243. doi: 10.1371/journal.pone.0006243

Lo, C.-C., Chou, T., Penzel, T., Scammell, T. E., Strecker, R. E., Stanley, H. E., et al. (2004). Common scale-invariant patterns of sleep-wake transitions across mammalian species. Proc. Nat. Acad. Sci. 101, 17545–17548. doi: 10.1073/pnas.0408242101

Magnus, J. R., and Neudecker, H. (1979). The commutation matrix: some properties and applications. Ann. Stat. 7, 381–394. doi: 10.1214/aos/1176344621

Maimon, G., and Assad, J. A. (2009). Beyond Poisson: increased spike-time regularity across primate parietal cortex. Neuron 62, 426–440. doi: 10.1016/j.neuron.2009.03.021

Malik, W. Q., Truccolo, W., Brown, E. N., and Hochberg, L. R. (2010). Efficient decoding with steady-state Kalman filter in neural interface systems. IEEE Trans. Neural Syst. Rehabil. Eng. 19, 25–34. doi: 10.1109/TNSRE.2010.2092443

Marshall, A. W., and Olkin, I. (1967a). A generalized bivariate exponential distribution. J. Appl. Probab. 4, 291–302. doi: 10.2307/3212024

Marshall, A. W., and Olkin, I. (1967b). A multivariate exponential distribution. J. Am. Stat. Assoc. 62, 30–44. doi: 10.1080/01621459.1967.10482885

Mathys, C. D., Daunizeau, J., Friston, K. J., and Stephan, K. E. (2011). A Bayesian foundation for individual learning under uncertainty. Front. Hum. Neurosci. 5:39. doi: 10.3389/fnhum.2011.00039

Mathys, C. D., Lomakina, E. I., Daunizeau, J., Iglesias, S., Brodersen, K. H., Friston, K. J., et al. (2014). Uncertainty in perception and the hierarchical Gaussian filter. Front. Hum. Neurosci. 8:825. doi: 10.3389/fnhum.2014.00825

McCaslin, J. O., and Broussard, P. R. (2007). Search for chaotic behavior in a flapping flag. arXiv. Available online at: https://arxiv.org/abs/0704.0484

Nazmi, N., Mazlan, S. A., Zamzuri, H., and Rahman, M. A. A. (2015). Fitting distribution for electromyography and electroencephalography signals based on goodness-of-fit tests. Procedia Comput. Sci. 76, 468–473. doi: 10.1016/j.procs.2015.12.317

Nocedal, J., and Wright, S. J. (2006). Numerical Optimization, 2nd Edn. New York, NY: Springer.

Ouyang, G., Wang, S., Liu, M., Zhang, M., and Zhou, C. (2023). Multilevel and multifaceted brain response features in spiking, ERP and ERD: experimental observation and simultaneous generation in a neuronal network model with excitation-inhibition balance. Cogn. Neurodyn. 17, 1417–1431. doi: 10.1007/s11571-022-09889-w

Pan, H., Fu, Y., Zhang, Q., Zhang, J., and Qin, X. (2022). The decoder design and performance comparative analysis for closed-loop brain-machine interface system. Cogn. Neurodyn. 18, 147–164. doi: 10.1007/s11571-022-09919-7

Qi, Y., Liu, B., Wang, Y., and Pan, G. (2019). “Dynamic ensemble modeling approach to nonstationary neural decoding in brain-computer interfaces,” in Advances in Neural Information Processing Systems, Vol. 32, eds. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc.). Available online at: https://proceedings.neurips.cc/paper_files/paper/2019/file/3f7bcd0b3ea822683bba8fc530f151bd-Paper.pdf

Roxin, A., Brunel, N., Hansel, D., Mongillo, G., and Vreeswijk, C. V. (2011). On the distribution of firing rates in networks of cortical neurons. J. Neurosci. 31, 16217–16226. doi: 10.1523/JNEUROSCI.1677-11.2011

Shanker, R., Hagos, F., and Sujatha, S. (2015). On modeling of lifetimes data using exponential and Lindley distributions. Biom. Biostat. Int. J. 2, 1–9. doi: 10.15406/bbij.2015.02.00042

Stein, R. R., Marks, D. S., and Sander, C. (2015). Inferring pairwise interactions from biological data using maximum-entropy probability models. PLoS Comput. Biol. 11:e1004182. doi: 10.1371/journal.pcbi.1004182

Swindale, N. V., Rowat, P., Krause, M., Spacek, M. A., and Mitelut, C. (2021). Voltage distributions in extracellular brain recordings. J. Neurophysiol. 125, 1408–1424. doi: 10.1152/jn.00633.2020

Tanabe, K., and Sagae, M. (1992). An exact Cholesky decomposition and the generalized inverse of the variance-covariance matrix of the multinomial distribution, with applications. J. R. Stat. Soc. B (Methodological) 54, 211–219. doi: 10.1111/j.2517-6161.1992.tb01875.x

Varde, S. D. (1969). Life testing and reliability estimation for the two parameter exponential distribution. J. Am. Stat. Assoc. 64, 621–631. doi: 10.1080/01621459.1969.10501000

Xu, L., Xu, M., Jung, T.-P., and Ming, D. (2021). Review of brain encoding and decoding mechanisms for EEG-based brain-computer interface. Cogn. Neurodyn. 15, 569–584. doi: 10.1007/s11571-021-09676-z

Yousefi, A., Basu, I., Paulk, A. C., Peled, N., Eskandar, E. N., Dougherty, D. D., et al. (2019). Decoding hidden cognitive states from behavior and physiology using a Bayesian approach. Neural Comput. 31, 1751–1788. doi: 10.1162/neco_a_01196

Zhang, Y.-J., Yu, Z.-F., Liu, J. K., and Huang, T.-J. (2022). Neural decoding of visual information across different neural recording modalities and approaches. Mach. Intell. Res. 19, 350–365. doi: 10.1007/s11633-022-1335-2

Zhu, C., Zhou, K., Han, Z., Tang, Y., Tang, F., Si, B., et al. (2025). General Hierarchical Brownian Filter in Multi-dimensional Volatile Environments. submitted.

Zhu, C., Zhou, K., Tang, F., Tang, Y., Li, X., Si, B., et al. (2022). A hierarchical Bayesian model for inferring and decision making in multi-dimensional volatile binary environments. Mathematics 10:4775. doi: 10.3390/math10244775

Keywords: online Bayesian learning, hierarchical filter, Brownian motion, exponential distribution, adaptive observation

Citation: Zhu C, Zhou K, Tang F, Tang Y, Li X and Si B (2025) A hierarchical Bayesian inference model for volatile multivariate exponentially distributed signals. Front. Comput. Neurosci. 19:1408836. doi: 10.3389/fncom.2025.1408836

Received: 23 April 2024; Accepted: 14 October 2025;
Published: 12 November 2025.

Edited by:

Kechen Zhang, Johns Hopkins University, United States

Reviewed by:

Guozhang Chen, Graz University of Technology, Austria
Dodi Devianto, Andalas University, Indonesia

Copyright © 2025 Zhu, Zhou, Tang, Tang, Li and Si. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bailu Si, bailusi@bnu.edu.cn
