Geometric properties of noninformative priors based on the chi-square divergence

Recently, a noninformative prior distribution that is different from the Jeffreys prior was derived as an extension of Bernardo's reference prior based on the chi-square divergence. We summarize this result in terms of information geometry and clarify some geometric properties. Specifically, we show that it corresponds to a parallel volume element and can be written as a power of the Jeffreys prior in flat model manifolds.


1. Introduction
The problem of noninformative priors in Bayesian statistics is to determine what kind of probability distribution (often called a noninformative prior or an objective prior) is desirable on a statistical model in the absence of information about the parameters. In theory, though not in practice, it is essentially a problem of small-sample statistics, which has been under consideration for a long time [1-4].
Theoretical research on noninformative priors dates back to Jeffreys [3], and currently, a noninformative prior proposed by him, called the Jeffreys prior, is the standard noninformative prior. Theoretical justification of the Jeffreys prior comes from the theory of reference priors, which were originally proposed by Bernardo [5] decades ago when considering the maximization of the mutual information between the parameter and the outcome. Many related studies in this direction have since been reported [for review, see, e.g., Berger et al. [6]].
Apart from this, there are several other criteria for constructing noninformative priors. For example, Komaki [7, 8] has proposed objective priors that improve the performance of Bayesian predictive densities. Some significant results were presented by his coworkers, including the author [e.g., noninformative priors on time series models have been proposed [9, 10]]. From the viewpoint of information geometry, Takeuchi and Amari [11] proposed the α-parallel prior. For a recent review of other noninformative priors, see, e.g., Ghosh [12].
Recently, considering a certain extension of Bernardo's reference prior, Liu et al. [13] showed that a prior distribution different from the Jeffreys prior can be derived. Since it is based on the chi-square divergence, we call it the χ²-prior for convenience. Unlike those of the Jeffreys prior, the geometric properties of the χ²-prior have yet to be discussed.
In the present study, we investigate the derivation by Liu et al. [13] of the χ²-prior from the viewpoint of information geometry. We put emphasis on the invariance of the theory under reparametrization (coordinate transformation in differential geometry). While we follow their derivation, we rewrite the asymptotic expansion in geometric terms, which makes the problem easier to understand. We also derive the tensor equations satisfied by the χ²-prior and show that it coincides with the 1/2-parallel prior.
Basic definitions and notation are given in Section 2. We also review some noninformative priors in terms of information geometry. In Section 3, we rewrite the asymptotic expansion by Liu et al. [13] in geometric terms to simplify their argument. In Section 4, we briefly review α-parallel priors, clarify a relation between the χ²-prior and the α-parallel prior, and derive a formula for an α-parallel prior in γ-flat models. Finally, concluding remarks are given in Section 5.

2. Preliminaries
We briefly review some definitions and notation of information geometry [for details, refer to textbooks on information geometry [14,15]]. We also review some noninformative priors in terms of information geometry.
For a given statistical model, we would like to consider noninformative prior distributions defined in a manner independent of parametrization. For this reason, it is convenient to introduce differential geometrical quantities into our discussion, i.e., to consider them from the viewpoint of information geometry.

2.1. Basic definitions of information geometry
Suppose that a statistical model M = {p(x; θ) : θ ∈ Θ ⊂ R^p} is given, which is regarded as a p-dimensional differentiable manifold and called a statistical model manifold (though it will be called simply a model where no confusion is possible). As usual, all necessary regularity conditions are assumed.
We also define the Riemannian metric and affine connections on the manifold M. Let l = log p(x; θ) denote the log-likelihood function and ∂_i = ∂/∂θ^i the partial derivative.
Definition 1. The Riemannian metric g_ij = g(∂_i, ∂_j) is defined as g_ij := E[∂_i l ∂_j l], where E[·] denotes expectation with respect to the observation x. The above quantity is also called the Fisher information matrix in statistics. Thus, we often call the above metric the Fisher metric.
The statistical cubic tensor and the coefficients of the e-connection are defined as T_ijk := E[∂_i l ∂_j l ∂_k l] and Γ^(e)_ij,k := E[(∂_i ∂_j l) ∂_k l], respectively.
Definition 2. For every real α, the p³ quantities Γ^(α)_ij,k := Γ^(e)_ij,k + ((1 − α)/2) T_ijk define an affine connection, which is called the α-connection.
We identify an affine connection with its coefficients below. Connection coefficients with upper indices are obtained by Γ^(α)k_ij = g^kl Γ^(α)_ij,l, where g^ij is the inverse matrix of the Fisher metric g_ij, and we have used Einstein's summation convention [see, e.g., Amari and Nagaoka [14] for details]. Conventionally, when α = 1, we call it the e-connection, and when α = −1, we call it the m-connection and denote it as Γ^(m)_ij,k. It is well known that the α-connection and the −α-connection are mutually dual with respect to the Fisher metric. (In a Riemannian manifold with an affine connection Γ, another affine connection Γ* is said to be dual with respect to Γ if it satisfies ∂_k g_ij = Γ_ki,j + Γ*_kj,i. For equivalent definitions, see, e.g., Amari and Nagaoka [14], Chap. 3.) When α = 0, the self-dual connection is called the Levi-Civita connection, which defines a parallel transport that keeps the Riemannian metric invariant. The Levi-Civita connection is determined by partial derivatives of the metric, and its explicit form is given by Γ^(0)_ij,k = (1/2)(∂_i g_jk + ∂_j g_ik − ∂_k g_ij).
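As a concrete illustration of the definitions above, the following sketch (an added example, not from the original article; Python with sympy is assumed) computes the Fisher metric, the cubic tensor, and the α-connection coefficient symbolically for the one-parameter Bernoulli model, and checks that α = 0 reproduces the Levi-Civita connection of the one-dimensional metric.

```python
import sympy as sp

# Bernoulli model p(x; eta) = eta^x (1 - eta)^(1 - x), x in {0, 1}.
eta, x, alpha = sp.symbols('eta x alpha')

def E(expr):
    # Expectation over x in {0, 1} under the Bernoulli model.
    return sp.simplify(expr.subs(x, 1) * eta + expr.subs(x, 0) * (1 - eta))

l = x * sp.log(eta) + (1 - x) * sp.log(1 - eta)  # log-likelihood
dl = sp.diff(l, eta)                             # score

g = E(dl**2)                          # Fisher metric g_11 = E[(dl)^2]
T = E(dl**3)                          # cubic tensor T_111 = E[(dl)^3]
Gamma_e = E(sp.diff(l, eta, 2) * dl)  # e-connection Gamma^(e)_11,1
Gamma_alpha = sp.simplify(Gamma_e + (1 - alpha) / 2 * T)

# Sanity check: at alpha = 0 the formula reproduces the Levi-Civita
# coefficient (1/2) dg/deta of the one-dimensional Fisher metric.
levi_civita = sp.simplify(sp.diff(g, eta) / 2)
```

Here the symbolic expectation is just a two-point sum, which keeps the example self-contained; for this model g = 1/(η(1 − η)).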

2.2. Useful identities for alpha-connections
In the present study, the following identities are useful. They are obtained in a straightforward manner; thus, their proofs are omitted.
Lemma 1. The following identities hold.
The first equation yields the relation (Equation 1). The last equation shows the duality of the e- and m-connections directly and is generalized to the ±α-connections.
Lemma 2. For mutually dual connections, the following identities hold.

Using Lemma 1 and the equation above, we obtain the following.
Lemma 3. For m_ijk, T_ijk, and the first derivative of the Fisher metric, ∂_k g_ij, the following holds.

2.3. Prior distributions and volume elements
In Bayesian statistics, for a given statistical model M, we need a probability distribution over the model parameter space, which is called a prior distribution, or simply a prior. We often denote a prior density as π (π(θ) ≥ 0 and ∫ π(θ) dθ = 1). A volume element on a p-dimensional model manifold corresponds to a prior density function over the parameter space (θ ∈ Θ ⊂ R^p) in a one-to-one manner. For a prior π(θ), its corresponding volume element ω is a p-form (a differential form of degree p) and is written as ω = π(θ) dθ^1 ∧ · · · ∧ dθ^p in the local coordinate system.
For example, in two-dimensional Euclidean space (p = 2), the volume element is given by ω = dx ∧ dy in Cartesian coordinates (x, y). In polar coordinates (r, θ), it is written as ω = r dr ∧ dθ.
Then, under a coordinate transformation θ → ξ, how do the probability density on the parameter space and its ratio change? From the formula for the p-dimensional volume element, the transformed density is written as π̃(ξ) = π(θ(ξ)) |det(∂θ/∂ξ)|, where ∂θ/∂ξ denotes the Jacobian matrix. In differential geometry, such quantities are called tensor densities.
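As a concrete instance of this transformation rule, the following sketch (an added illustration; Python with sympy is assumed) pushes the uniform density on θ ∈ (0, 1) through the logit reparametrization and checks numerically that the Jacobian factor preserves the total mass.

```python
import sympy as sp

# A prior density transforms as a tensor density of weight 1: under
# theta -> xi it picks up the Jacobian factor |d theta / d xi|.
xi = sp.symbols('xi', real=True)

pi_theta = sp.Integer(1)                     # uniform prior on theta in (0, 1)
theta_of_xi = sp.exp(xi) / (1 + sp.exp(xi))  # inverse of xi = log(theta/(1-theta))
jacobian = sp.diff(theta_of_xi, xi)          # d theta / d xi (positive here)

# Transformed density: pi~(xi) = pi(theta(xi)) * |d theta / d xi|
pi_xi = sp.simplify(pi_theta * jacobian)     # = exp(xi)/(1 + exp(xi))**2

# The Jacobian factor preserves total mass under the change of variables.
total = sp.Integral(pi_xi, (xi, -sp.oo, sp.oo)).evalf()
```

Note that the transformed density is no longer constant, which is the starting point of the discussion of the uniform prior below.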

2.4. Noninformative priors defined by equations
We briefly summarize some of the prior studies on noninformative priors in Bayesian statistics. Basically, a noninformative prior is often defined as the solution of a partial differential equation (PDE) derived from fundamental principles. If it is independent of parametrization, then it usually has a geometrical meaning. The defining equation itself is expected to be invariant under every coordinate transformation.
For simplicity, we assume that the manifold admits global coordinates, and each point is specified by θ. We fix some nonnegative integers r and s. Suppose that a set of p^(r+s) functions of the parameter θ, A^{a_1 ... a_r}_{b_1 ... b_s}(θ), a_1, . . . , a_r; b_1, . . . , b_s = 1, . . . , p, is given, and these functions also have a representation in a different coordinate system, say ξ. Suppose they satisfy the following equation: Ã^{α_1 ... α_r}_{β_1 ... β_s}(ξ) = B^{α_1}_{a_1} · · · B^{α_r}_{a_r} B̃^{b_1}_{β_1} · · · B̃^{b_s}_{β_s} A^{a_1 ... a_r}_{b_1 ... b_s}(θ), where B^β_b = ∂ξ^β/∂θ^b denotes the Jacobi matrix and B̃^a_α = ∂θ^a/∂ξ^α denotes its inverse. Then these functions are called a type (r, s) tensor field, or simply a tensor.
Some specific types have established names. For example, a type (0, 0) tensor is called a scalar (field) and a type (1, 0) tensor is called a vector (field). In particular, the ratio of two prior densities is a scalar. For a differential one-form, which is written as A = A_j dθ^j, the set of components A_j is regarded as a covariant vector [a type (0, 1) tensor].
For a type (r, s) tensor A, which often includes a derivative, we refer to an equation like A = 0 as a tensor equation. Usually, such a tensor A is derived using some differential operators, and the component-wise form yields a PDE. By definition, tensor equations are invariant under coordinate transformation (reparametrization): when we show that A^{a_1 ... a_r}_{b_1 ... b_s}(θ) = 0 in one coordinate system, say θ, then, due to multilinearity, Ã^{α_1 ... α_r}_{β_1 ... β_s}(ξ) = 0 holds in any other coordinate system, say ξ. Tensor equations are often written in the form A = B.

2.4.1. Noninformative priors
Now let us explain noninformative priors [see, e.g., Robert [4] for more details]. As mentioned before, we need to set a prior distribution over the parameter space for a given statistical model in Bayesian statistics. If we have certain information on the parameter in advance, then the prior should reflect this, and such a prior is often called a subjective prior. If not, we adopt a certain criterion and use a prior obtained through the criterion. Such priors are called noninformative priors.
The definition of a noninformative prior, which is often written as a PDE, should not depend on a specific parametrization (a coordinate system of the model manifold). If we claim to have no information on the parameter, then we cannot determine which parametrization is natural. Based on this viewpoint, we examine several examples of noninformative priors defined through a PDE under a certain criterion. Some equations defining a noninformative prior are not tensor equations, and their solutions, that is, noninformative priors, do not satisfy the equation in another coordinate system.

2.4.2. Uniform prior
The uniform prior π_U(θ) over the parameter space would be the most naive noninformative prior. This idea dates back to Laplace and has been criticized [3]. The uniform prior is given by a solution of the following PDE: ∂_i π(θ) = 0 for all i (Equation 6). Clearly, the above PDE (Equation 6) is not a tensor equation. In other words, it is not invariant under reparametrization. While the solution for the original parameter θ is constant, π_U(θ) ∝ 1, the density for another parameter ξ is obtained by π̃_U(ξ) = π_U(θ(ξ)) |det(∂θ/∂ξ)| ∝ |det(∂θ/∂ξ)|. Thus, the final form does not satisfy the PDE (Equation 6) for ξ any more. That is, ∂_α π̃_U(ξ) ≠ 0 in general.

2.4.3. Jeffreys prior
Let us modify the above PDE (Equation 6) slightly so that it is invariant under coordinate transformation. Thus, we obtain the following PDE: ∂_i (π(θ)/√g(θ)) = 0, where g denotes the determinant of the Fisher metric. The solution, which is given as a constant times √g, is called the Jeffreys prior [3].
It is the most famous noninformative prior in Bayesian statistics. Let π_J(θ) (∝ √g) denote the Jeffreys prior from here on. It is a straightforward extension of the uniform prior. As Jeffreys himself pointed out, it is not necessarily reasonable to adopt the Jeffreys prior as an objective prior in a higher-dimensional parametric model. This is one of the reasons to propose noninformative priors under a fundamental criterion [see, e.g., Robert [4] and references therein].
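The invariance of the Jeffreys prior can be checked directly on the Bernoulli model: computing √g in the natural parameter, or transforming √g from the mean parameter with the Jacobian, gives the same density. The following sketch (an added illustration, not from the article; Python with sympy is assumed) performs this check numerically.

```python
import sympy as sp

# Jeffreys prior sqrt(g) transforms with the Jacobian, i.e. as a genuine
# density: computed directly in a new coordinate, or pushed forward from
# the old one, the result is the same.  Bernoulli model, theta = logit(eta).
th = sp.symbols('theta', real=True)
eta = sp.symbols('eta', positive=True)

g_eta = 1 / (eta * (1 - eta))               # Fisher information in eta
eta_of_th = sp.exp(th) / (1 + sp.exp(th))   # mean parameter as a function
                                            # of the natural parameter theta
g_th = sp.simplify(eta_of_th * (1 - eta_of_th))  # Fisher information in theta

direct = sp.sqrt(g_th)                      # sqrt(g) computed directly in theta
pushed = sp.sqrt(g_eta).subs(eta, eta_of_th) * sp.diff(eta_of_th, th)

# Numerical spot check of the identity direct == pushed at theta = 0.7.
residual = float((direct - pushed).subs(th, sp.Rational(7, 10)))
```

The same check fails for the uniform prior, which is exactly the non-invariance discussed above.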
Note that the following identity for the Riemannian metric tensor will be useful: ∂_i log √g = (1/2) g^jk ∂_i g_jk (Equation 8).

2.4.4. First moment matching prior
The moment matching prior was proposed by Ghosh and Liu [17]. From the original article, we obtain a PDE in terms of information geometry.
Theorem 1. Ghosh and Liu's moment matching prior is given by the solution of the following PDE: ∂_i log(π/π_J) − (1/2) g^jk Γ^(e)_jk,i = 0.
From the aforementioned form, it is clearly not a tensor equation, and thus, the PDE is not invariant under reparametrization. Indeed, while the first term of the LHS is a (0, 1) tensor, the second term is not.
Proof. First, from the formula in Ghosh and Liu [17] (Section 3, p. 193), we obtain the asymptotic expansion of the posterior mean, where θ_ML^m and θ_π^m are the MLE and the posterior mean of θ, respectively, U^m = g^ml ∂_l log π, and V^m = g^ml g^jk m_ljk. The condition of the first moment matching is that the leading discrepancy between θ_π^m and θ_ML^m vanishes. Multiplying both sides with the Fisher matrix g_im, we obtain an equivalent equation. Therefore, using Lemma 1 and Equation (8), and replacing √g with π_J in the last expression, we obtain the PDE of Theorem 1.
Remark 1. For the exponential family with the natural parameter θ, it is known that Γ^(e)_jk,i ≡ 0. When all connection coefficients vanish, the coordinate system is called affine. In this sense, the natural parameter is called the e-affine coordinate. From the above equation, in this parametrization, the moment matching prior agrees with the Jeffreys prior. However, if we begin with a different parametrization, then we obtain a prior that is different from the Jeffreys prior. As a specific example, let us consider the binomial model with the success probability η (0 < η < 1) in Ghosh [12] (Section 5.2, p. 199). Then the moment matching prior for η is given by π_M(η) ∝ η^(−1)(1 − η)^(−1). However, taking the natural parameter θ = log(η/(1 − η)), the moment matching prior for θ is given by the Jeffreys prior, π_J(θ) ∝ e^(θ/2)(1 + e^θ)^(−1); rewritten in terms of η via the Jacobian, this is π_J(η) ∝ η^(−1/2)(1 − η)^(−1/2).

2.4.5. Chi-square prior
Liu et al. [13] developed an extension of the reference prior by replacing the KL-divergence in the original definition with the general α-divergence. In an exceptional case, we obtain a prior that is different from the Jeffreys prior. The PDE is given by ∂_i log(π/π_J) + (1/4) T_i = 0 (Equation 9), where T_i = T_ijk g^jk is a type (0, 1) tensor. Thus, the above PDE is a tensor equation. Its derivation and details are explained in the next section.
Definition 3 [Liu et al. [13]]. If the PDE (Equation 9) has a solution, then we call the prior distribution the χ²-prior. We denote the χ²-prior as π_χ².
As we will see later, π_χ² does not necessarily exist. However, the usual statistical models satisfy a necessary and sufficient condition for the existence of π_χ². This existence condition is invariant under coordinate transformation.
3. Derivation of chi-square prior in terms of information geometry
Liu et al. [13] derived the PDE (Equation 9) that π_χ² should satisfy by considering the maximization of a functional of a prior π based on the χ²-divergence. In the present section, we review their result and rewrite the functional in terms of information geometry. As a result, we obtain a more explicit form and a better interpretation of the maximization.

3.1. Extension of the reference prior
As an underlying principle, Bernardo [5] adopted the construction of a minimax code in information theory to derive noninformative priors. In his framework, the noninformative prior is defined as the input source distribution that maximizes the mutual information between the parameter and the outcome. This prior is called (Bernardo's) reference prior. Under some conditions, his idea has been rigorously formulated by several authors [18, 19] (for a review, see, e.g., Berger et al. [6]).
In one of the many studies and variants of reference priors, Liu et al. [13] recently adopted the α-divergence instead of the KL-divergence in Bernardo's argument and obtained a generalized result.
Definition 4. Let p(x) and q(x) be probability densities. For a fixed real parameter α, the α-divergence from p to q is defined as D_α(p, q) := (1/(α(1 − α))) (1 − ∫ p(x)^α q(x)^(1−α) dx) (Equation 10).
Remark 2. In the textbook on information geometry by Amari [20], the parametrization D^(β)(p, q) := (4/(1 − β²)) (1 − ∫ p(x)^((1−β)/2) q(x)^((1+β)/2) dx) (Equation 11) is used because of the emphasis on duality, where we write β instead of α. We adopt the parametrization of α in Equation (10). For example, the χ²-divergence corresponds to α = −1 in Equation (10) and β = 3 in Equation (11). More explicitly, the relation α = (1 − β)/2 (and thus, 1 − α = (1 + β)/2) holds. When α → 0, 1, taking the limit, the α-divergence reduces to the KL-divergence.
Now, let us see the definition of the noninformative prior proposed by Liu et al. [13]. Under regularity conditions (e.g., the compactness of the parameter space Θ), they considered the maximization of the following functional of a prior density π: J[π] := ∫ π(θ) E[D_α(π(·), π(·|X)) | θ] dθ (Equation 12), where E[·|θ] denotes expectation with respect to p(X|θ), and the notation emphasizes that the parameter θ is fixed in the integral. Under their criterion, the maximizer of J[π] is adopted as a noninformative prior. Following Liu et al. [13], we rewrite the above functional (Equation 12) in a simpler form. Depending on the sign of α(1 − α), our problem reduces to maximization or minimization of the expectation E[π(θ|X)^(−α) | θ]. Clearly, it cannot be solved explicitly in general. Thus, as usual, we approximate the expectation term asymptotically under the assumption that X = (X_1, . . . , X_n) consists of n i.i.d. observations.
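For intuition about the parametrization, the following numeric sketch (an added illustration; the explicit finite-sample-space form D_α(p, q) = (1 − Σ p^α q^(1−α))/(α(1 − α)) is assumed to match Equation 10) checks the χ²-divergence case α = −1 and the KL limit α → 0.

```python
import numpy as np

# alpha-divergence on a finite sample space, assuming the explicit form
#   D_alpha(p, q) = (1 - sum_x p(x)^alpha q(x)^(1-alpha)) / (alpha (1-alpha)),
# consistent with the alpha = -1 <-> chi-square correspondence in the text.
def alpha_divergence(p, q, alpha):
    return (1.0 - np.sum(p**alpha * q**(1.0 - alpha))) / (alpha * (1.0 - alpha))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

# At alpha = -1 this equals half the chi-square divergence sum (q-p)^2/p.
d_chi2 = alpha_divergence(p, q, -1.0)
half_chi2 = 0.5 * np.sum((q - p)**2 / p)

# As alpha -> 0 it tends to the KL-divergence KL(q||p).
d_small = alpha_divergence(p, q, 1e-6)
kl_qp = np.sum(q * np.log(q / p))
```

The overall factor 1/2 at α = −1 is immaterial for the variational problem, since a constant multiple does not change the maximizer of J[π].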

3.2. Asymptotic expansion of the expectation term
Except for α = −1 (χ 2 -divergence), the maximization of J[π ] reduces to that of the first-order term in the following expansion (Theorem 2), which yields the Jeffreys prior for −1 < α < 1. However, for χ 2 -divergence, we need to evaluate the second-order term since the first-order term is constant.
First, we present a key result in Liu et al. [13]. Some notation in their result follows ours. For example, the Fisher information matrix and its determinant are denoted as g ij and g, respectively. The dimension of the parameter θ is denoted as p. Please refer to the original article for technical details.
Theorem 2 [Liu et al. [13]] gives an asymptotic expansion of the expectation term up to order 1/n, where the 1/n part in braces {· · ·} contains a final term s(θ) that does not include the prior density π. From Theorem 2, for a positive constant C_n and a sufficiently large n, the functional (Equation 12) is approximated by its first-order term. When −1 < α < 1, the maximization yields π ∝ g^(1/2), that is, the Jeffreys prior. When α < −1, the Jeffreys prior rather minimizes the functional J[π].
However, at the boundary point α = −1 (χ 2 -divergence), the above first-order term becomes a constant independent of π . In this case, we need to evaluate the second-order term more carefully.

3.3. Rewriting Liu et al.'s Theorem 2 in geometrical terms
Now let us rewrite the second-order term of the asymptotic expansion in Theorem 2 in terms of information geometry. We fix α = −1 and, from here on, consider only the case of the χ²-divergence.
Although our approach differs from that in the original article, the final PDE agrees with their result. The difference and our contribution are discussed in the next subsection.
We summarize how we rewrite each term to obtain the final result (Theorem 3). First, we rewrite ∂_i ∂_j π / π by using the relation ∂_i ∂_j π / π = ∂_i ∂_j log π + (∂_i log π)(∂_j log π). After that, we replace the prior density π with the density ratio h = π/π_J, where π_J = √g. The terms including the prior density π and its derivatives are expected to be written using the scalar function log h. Indeed, this expectation is correct, and we obtain the final form after a tedious, lengthy, but straightforward calculation. Because we use integration by parts in transforming the original form of the asymptotic expansion, the integral symbol remains in the expression below.
Theorem 3 [Liu et al. [13], Corollary of Theorem 3.1].
where the 1/n part in square brackets contains the term −‖d log h + (1/4) T‖², in which we set T := T_i dθ^i, and the norm of a one-form A is defined as ‖A‖² := A_i A_j g^ij.
The above one-form T is called the Tchebychev form in affine geometry [see, e.g., p. 58 in Simon et al. [21]].
From Theorem 3, maximizing J[π] over the set of all prior densities is equivalent to maximizing the above integral with respect to the scalar function h as n → ∞. Since the second and third terms inside braces {· · ·} in Equation (13) are independent of h, the expression achieves its maximum if the first term vanishes, that is, if d log h + (1/4) T = 0 (Equation 14) holds. Thus, we obtain an equation of a differential one-form that determines the χ²-prior. In a local coordinate system, the component-wise form of Equation (14) is given by ∂_i log(π/π_J) + (1/4) T_i = 0, which agrees with the original PDE (Equation 9) derived in the previous study.
Finally, we discuss the existence of the χ²-prior. Generally, the χ²-prior does not necessarily exist on a statistical model. The existence of a χ²-prior on a given model is equivalent to the existence of a solution of the PDE (Equation 9).
A solution of the PDE (Equation 9) exists if and only if T_i satisfies the condition ∂_i T_j − ∂_j T_i = 0 (Equation 15), which is a well-known integrability condition. In short, we may write dT = 0. Somewhat surprisingly, the above condition (Equation 15) agrees with the condition that an α (≠ 0)-parallel prior exists [11]. This suggests a certain relationship between the χ²-prior and α-parallel priors. Indeed, this expectation is correct, and the χ²-prior is shown to be the 1/2-parallel prior, which is the theme of the next section.
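The integrability condition (Equation 15) can be verified symbolically on a familiar two-parameter model. The sketch below (an added illustration; Python with sympy is assumed) computes the Tchebychev form for the Gaussian location-scale family and checks that its exterior derivative vanishes.

```python
import sympy as sp

# Integrability check dT = 0 for the Gaussian model N(mu, sigma^2).
# Expectations are taken by expanding polynomials in the standardized
# variable z = (x - mu)/sigma and substituting standard normal moments.
mu, sigma, z = sp.symbols('mu sigma z', positive=True)
moments = {0: 1, 1: 0, 2: 1, 3: 0, 4: 3, 5: 0, 6: 15}

def E(expr):
    poly = sp.Poly(sp.expand(expr), z)
    return sum(c * moments[m[0]] for m, c in zip(poly.monoms(), poly.coeffs()))

# Scores in (mu, sigma): d_mu l = z/sigma, d_sigma l = (z^2 - 1)/sigma.
scores = [z / sigma, (z**2 - 1) / sigma]

g = sp.Matrix(2, 2, lambda i, j: sp.simplify(E(scores[i] * scores[j])))
ginv = g.inv()

# Tchebychev one-form components T_i = T_ijk g^{jk}.
T1 = [sp.simplify(sum(E(scores[i] * scores[j] * scores[k]) * ginv[j, k]
                      for j in range(2) for k in range(2)))
      for i in range(2)]

# dT = 0: the mixed partials agree, so T_i = d_i phi for a scalar phi
# (here T = (0, 6/sigma), with potential phi = 6 log sigma).
curl = sp.simplify(sp.diff(T1[1], mu) - sp.diff(T1[0], sigma))
```

The scalar potential φ found here reappears in Section 4, where the general form of α-parallel priors on statistically equiaffine models is written in terms of φ.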

3.4. Discussion
We here discuss the difference between the original result obtained by Liu et al. [13] and the present study.
First, the PDE they obtained for the χ²-prior is not in the form of a tensor equation. They gave a PDE for log π (Equation 16) whose solution achieves the extreme value of the functional asymptotically. They did not organize the messy terms and utilized the variational method in an ad hoc manner to derive the PDE (Equation 16). Moreover, their approach does not exclude the possibility that the extremum is a minimum.
Our approach shows more directly that π_χ² satisfying the PDE (Equation 16) achieves the maximum of the functional asymptotically. Using the completion of the square for the one-form d log h, we show that π_χ² maximizes the functional J[π] as n → ∞.
In addition, our underlying philosophy is the invariance principle under coordinate transformation. Clearly, the expected χ 2 -divergence from a prior to its posterior is independent of parametrization. Thus, we naturally expect that the O(n −1 ) term is independent of parametrization, i.e., represented by geometrical quantities. As a result, we obtain a simpler expression (Equation 13) in Theorem 3. This is a good example of how organizing from the viewpoint of information geometry can simplify various terms and make the structure of the problem easier to understand.
As for the derivation of fundamental PDEs, we point out a formal analogy between general relativity and our problem. Historically speaking, Hilbert showed that the Einstein equation is derived from the Einstein-Hilbert action integral S[g_ab], where g_ab is the pseudo-Riemannian metric on the spacetime manifold [see, e.g., Wald [22], Appendix E.1]. In our problem, we take the expected χ²-divergence from a prior to its posterior instead of S[g_ab]. The maximization of J[π] and the minimization of S[g_ab] yield the tensor equation (Equation 14) and the Einstein equation, respectively.

4. Relation between chi-square priors and alpha-parallel priors
In this section, we show that the χ²-prior is the 1/2-parallel prior, a special case of an α-parallel prior. As we shall see later, an α-parallel prior is defined through an α-parallel volume element and was proposed by Takeuchi and Amari [11]. Among several existence conditions for an α-parallel prior, we focus on the PDE for log π and rewrite it in terms of the log ratio log h.
In the exponential family, the χ²-prior and the α-parallel priors were derived by two groups of authors, Takeuchi and Amari [11] and Liu et al. [13], respectively. We also generalize this result to any α-flat model.

4.1. Alpha-parallel priors
Takeuchi and Amari [11] introduced a family of geometric priors called α-parallel priors, which include the well-known Jeffreys prior and maximum likelihood (ML) prior [23]. We briefly review basic definitions and related results on α-parallel priors below.

4.1.1. Equiaffine connection
First, we recall the definition of equiaffine connection in affine geometry. Let us consider a p-dimensional orientable smooth manifold M with an affine connection ∇. We shall say that a torsion-free affine connection ∇ is equiaffine when there exists a parallel volume element, that is, a nonvanishing p-form ω such that ∇ω = 0.
One necessary and sufficient condition for ∇ to be equiaffine is R^∇_ijk^k = 0 (Equation 17), where R^∇_ijk^l is the Riemann-Christoffel curvature tensor with respect to the connection ∇. The condition (Equation 17) is slightly weaker than the condition that an affine manifold is flat, R^∇_ijk^l = 0.

4.1.2. Definition of alpha-parallel prior
Here, we develop the above argument on statistical models. Since statistical models carry a family of affine connections in a natural manner, we expect that the condition of being equiaffine is obtained as a property of the model manifold rather than of an individual affine connection.
Let a p-dimensional statistical model manifold M be given. We assume that it is covered by a single coordinate system, say, θ ∈ Θ ⊆ R^p, orientable, and simply connected.
Definition 5. When the α-connection on M is equiaffine, a volume element parallel with respect to the α-connection exists, and the corresponding prior density is called an α-parallel prior [11].
Some examples of α-parallel priors are as follows: when α = 1, the 1-parallel prior (also called the e-parallel prior) is the so-called ML prior proposed by Hartigan [23]; and when α = 0, the 0-parallel prior is the Jeffreys prior. As we shall see later, the 0-parallel prior is exceptional and always exists on a statistical model. Indeed, the 0-parallel volume element, √g dθ^1 ∧ · · · ∧ dθ^p, is known as the invariant volume element on a Riemannian manifold (M, g_ij) with the Levi-Civita connection (the 0-connection in information geometry). Note that an α-parallel prior could be an improper prior. For other properties of α-parallel priors, see Takeuchi and Amari [11].

4.1.3. Existence conditions of alpha-parallel prior
In statistical models, we obtain a deeper result for the existence of an α-parallel prior. First, we note that the relation R^(α)_ijk^k = (α/2)(∂_i T_j − ∂_j T_i) (Equation 18) holds for every α. From the necessary and sufficient condition for the existence of an α-parallel prior (Equation 17), we find that the 0-parallel prior (α = 0) necessarily exists. For α ≠ 0, we introduce the concept of statistical equiaffinity.
Definition 6. A statistical model manifold M is said to be statistically equiaffine [11] when the cubic tensor T_ijk satisfies the condition ∂_i T_j = ∂_j T_i, where T_i = T_ijk g^jk. Observing the existence condition for an α-parallel prior (Equation 17) and the relation (Equation 18), we easily obtain the following theorem. Note that the weaker condition (b) implies the stronger conditions (a) and (c). The usual statistical models have been shown to be statistically equiaffine [11]. An important statistical model that is not statistically equiaffine is the ARMA model in time series analysis [24].
4.2. Chi-square prior is the half-parallel prior
Now let us consider the relation between α-parallel priors and the χ²-prior. To compare them, we focus on the following PDE for an α-parallel prior: ∂_i log π = Γ^(α)j_ij (Equation 20) [Takeuchi and Amari [11], Proposition 1, p. 1016, Equation (7)].
Since both sides of the PDE (Equation 20) are not tensors, its invariance under coordinate transformation is not clear. Thus, we introduce a one-form (a geometrical quantity) derived from the scalar function h = π/π_J and modify the equation.
In terms of h, the PDE (Equation 20) is equivalent to ∂_i log h = −(α/2) T_i (Equation 21). When we set T = T_i dθ^i, the above equation (Equation 21) can be rewritten as d log h = −(α/2) T, which is an equation of a differential one-form.
Proof. Using Equation (8), we rewrite the PDE (Equation 20) in terms of log h = log(π/π_J). Then, using Lemma 2, the RHS reduces to −(α/2) T_i, which yields Equation (21).
Surprisingly, the PDE defining π_χ² (Equation 9) agrees with Equation (21) with α = 1/2. Thus, the χ²-prior derived by Liu et al. [13] is the 1/2-parallel prior. This finding is interesting in two ways. First, for Bayesian statistics, it is a new example where the formulation in terms of information geometry is useful for research on noninformative priors [for several examples, see Komaki [7] and Tanaka and Komaki [9]]. Liu et al. [13] derived the PDE (Equation 9) by considering one extension of the reference prior with the χ²-divergence. Their starting point is completely independent of the geometry of statistical models. In spite of this, the χ²-prior has a good geometrical interpretation: it is the volume element that is invariant under parallel transport with respect to the 1/2-connection. Second, for information geometry, it would be the first specific example where only the 1/2-connection makes sense in statistical applications. In information geometry, the meaning of each α-connection has not been clarified enough except for specific values (α = 0, ±1). In Takeuchi and Amari [11], α-parallel priors were not proposed as noninformative priors. Rather, they regarded the Jeffreys prior as the 0-parallel prior and extended it to every α. Except for α = 0, only the 1/2-parallel prior is interpreted as a noninformative prior.
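The coincidence can be checked on a concrete model. The following sketch (an added illustration; Python with sympy is assumed, and the component form ∂_i log π = Γ^(α)j_ij of Equation 20 is taken as given) verifies the PDE for the Bernoulli model in its natural parameter against the candidate π_α = π_J^(1−α), which reduces to the square root of the Jeffreys prior at α = 1/2.

```python
import sympy as sp

# Verify d/dtheta log(pi_alpha) = Gamma^(alpha)1_11 (the PDE in Equation 20)
# for the Bernoulli model in its natural parameter theta, with the
# candidate alpha-parallel prior pi_alpha = pi_J**(1 - alpha).
th, alpha = sp.symbols('theta alpha', real=True)
eta = sp.exp(th) / (1 + sp.exp(th))

g = eta * (1 - eta)                        # Fisher information
T111 = eta * (1 - eta) * (1 - 2 * eta)     # cubic tensor E[(dl)^3]

# In the e-affine coordinate theta, Gamma^(e) = 0, so
# Gamma^(alpha)_11,1 = (1 - alpha)/2 * T_111; raising the index divides by g.
Gamma_up = (1 - alpha) / 2 * T111 / g

pi_alpha = g ** ((1 - alpha) / 2)          # pi_J**(1 - alpha), pi_J = sqrt(g)
residual = sp.simplify(sp.diff(sp.log(pi_alpha), th) - Gamma_up)
```

Setting α = 1/2 recovers the χ²-prior check given earlier, and setting α = 0 recovers the Jeffreys prior.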
4.3. General form of alpha-parallel priors in statistically equiaffine models
Let us derive a general form of α-parallel priors in statistically equiaffine models. In the following, we denote an α-parallel prior as π_α. For example, π_{1/2} = π_χ² and π_0 = π_J.
First, we briefly review some formulas for α-parallel priors derived by several authors [11, 25]. According to Matsuzoe et al. [25], there exists a scalar function φ that satisfies T_a = ∂_a φ on a statistically equiaffine model manifold. Therefore, using this function φ, a general solution of the PDE (Equation 21) is given by h(θ) ∝ exp(−(α/2) φ(θ)). Thus, we obtain the α-parallel prior π_α in the following form: π_α(θ) ∝ π_J(θ) exp(−(α/2) φ(θ)). In the exponential family (an e-flat model), Takeuchi and Amari showed that α-parallel priors are representable as a power of the Jeffreys prior π_J for every α [Takeuchi and Amari [11], Example 2, p. 1017].
In particular, in a γ-flat model with γ-affine coordinates {θ^i}, we get π_α(θ) ∝ π_J(θ)^(1−α/γ); for the exponential family (γ = 1), this reduces to π_α ∝ π_J^(1−α). It is only in the γ-affine coordinate system {θ^i} that π_α is equal to a power of the Jeffreys prior. Since the above relation is not invariant under coordinate transformation, we must take the Jacobian into consideration in another coordinate system.
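To illustrate the closed form π_α ∝ π_J exp(−(α/2)φ), the following sketch (an added illustration; Python with sympy is assumed) checks it on the Gaussian model, where φ = 6 log σ serves as a potential of the Tchebychev form, with all connection coefficients computed from expectations.

```python
import sympy as sp

# Check pi_alpha = pi_J * exp(-(alpha/2) * phi) on the Gaussian model
# N(mu, sigma^2), where phi = 6*log(sigma) is a potential of the
# Tchebychev form, against d_i log(pi_alpha) = Gamma^(alpha)j_ij.
# Expectations substitute x = mu + sigma*z and use standard normal moments.
mu, sigma, x, z, alpha = sp.symbols('mu sigma x z alpha', positive=True)
moments = {0: 1, 1: 0, 2: 1, 3: 0, 4: 3, 5: 0, 6: 15}

def E(expr):
    poly = sp.Poly(sp.expand(expr.subs(x, mu + sigma * z)), z)
    return sum(c * moments[m[0]] for m, c in zip(poly.monoms(), poly.coeffs()))

l = -sp.log(sigma) - (x - mu)**2 / (2 * sigma**2)   # log-likelihood + const
params = [mu, sigma]
d = [sp.diff(l, v) for v in params]

g = sp.Matrix(2, 2, lambda i, j: sp.simplify(E(d[i] * d[j])))
ginv = g.inv()

def gamma_alpha(i, j, k):
    # Gamma^(alpha)_ij,k = Gamma^(e)_ij,k + (1 - alpha)/2 * T_ijk
    return (E(sp.diff(l, params[i], params[j]) * d[k])
            + (1 - alpha) / 2 * E(d[i] * d[j] * d[k]))

# pi_J = 1/sigma^2 up to a constant, so pi_alpha = sigma^(-2 - 3*alpha).
pi_alpha = sigma**(-2) * sp.exp(-(alpha / 2) * 6 * sp.log(sigma))

residuals = [sp.simplify(sp.diff(sp.log(pi_alpha), params[i])
                         - sum(ginv[j, k] * gamma_alpha(i, j, k)
                               for j in range(2) for k in range(2)))
             for i in range(2)]
```

Both components of the residual vanish for symbolic α, consistent with the general solution derived above.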

5. Conclusion
In the present study, we investigated the derivation by Liu et al. [13] of the χ²-prior from the viewpoint of information geometry. We showed that the χ²-prior agrees with the 1/2-parallel prior (the α-parallel prior for α = 1/2), which gives it a geometrical interpretation. In addition, in our formulation, using the log ratio log(π/π_J), which is a scalar under reparametrization, simplifies the PDE defining a noninformative prior π in Bayesian analysis.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.