
ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 02 February 2026

Sec. Statistics and Probability

Volume 12 - 2026 | https://doi.org/10.3389/fams.2026.1733600

Further aspects of information-generating function of order statistics with health application in symmetry of chronic disease management


Mohamed Said Mohamed1, Manal Al-Labadi2, Eman Almuhur3, Hanan H. Sakr4*
  • 1Department of Mathematics, College of Science and Humanities, Prince Sattam bin Abdulaziz University, Hawtat Bani Tamim, Saudi Arabia
  • 2Department of Mathematics, Faculty of Arts and Sciences, University of Petra, Amman, Jordan
  • 3Department of Mathematics, Faculty of Science, Applied Science Private University, Amman, Jordan
  • 4Department of Management Information Systems, College of Business Administration in Hawtat Bani Tamim, Prince Sattam Bin Abdulaziz University, Hawtat Bani Tamim, Saudi Arabia

This investigation aimed to explore novel theoretical aspects and applications of the information-generating function measure for order statistics. We developed fundamental properties and established stochastic ordering relationships based on this information-theoretic measure. Our analysis demonstrated that when two order statistics share identical information-generating measures, their underlying parent distributions can be uniquely identified. We implemented our proposed measure to characterize the exponential distribution. Moreover, we derived bounds and investigated monotonicity properties for these functional measures. The study further examined how information-generating functions characterize distributional symmetry, with particular applications to uniform and normal distributions for identifying symmetry points of order statistics. Building on these theoretical foundations, we proposed a new symmetry test statistic derived from the information-generating properties of the order statistics. Using comprehensive Monte Carlo simulations, we evaluated the test's statistical power against existing alternatives. The present results demonstrated superior performance across various asymmetric distributional alternatives. The practical utility of our methodology is illustrated through an empirical analysis of chronic disease prevalence data.

1 Introduction and background

Several criteria have been proposed in information theory to gauge a probabilistic model's degree of uncertainty. The most significant information measure, applied across many scientific and technical fields, is the Shannon entropy. It originated with Shannon's groundbreaking research [1], which examined how systems behave when characterized by probability density or mass functions (pdf or pmf). Assuming that the variable X* has a pdf h(x) in the continuous case, the differential entropy, often known as the Shannon entropy, is given by

$$En(X^*)=-\int_{-\infty}^{\infty}h(x)\ln h(x)\,dx.\qquad(1)$$

One practical technique for assessing the variance, mean, and other moments of a probability distribution is its moment-generating function. If the successive moments of the distribution exist, they may be found by taking sequential derivatives of the moment-generating function at zero. In information theory, generating functions for pdfs have been defined to calculate information quantities such as extropy, Kullback-Leibler divergence, and Shannon information. Inspired by the ideas of moment- and probability-generating functions, Golomb [2] suggested the information-generating function of a random variable X*, provided the integral exists. It is defined as

$$GEn_\delta(X^*)=E\big(e^{(\delta-1)\ln h(X^*)}\big)=\int_{-\infty}^{\infty}h^{\delta}(x)\,dx,\qquad(2)$$

for any δ > 0. Golomb [2] then demonstrated the following properties of the information-generating function:

1. $GEn_1(X^*)=1$.

2. $\frac{\partial}{\partial\delta}GEn_\delta(X^*)\big|_{\delta=1}=-En(X^*)$ (the negative of Shannon's entropy in Equation 1).
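These two properties are easy to confirm numerically. The sketch below (our illustration, not from the paper) uses an Exp(2) parent and a hand-rolled composite Simpson rule; the helper names `simpson` and `gen_fn` are ours:

```python
import math

def simpson(f, a, b, n=4000):
    # Composite Simpson's rule (n must be even).
    step = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * step)
    return s * step / 3.0

theta = 2.0
pdf = lambda x: theta * math.exp(-theta * x)   # Exp(2) density

def gen_fn(delta):
    # GEn_delta(X*) = integral of h(x)^delta dx (Equation 2), truncated at x = 40.
    return simpson(lambda x: pdf(x) ** delta, 0.0, 40.0)

# Property 1: GEn_1(X*) = 1, since the pdf integrates to one.
assert abs(gen_fn(1.0) - 1.0) < 1e-6

# Property 2: the derivative at delta = 1 equals minus the Shannon entropy.
shannon = simpson(lambda x: -pdf(x) * math.log(pdf(x)), 0.0, 40.0)
numeric_deriv = (gen_fn(1.001) - gen_fn(0.999)) / 0.002
assert abs(numeric_deriv + shannon) < 1e-4

# For Exp(theta) the closed form is GEn_delta = theta^(delta-1)/delta.
assert abs(gen_fn(2.0) - theta / 2.0) < 1e-6
```

The truncation point x = 40 is an assumption chosen so that the neglected tail is negligible for this parent.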

Because information-generating functions are important in information theory, several authors have recently investigated them. For a list of information-generating functions and their many features and uses, see Kharazmi and Balakrishnan [3–7], Zamani et al. [8], Kharazmi et al. [9], and Kayal and Balakrishnan [10].

Specifically, when δ = 2 the information-generating function measure reduces to $GEn_2(X^*)$, sometimes referred to as the informative-energy function. Drawing an analogy with kinetic energy in mechanics, Onicescu [11] introduced a discrete version of the informative-energy measurement into information theory. Bhatia [12] provides further information.

In many statistical methodologies, it is commonly assumed that the distribution of the population under study is symmetric. For example, the validity of regression models often hinges on the assumption that the residuals exhibit symmetry. This makes it critical to rigorously assess whether the symmetry assumption holds in practice. Let $S_{X^*}$ denote the support of the cumulative distribution function (cdf) H. Assume further that there exists a constant μ* such that for all $x\in S_{X^*}$, the equation $H(\mu^*-x)+H(\mu^*+x)=1$ is satisfied. When this condition is met, the distribution of X* is considered symmetric about the point μ*.

Symmetry is a concept of substantial theoretical and practical importance in both probability and statistics. It underpins many models and inferential procedures and has been explored extensively across various contexts. Researchers have introduced a range of characterizations for symmetric distributions, often using ordered samples such as order statistics, record values, and sequential statistics. For instance, Balakrishnan and Selvitella [13] showed that, for a sample of size m, the distributional identity $X_{i,m}\stackrel{DI}{=}-X_{m-i+1,m}$ holds for a fixed i = 1, …, m if and only if the underlying distribution H is symmetric about zero. In this notation, $\stackrel{DI}{=}$ signifies that the two random variables have identical distributions.

Furthermore, Ahmadi [14] introduced innovative formulations of symmetry for continuous distributions by leveraging the properties of k-record values. Building on this foundation, Mahdizadeh and Zamanzade employed ranked set sampling techniques to construct nonparametric estimators of symmetric distribution functions [15]. Broadly speaking, assessing symmetry often involves developing criteria tailored to its specific structural features. This task is frequently carried out using goodness-of-fit tests, as demonstrated by Dai et al. [16] and Bozin et al. [17].

In this study, we explore several stochastic orderings that are useful for comparing random variables in a meaningful way. Suppose $X_1^*$ and $X_2^*$ are two continuous random variables with pdfs $h_1$ and $h_2$, and corresponding cdfs $H_1$ and $H_2$. Their generalized inverses (also known as left-continuous quantile functions) are defined as $H_1^{-1}(x)=\inf\{v:H_1(v)\ge x\}$ and $H_2^{-1}(x)=\inf\{v:H_2(v)\ge x\}$ for 0 < x < 1.

Based on these definitions, we say that X1* is smaller than X2* in various stochastic orders if the following conditions hold for all x ≥ 0:

(1) Likelihood Ratio Order ($X_1^*\le_{lr}X_2^*$): This ordering holds if the ratio $h_1(x)/h_2(x)$ is a decreasing function of x.

(2) Hazard Rate Order ($X_1^*\le_{hr}X_2^*$): This comparison holds if the hazard rate function of $X_1^*$ is greater than or equal to that of $X_2^*$ for all x. That is, $\Lambda_{X_1^*}(x)\ge\Lambda_{X_2^*}(x)$.

(3) Usual Stochastic Order ($X_1^*\le_{st}X_2^*$): This relation holds when the survival function of $X_1^*$ is less than or equal to that of $X_2^*$, i.e., $\bar H_1(x)\le\bar H_2(x)$.

(4) Super-Additive Order ($X_1^*\le_{su}X_2^*$): This order applies if the composition $H_2^{-1}(H_1(x))$ defines a super-additive function.

(5) Dispersive Order ($X_1^*\le_{disp}X_2^*$): This ordering is satisfied if the difference $H_2^{-1}(H_1(x))-x$ increases with x.

Notably, the hazard rate function for a random variable $X_i^*$ is given by $\Lambda_{X_i^*}(v)=\frac{h_i(v)}{1-H_i(v)}$ for v ≥ 0, where the survival function is denoted by $\bar H_i(v)=1-H_i(v)$ for i = 1, 2. For a comprehensive treatment of these stochastic orders and their properties, readers are encouraged to consult Shaked and Shanthikumar [18].
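These definitions can be checked on a concrete pair. The sketch below (our illustration, not from the paper) verifies the likelihood-ratio, hazard-rate, usual stochastic, and dispersive orders for X1* ~ Exp(2) versus X2* ~ Exp(1) on a grid:

```python
import math

# X1* ~ Exp(2) and X2* ~ Exp(1): a pair for which all the orders below hold.
lam1, lam2 = 2.0, 1.0
h1 = lambda x: lam1 * math.exp(-lam1 * x)
h2 = lambda x: lam2 * math.exp(-lam2 * x)
surv1 = lambda x: math.exp(-lam1 * x)
surv2 = lambda x: math.exp(-lam2 * x)

grid = [0.1 * k for k in range(1, 200)]

# (1) Likelihood ratio order: h1(x)/h2(x) = 2 e^{-x} is decreasing in x.
ratios = [h1(x) / h2(x) for x in grid]
assert all(r1 > r2 for r1, r2 in zip(ratios, ratios[1:]))

# (2) Hazard rate order: the hazards are constant, lam1 >= lam2.
assert all(h1(x) / surv1(x) >= h2(x) / surv2(x) for x in grid)

# (3) Usual stochastic order: surv1(x) <= surv2(x) for all x >= 0.
assert all(surv1(x) <= surv2(x) for x in grid)

# (5) Dispersive order: H2^{-1}(H1(x)) - x = -log(surv1(x))/lam2 - x = x,
# which is increasing (note 1 - H1(x) = surv1(x)).
diffs = [-math.log(surv1(x)) / lam2 - x for x in grid]
assert all(d2 > d1 for d1, d2 in zip(diffs, diffs[1:]))
```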

Kharazmi and Balakrishnan [6] explored the information-generating function for ordered random variables, specifically order statistics. In their study, they derived several properties of mixed systems built from independent and identically distributed components. Building on this foundation, we present a comparative analysis of mixed systems using these information metrics.

In a separate study on record values, Zamani et al. [8] investigated comparative outcomes linked to the information-generating (IG) measure. A key finding was that if two upper record value sequences share an identical IG function, the underlying distributions from which they originate must be the same. Their research also offers a rigorous characterization of the exponential distribution, demonstrating that its IG function for record values is either maximized or minimized under specific constraints.

This study aims to further explore the properties of the information-generating function for order statistics and to demonstrate its application in testing for symmetry. The remainder of the paper is structured as follows: Section 2 develops characterizations and examines monotonicity properties using ordered variables. Section 3 investigates stochastic ordering results based on the information-generating function of order statistics and establishes bounds for this measure. Section 4 analyzes the symmetric properties of the information-generating function model for order statistics, proposes a nonparametric test for symmetry, and illustrates the methodology using chronic disease management data.

2 Properties of information-generating function

In this section, we discuss some stochastic ordering results for the information-generating functional model of the entropy measure. Theorem 4.B.2 of Shaked and Shanthikumar [18] enables us to use the following implications:

1. $X_1^*\le_{lr}X_2^*$ implies $X_1^*\le_{hr}X_2^*$, which in turn implies $X_1^*\le_{st}X_2^*$.

2. If $X_1^*\le_{st}X_2^*$ and $X_1^*\le_{su}X_2^*$, then $X_1^*\le_{disp}X_2^*$.

Lemma 2.1. Assume that $X_1^*\le_{disp}X_2^*$. Then the following inequality holds: $GEn_\delta(X_1^*)\ge(\le)\,GEn_\delta(X_2^*)$ for δ ≥ 1 (respectively, 0 < δ ≤ 1).

Proof. Starting from Equation 2, we express the information-generating functional entropy as:

$$GEn_\delta(X^*)=\int_{-\infty}^{\infty}[h(x)]^{\delta}\,dx=\int_0^1\big[h(H^{-1}(v))\big]^{\delta-1}\,dv.$$

Given that $X_1^*\le_{disp}X_2^*$, it follows that $h_1(H_1^{-1}(v))\ge h_2(H_2^{-1}(v))$ holds for every v in the interval (0, 1). Consequently, we derive:

$$GEn_\delta(X_1^*)=\int_0^1\big[h_1(H_1^{-1}(v))\big]^{\delta-1}\,dv\ \ge(\le)\ \int_0^1\big[h_2(H_2^{-1}(v))\big]^{\delta-1}\,dv=GEn_\delta(X_2^*),$$

which confirms the result for δ ≥ 1 (respectively, 0 < δ ≤ 1).
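As a sketch of Lemma 2.1, note that Exp(2) ≤disp Exp(1) (the difference of composed quantiles is x, which is increasing). The snippet below (our numerical check, with a hand-rolled Simpson rule and our own helper name `gen_fn`) confirms both directions of the inequality:

```python
import math

def gen_fn(rate, delta, upper=60.0, n=6000):
    # GEn_delta for an Exp(rate) variable by composite Simpson's rule.
    step = upper / n
    f = lambda x: (rate * math.exp(-rate * x)) ** delta
    s = f(0.0) + f(upper)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * step)
    return s * step / 3.0

# Exp(2) <=_disp Exp(1), so Lemma 2.1 gives
# GEn_delta(Exp(2)) >= GEn_delta(Exp(1)) for delta >= 1,
# with the reverse inequality for 0 < delta <= 1.
assert gen_fn(2.0, 2.0) >= gen_fn(1.0, 2.0)
assert gen_fn(2.0, 0.5) <= gen_fn(1.0, 0.5)

# The closed form theta^(delta-1)/delta for Exp(theta) confirms both values.
assert abs(gen_fn(2.0, 2.0) - 1.0) < 1e-4
assert abs(gen_fn(1.0, 0.5) - 2.0) < 1e-2
```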

2.1 Characterizations based on ordered variables

Assume that the m observations $X_1^*,\dots,X_m^*$ are independent and identically distributed with cdf H and pdf h, and let $X_{1,m}^*\le X_{2,m}^*\le\cdots\le X_{m,m}^*$ denote the corresponding order statistics. The pdf of the ith order statistic $X_{i,m}^*$, 1 ≤ i ≤ m, from a random sample of size m drawn from the distribution of X*, is expressed as:

$$h_{i,m}(x)=\frac{1}{\Delta_h(i,m-i+1)}\,H^{i-1}(x)\,\bar H^{m-i}(x)\,h(x),\qquad(3)$$

where the normalizing constant is given by $\Delta_h(i,m-i+1)=\frac{\Gamma(i)\Gamma(m-i+1)}{\Gamma(m+1)}$. Therefore, from Equation 2, we can define the information-generating function measure for the ith order statistic $X_{i,m}^*$ as:

$$GEn_\delta(X_{i,m}^*)=\int_{-\infty}^{\infty}h_{i,m}^{\delta}(x)\,dx=\Big(\frac{1}{\Delta_h(i,m-i+1)}\Big)^{\delta}\int_{-\infty}^{\infty}H^{\delta i-\delta}(x)\,\bar H^{\delta m-\delta i}(x)\,h^{\delta}(x)\,dx,\qquad(4)$$

for any δ > 0, 1 ≤ im.
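For a uniform parent on (0, 1), Equation 4 reduces to a ratio of Beta functions, which gives a convenient numerical check of the formula. The sketch below is ours (the helper names `log_beta` and `gen_order_uniform` are not from the paper):

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def gen_order_uniform(i, m, delta, n=20000):
    # Equation 4 for a U(0,1) parent: h_{i,m} is the Beta(i, m-i+1) density,
    # so the integrand is simply h_{i,m}(v)^delta.  The midpoint rule avoids
    # the endpoints v = 0, 1 where the logs blow up.
    lb = log_beta(i, m - i + 1)
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * step
        total += math.exp(delta * ((i - 1) * math.log(v)
                                   + (m - i) * math.log(1.0 - v) - lb))
    return total * step

# Closed form: GEn_delta(X*_{i,m}) = B(delta(i-1)+1, delta(m-i)+1) / B(i, m-i+1)^delta.
i, m, delta = 2, 5, 2.0
closed = math.exp(log_beta(delta * (i - 1) + 1, delta * (m - i) + 1)
                  - delta * log_beta(i, m - i + 1))
assert abs(gen_order_uniform(i, m, delta) - closed) < 1e-5
```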

To support the main conclusions of this section, we refer to a corollary derived from the Stone–Weierstrass Theorem, as presented by Aliprantis and Burkinshaw [19]. This yields the following lemma:

Lemma 2.2. Let ζ* be a continuous function on the interval [0, 1]. If it satisfies the integral condition $\int_0^1 z^{m}\,\zeta^*(z)\,dz=0$ for all integers m ≥ 0, then it follows that ζ*(z) = 0 for every z ∈ [0, 1].

The next theorem shows that the characteristics of the information-generating function associated with the order statistic Xi,m* uniquely identify the distribution of the parent.

Theorem 2.1. Assume that h1 and h2 are two pdfs, with corresponding cdfs H1 and H2, for the random variables X1* and X2*, respectively. Fix a value of i, with 1 ≤ im, and let δ > 0. Then the following equivalence holds:

$$X_1^*\stackrel{DI}{=}X_2^*\iff GEn_\delta(X_{1;i,m}^*)=GEn_\delta(X_{2;i,m}^*),\quad\forall\,m\ge i.$$

Proof. We only need to establish sufficiency, since necessity is immediate. Assume that

$$GEn_\delta(X_{1;i,m}^*)=GEn_\delta(X_{2;i,m}^*),\qquad\forall\,m\ge i.$$

Using Equations 2, 3, 4, this is equivalent to

$$\int_{-\infty}^{\infty}H_1^{\delta i-\delta}(x)\,\bar H_1^{\delta m-\delta i}(x)\,h_1^{\delta}(x)\,dx=\int_{-\infty}^{\infty}H_2^{\delta i-\delta}(x)\,\bar H_2^{\delta m-\delta i}(x)\,h_2^{\delta}(x)\,dx.\qquad(5)$$

Step 1: Change of variables. Note that $d\bar H_k^{\delta}(x)=-\delta\,\bar H_k^{\delta-1}(x)\,h_k(x)\,dx$. Rewriting Equation 5 yields

$$-\int_{-\infty}^{\infty}H_1^{\delta i-\delta}(x)\,\bar H_1^{\delta m-\delta i}(x)\,\Lambda_{X_1^*}^{\delta-1}(x)\,d\bar H_1^{\delta}(x)=-\int_{-\infty}^{\infty}H_2^{\delta i-\delta}(x)\,\bar H_2^{\delta m-\delta i}(x)\,\Lambda_{X_2^*}^{\delta-1}(x)\,d\bar H_2^{\delta}(x),$$

where $\Lambda_{X_k^*}^{\delta-1}(x)=h_k^{\delta-1}(x)/\bar H_k^{\delta-1}(x)$.

Let

$$v=\bar H_k^{\delta}(x),\qquad k=1,2.$$

Since H¯k is continuous and strictly decreasing, the mapping is bijective and sends x ∈ (−∞, ∞) to v ∈ [0, 1]. The identity becomes

$$\int_0^1(1-v^{1/\delta})^{\delta i-\delta}\,v^{m-i}\,\Lambda_{X_1^*}^{\delta-1}\big(H_1^{-1}(1-v^{1/\delta})\big)\,dv=\int_0^1(1-v^{1/\delta})^{\delta i-\delta}\,v^{m-i}\,\Lambda_{X_2^*}^{\delta-1}\big(H_2^{-1}(1-v^{1/\delta})\big)\,dv.\qquad(6)$$

Step 2: Application of Lemma 2.2. Let

$$\zeta^*(v)=\Lambda_{X_1^*}^{\delta-1}\big(H_1^{-1}(1-v^{1/\delta})\big)-\Lambda_{X_2^*}^{\delta-1}\big(H_2^{-1}(1-v^{1/\delta})\big).$$

Equation 6 implies

$$\int_0^1(1-v^{1/\delta})^{\delta i-\delta}\,\zeta^*(v)\,v^{\ell}\,dv=0,\qquad \ell=m-i\ge 0.$$

The prefactor $(1-v^{1/\delta})^{\delta i-\delta}$ is continuous and strictly positive for $v\in(0,1)$. Since ζ* is continuous, applying Lemma 2.2 to the continuous function $(1-v^{1/\delta})^{\delta i-\delta}\,\zeta^*(v)$ shows that this product vanishes on [0, 1], and the positivity of the prefactor on (0, 1) then forces

$$\zeta^*(v)=0,\qquad v\in[0,1].$$

Therefore,

$$\Lambda_{X_1^*}^{\delta-1}\big(H_1^{-1}(p)\big)=\Lambda_{X_2^*}^{\delta-1}\big(H_2^{-1}(p)\big),\qquad p\in[0,1].\qquad(7)$$

Step 3: Deduction of equality of the densities at corresponding quantiles. Recall that

$$\Lambda_{X_k^*}^{\delta-1}(x)=\frac{h_k^{\delta-1}(x)}{\bar H_k^{\delta-1}(x)}.$$

Since for the argument $x=H_k^{-1}(p)$ we have $\bar H_k(x)=1-p$, Equation 7 gives

$$h_1\big(H_1^{-1}(p)\big)=h_2\big(H_2^{-1}(p)\big),\qquad p\in[0,1].$$

Step 4: Equality of derivatives of inverse cdfs. Using the identity

$$h_k\big(H_k^{-1}(p)\big)=\frac{1}{(H_k^{-1})'(p)},$$

we obtain

$$(H_1^{-1})'(p)=(H_2^{-1})'(p),\qquad p\in(0,1).$$

Integrating over [0, p] yields

$$H_1^{-1}(p)=H_2^{-1}(p)+C,$$

for some constant C.

Step 5: Determination of the constant. Both inverse cdfs satisfy

$$\lim_{p\to 0}H_k^{-1}(p)=\inf\{x:H_k(x)>0\},$$

which is finite and equal for the two distributions, because equality of information-generating functions implies identical lower-support endpoints. Hence, the limit of the difference is zero, implying C = 0. Thus,

$$H_1^{-1}(p)=H_2^{-1}(p),\qquad p\in[0,1].$$

Therefore, H1 = H2, which completes the proof.

Remark 2.1. By taking i = 1 in Theorem 2.1, we have

$$X_1^*\stackrel{DI}{=}X_2^*\iff GEn_\delta(X_{1;1,m}^*)=GEn_\delta(X_{2;1,m}^*),\quad\forall\,m\ge 1.$$

It is well established that the exponential distribution plays a significant role in reliability theory. In what follows, we present a novel characterization of this distribution.

Theorem 2.2. Let the exponential distribution be defined by $\bar H(x)=e^{-\theta x}$, where θ > 0 and x > 0. This distribution is uniquely identified by the condition

$$GEn_\delta(X_{1,m}^*)=m^{\delta-1}\,GEn_\delta(X^*),\qquad\forall\,m\ge 1,$$

where δ > 0 and δ ≠ 1.

Proof. We first verify the forward implication, then prove the converse.

(i) If X* is exponential, then the IGF identity holds. If H̄(x)=e-θx (θ > 0), a direct computation using Equations 2, 3 (the expression for GEnδ of an order statistic and the definition of ΛX*) yields

$$GEn_\delta(X_{1,m}^*)=\frac{\theta^{\delta-1}m^{\delta-1}}{\delta}=m^{\delta-1}\Big(\frac{\theta^{\delta-1}}{\delta}\Big)=m^{\delta-1}\,GEn_\delta(X^*),$$

for every integer m ≥ 1. Thus, the displayed identity holds for the exponential distribution.

(ii) Converse: the IGF identity implies an exponential parent.

Assume

$$GEn_\delta(X_{1,m}^*)=m^{\delta-1}\,GEn_\delta(X^*),\qquad\forall\,m\ge 1.$$

Using the integral representations in Equations 2, 3, this equality can be written as

$$\int_{-\infty}^{\infty}m^{\delta}\,\bar H^{\delta m-\delta}(x)\,h^{\delta}(x)\,dx=m^{\delta-1}\int_{-\infty}^{\infty}h^{\delta}(x)\,dx,\qquad\forall\,m\ge 1.$$

Bring all terms to one side and perform the change of variable

$$v=\bar H^{\delta}(x),\qquad v\in[0,1].$$

As in the proof of Theorem 2.1, this substitution is admissible because H¯ is continuous and monotone on the support, and it yields, for every integer m ≥ 1,

$$\int_0^1\Big[\frac{1}{\delta}\,\Lambda_{X^*}^{\delta-1}\big(H^{-1}(1-v^{1/\delta})\big)-GEn_\delta(X^*)\Big]\,v^{m-1}\,dv=0.$$

Define the continuous function on [0, 1]

$$\zeta(v):=\frac{1}{\delta}\,\Lambda_{X^*}^{\delta-1}\big(H^{-1}(1-v^{1/\delta})\big)-GEn_\delta(X^*).$$

The previous displayed family of equalities says that $\int_0^1\zeta(v)\,v^{m-1}\,dv=0$ for every integer m ≥ 1. Reindexing with ℓ = m − 1 (so ℓ ≥ 0) and applying Lemma 2.2, we conclude that ζ(v) ≡ 0 on [0, 1]. Hence

$$\Lambda_{X^*}^{\delta-1}\big(H^{-1}(1-v^{1/\delta})\big)=\delta\,GEn_\delta(X^*)\qquad\text{for all }v\in[0,1].$$

Equivalently, with $p:=1-v^{1/\delta}\in[0,1]$,

$$\Lambda_{X^*}\big(H^{-1}(p)\big)=C\qquad\text{for all }p\in[0,1],$$

where $C:=(\delta\,GEn_\delta(X^*))^{1/(\delta-1)}$ is a positive constant. Thus the composed function $\Lambda_{X^*}\circ H^{-1}$ is constant on [0, 1], and therefore

$$\Lambda_{X^*}(x)=C\qquad\text{for all }x\text{ in the (interior of the) support}.$$

(iii) From constant $\Lambda_{X^*}$ to a constant hazard (and hence exponential).

Recall from Section 1 that $\Lambda_{X^*}(x)=h(x)/\bar H(x)$ is precisely the hazard rate

$$\lambda(x):=\frac{h(x)}{\bar H(x)}.$$

Hence, the conclusion of Step (ii) states that the hazard is constant: $\lambda(x)=C=:\theta>0$ for all x in the support. A distribution with constant hazard $\lambda(x)\equiv\theta$ has survival function

$$\bar H(x)=\exp\Big(-\int_0^x\lambda(t)\,dt\Big)=\exp(-\theta x).$$

Thus, H is the exponential distribution with rate θ. Substituting $C$ yields the explicit relation $\theta=(\delta\,GEn_\delta(X^*))^{1/(\delta-1)}$ between θ and $GEn_\delta(X^*)$. This completes the proof.

2.2 Monotonicity properties

Ebrahimi et al. [20], Zamani et al. [8], and other related studies have reviewed the monotonic behavior of information measures for ordered variables. This section covers the monotonic characteristics of the information-generating function of ordered statistics of order δ.

Lemma 2.3. (Adapted from Shaked and Shanthikumar [18]) Let $X_{i,m_1}^*$ and $X_{j,m_2}^*$ be the ith and jth order statistics from independent samples of sizes $m_1$ and $m_2$, respectively, drawn from a distribution H with a monotone non-increasing failure rate. Then,

$$X_{i,m_1}^*\le_{disp}X_{j,m_2}^*\quad\text{whenever }i\le j\ \text{and}\ m_1-i\ge m_2-j.$$

An immediate consequence of Lemma 2.3 is that if X1*,X2*,,Xm* are independent and equally distributed observations from a monotone non-increasing failure rate distribution, then for any i = 1, …, m, it holds that

$$X_{i,m+1}^*\le_{disp}X_{i,m}^*\le_{disp}X_{i+1,m+1}^*.\qquad(8)$$

Utilizing this result alongside Lemma 2.1, we can now establish the following theorem.

Theorem 2.3. (1) Suppose X* follows a distribution with a monotone non-increasing failure rate. Then for a fixed index i satisfying 1 ≤ i ≤ m, the information-generating function $GEn_\delta(X_{i,m}^*)$ increases with m.

(2) Under the same distributional assumption, for a fixed sample size m with m ≥ i ≥ 1, $GEn_\delta(X_{i,m}^*)$ decreases as i increases.

Here δ ∈ ℕ+.

Proof. The result follows by applying Lemma 2.1 to the dispersive orderings in Equation 8.

Let us recall that a random variable X* is said to have an increasing reversed hazard rate if the function $\tilde\Lambda_{X^*}(x)=h(x)/H(x)$ is non-decreasing in x. Under this alternative assumption, we now present the reversed implications of Theorem 2.3.

Theorem 2.4. (1) If X* has an increasing reversed hazard rate, then for a fixed i within 1 ≤ im, the quantity GEnδ(Xi,m*) decreases with increasing m.

(2) Under the same condition, if m is fixed and mi ≥ 1, then GEnδ(Xi,m*) increases as i becomes larger.

Here δ ∈ ℕ+.

Proof. According to Equations 2, 3, we have

$$\frac{GEn_\delta(X_{i,m}^*)}{GEn_\delta(X_{i,m+1}^*)}=\Big(\frac{m-i+1}{m+1}\Big)^{\delta}\,\frac{\int_{-\infty}^{\infty}H^{\delta i-\delta}(x)\,\bar H^{\delta m-\delta i}(x)\,h^{\delta}(x)\,dx}{\int_{-\infty}^{\infty}H^{\delta i-\delta}(x)\,\bar H^{\delta m+\delta-\delta i}(x)\,h^{\delta}(x)\,dx}=J(m;i;\delta)\,\frac{\int_0^1\frac{1}{\Delta_h(\delta i,\delta m-\delta i+1)}\,v^{\delta i-1}(1-v)^{\delta m-\delta i}\,\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(v))\,dv}{\int_0^1\frac{1}{\Delta_h(\delta i,\delta m-\delta i+\delta+1)}\,v^{\delta i-1}(1-v)^{\delta m-\delta i+\delta}\,\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(v))\,dv},\qquad(9)$$

where

$$J(m;i;\delta)=\Big(\frac{m-i+1}{m+1}\Big)^{\delta}\cdot\frac{\Gamma(\delta(m-i)+1)\,\Gamma(\delta(m+1)+1)}{\Gamma(\delta m+1)\,\Gamma(\delta(m-i+1)+1)}.$$

We introduce t = m − i to simplify J(m; i; δ). Using the shift property of the gamma function, the gamma terms can be rewritten as

$$\frac{\Gamma(\delta m+\delta+1)}{\Gamma(\delta m+1)}\cdot\frac{\Gamma(\delta t+1)}{\Gamma(\delta t+\delta+1)}=\frac{(\delta m+1)(\delta m+2)\cdots(\delta m+\delta)}{(\delta t+1)(\delta t+2)\cdots(\delta t+\delta)}.$$

Combining this product with the initial term $\big(\frac{m-i+1}{m+1}\big)^{\delta}$, we get a product over k from 1 to δ:

$$J(m;i;\delta)=\prod_{k=1}^{\delta}\frac{(m-i+1)(\delta m+k)}{(m+1)(\delta(m-i)+k)},\qquad(10)$$

where δ ∈ ℕ+ and 1 ≤ i ≤ m. Substituting Equation 10 into Equation 9, we obtain

$$\frac{GEn_\delta(X_{i,m}^*)}{GEn_\delta(X_{i,m+1}^*)}=J(m;i;\delta)\,\frac{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m}^*))\big]}{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m+\delta}^*))\big]}\ \ge\ \frac{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m}^*))\big]}{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m+\delta}^*))\big]},\qquad(11)$$

where the inequality uses $J(m;i;\delta)\ge 1$, which follows from Equation 10 since $(m-i+1)(\delta m+k)-(m+1)(\delta(m-i)+k)=i(\delta-k)\ge 0$ for k = 1, …, δ.

Here, $W_{i,m}^*$ denotes the ith order statistic from a sample of size m drawn from the uniform distribution on [0, 1], with pdf $h_{i,m}(w)=\frac{1}{\Delta_h(i,m-i+1)}\,w^{i-1}(1-w)^{m-i}$ for w ∈ [0, 1] and i = 1, 2, …, m. By Theorem 1.B.28 of Shaked and Shanthikumar [18], $W_{\delta i,\delta m}^*\ge_{hr}W_{\delta i,\delta m+\delta}^*$, which also implies $W_{\delta i,\delta m}^*\ge_{st}W_{\delta i,\delta m+\delta}^*$. Given that δ ∈ ℕ+, the increasing reversed hazard rate assumption leads to the inequality:

$$\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m}^*))\big]\ge\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m+\delta}^*))\big],$$

which in turn implies that $GEn_\delta(X_{i,m}^*)\big/GEn_\delta(X_{i,m+1}^*)\ge 1$. Similarly, for Part (2), we have

$$\frac{GEn_\delta(X_{i,m}^*)}{GEn_\delta(X_{i+1,m}^*)}=J^*(m;i;\delta)\,\frac{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m}^*))\big]}{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i+\delta,\delta m}^*))\big]}\ \le\ \frac{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i,\delta m}^*))\big]}{\mathbb{E}\big[\tilde\Lambda_{X^*}^{\delta-1}(H^{-1}(W_{\delta i+\delta,\delta m}^*))\big]},\qquad(12)$$

where

$$J^*(m;i;\delta)=\prod_{k=1}^{\delta}\frac{i\,(\delta(m-i)-\delta+k)}{(m-i)\,(\delta i+k-1)},$$

with δ ∈ ℕ+ and m > i ≥ 1; each factor is at most one, so $J^*(m;i;\delta)\le 1$. Thus, $GEn_\delta(X_{i,m}^*)\big/GEn_\delta(X_{i+1,m}^*)\le 1$, noting that $W_{\delta i,\delta m}^*\le_{st}W_{\delta i+\delta,\delta m}^*$.

Theorem 2.5. (1) If X* has a decreasing reversed hazard rate, then for a fixed i within 1 ≤ im, the quantity GEnδ(Xi,m*) increases with increasing m.

(2) Under the same condition, if m is fixed and mi ≥ 1, then GEnδ(Xi,m*) decreases as i becomes larger.

Here δ ∈ ℕ+.

Proof. The steps are similar to those in the proof of the previous theorem.

Recall that the Pareto distribution, with cdf 1 − x^{−α} for x ≥ 1 and α > 0, has a decreasing reversed hazard rate. For this distribution with δ = 2, 3, Figures 1, 2 plot the information-generating function of $X_{i,m}^*$ with increasing m and increasing i, respectively, confirming the monotonicity properties of Theorem 2.5 for δ ∈ ℕ+.


Figure 1. Information-generating function of X4,m* for the Pareto distribution (with parameter α = 2), with increasing m and δ = 2, 3.


Figure 2. Information-generating function of Xi,60* for a Pareto distribution (with parameter α = 2), with increasing i and δ = 2, 3.
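The monotone behavior shown in the figures can be reproduced numerically. The sketch below (our code; the helper names `log_beta` and `gen_order_pareto` are not from the paper) evaluates Equation 4 for a Pareto(α = 2) parent by midpoint quadrature and checks both parts of Theorem 2.5:

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def gen_order_pareto(i, m, delta, alpha=2.0, n=20000):
    # Equation 4 after the substitution v = H(x), for a Pareto(alpha) parent
    # with s = 1: H^{-1}(v) = (1-v)^(-1/alpha) and
    # h(H^{-1}(v)) = alpha * (1-v)^((alpha+1)/alpha).
    lb = log_beta(i, m - i + 1)
    c = (alpha + 1.0) / alpha
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * step
        total += math.exp(delta * ((i - 1) * math.log(v)
                                   + (m - i) * math.log(1.0 - v) - lb)
                          + (delta - 1) * (math.log(alpha) + c * math.log(1.0 - v)))
    return total * step

# The Pareto distribution has a decreasing reversed hazard rate, so by
# Theorem 2.5 the IGF increases in m (fixed i) and decreases in i (fixed m).
vals_m = [gen_order_pareto(4, m, 2.0) for m in range(5, 10)]
assert all(a < b for a, b in zip(vals_m, vals_m[1:]))

vals_i = [gen_order_pareto(i, 12, 2.0) for i in range(2, 7)]
assert all(a > b for a, b in zip(vals_i, vals_i[1:]))
```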

3 Ordering outcomes based on the information-generating function of order statistics

In this section, we present some stochastic comparison results for the information-generating function measure of order statistics. The information-generating function of order statistics can be rewritten as in the following lemma.

Lemma 3.1. The information-generating function measure of the ith order statistic, $X_{i,m}^*$, can be written as

$$GEn_\delta(X_{i,m}^*)=\frac{\Delta_h(\delta i-\delta+1,\delta m-\delta i+1)}{(\Delta_h(i,m-i+1))^{\delta}}\,\mathbb{E}\big[h^{\delta-1}(H^{-1}(V^*))\big],\qquad(13)$$

where the random variable V* has the pdf

$$h_{V^*}(v)=\frac{1}{\Delta_h(\delta i-\delta+1,\delta m-\delta i+1)}\,v^{\delta i-\delta}(1-v)^{\delta m-\delta i},\qquad(14)$$

v ∈ [0, 1].

Proof. From Equations 2, 3, and making use of the transformation v = H(x), we can express the information-generating function measure of the ith order statistic, $X_{i,m}^*$, as

$$GEn_\delta(X_{i,m}^*)=\frac{\Delta_h(\delta i-\delta+1,\delta m-\delta i+1)}{(\Delta_h(i,m-i+1))^{\delta}}\int_0^1\frac{v^{\delta i-\delta}(1-v)^{\delta m-\delta i}}{\Delta_h(\delta i-\delta+1,\delta m-\delta i+1)}\,h^{\delta-1}(H^{-1}(v))\,dv,$$

and the result follows.

The impact of monotonic transformations on the information-generating function measure of order statistics is examined in the following theorem.

Theorem 3.1. Assume that φ is a strictly increasing function satisfying φ(−∞) = 0 and φ(∞) = ∞. Then, for the ith order statistic of the transformed random variable Y* = φ(X*), the information-generating function measure is expressed as

$$GEn_\delta(Y_{i,m}^*)=\frac{\Delta_h(\delta i-\delta+1,\delta m-\delta i+1)}{(\Delta_h(i,m-i+1))^{\delta}}\,\mathbb{E}\bigg[\Big(\frac{h(H^{-1}(V^*))}{\varphi'(H^{-1}(V^*))}\Big)^{\delta-1}\bigg],$$

where V* denotes a random variable whose pdf is defined in Equation 14.

Proof. Given the transformation Y* = φ(X*), the cdf and pdf of Y* become $F(y)=H(\varphi^{-1}(y))$ and $f(y)=\frac{h(\varphi^{-1}(y))}{\varphi'(\varphi^{-1}(y))}$, respectively. Using the definition of the information-generating function for the ith order statistic, we derive

$$GEn_\delta(Y_{i,m}^*)=\frac{1}{(\Delta_h(i,m-i+1))^{\delta}}\int_{-\infty}^{\infty}H^{\delta i-\delta}\big(\varphi^{-1}(y)\big)\,\bar H^{\delta m-\delta i}\big(\varphi^{-1}(y)\big)\bigg[\frac{h(\varphi^{-1}(y))}{\varphi'(\varphi^{-1}(y))}\bigg]^{\delta}\,dy.$$

Next, using the change of variables x = φ^{-1}(y) followed by v = H(x), we obtain

$$GEn_\delta(Y_{i,m}^*)=\frac{1}{(\Delta_h(i,m-i+1))^{\delta}}\int_{-\infty}^{\infty}H^{\delta i-\delta}(x)\,\bar H^{\delta m-\delta i}(x)\,\frac{h^{\delta}(x)}{(\varphi'(x))^{\delta-1}}\,dx=\frac{\Delta_h(\delta i-\delta+1,\delta m-\delta i+1)}{(\Delta_h(i,m-i+1))^{\delta}}\,\mathbb{E}\bigg[\Big(\frac{h(H^{-1}(V^*))}{\varphi'(H^{-1}(V^*))}\Big)^{\delta-1}\bigg].$$
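Theorem 3.1 can be verified on a concrete transformation: with X* ~ Exp(1) and φ(x) = e^x, the transformed variable Y* is standard Pareto, and both sides of the formula can be computed numerically. This is our own sketch (helper names are not from the paper):

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# X* ~ Exp(1), phi(x) = e^x: phi is strictly increasing with phi(-inf) = 0 and
# phi(inf) = inf, and Y* = phi(X*) has the Pareto cdf 1 - 1/y for y >= 1.
i, m, delta = 1, 2, 2.0
a, b = delta * (i - 1) + 1, delta * (m - i) + 1

# Right-hand side of Theorem 3.1: the constant times E[(h/phi')^(delta-1)],
# where V* ~ Beta(a, b), h(H^{-1}(v)) = 1 - v and phi'(H^{-1}(v)) = 1/(1 - v).
const = math.exp(log_beta(a, b) - delta * log_beta(i, m - i + 1))
n, step = 20000, 1.0 / 20000
expect = sum(math.exp((a - 1) * math.log(v) + (b - 1) * math.log(1 - v)
                      - log_beta(a, b)
                      + (delta - 1) * 2.0 * math.log(1 - v))
             for v in ((k + 0.5) * step for k in range(n))) * step
via_theorem = const * expect

# Left-hand side, directly: min(Y1, Y2) of two Pareto(1) draws has pdf 2 y^{-3},
# so GEn_2(Y*_{1,2}) = int_1^inf 4 y^{-6} dy = 4/5.
dy = 0.005
direct = sum((2.0 * (1.0 + (k + 0.5) * dy) ** -3.0) ** delta
             for k in range(40000)) * dy

assert abs(via_theorem - 0.8) < 1e-4
assert abs(direct - via_theorem) < 1e-3
```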

Theorem 3.2. Let X* be a random variable with pdf h, and let φ be a strictly increasing and convex function satisfying φ(0) = 0 and φ(x) → ∞ as x → ∞. Assume further that φ′(x) exists, is non-decreasing, and fulfills the condition φ′(0) ≥ 1. Then:

(1) If δ ≥ 1, then

$$GEn_\delta\big(\varphi(X_{i,m}^*)\big)\le GEn_\delta(X_{i,m}^*).$$

(2) If 0 < δ ≤ 1, then

$$GEn_\delta\big(\varphi(X_{i,m}^*)\big)\ge GEn_\delta(X_{i,m}^*).$$

Proof. Since φ is convex and strictly increasing, its derivative φ′(x) is non-decreasing and satisfies

$$\varphi'(x)\ge\varphi'(0)\ge 1,\qquad x\ge 0.$$

Let Y = φ(X*). By a standard change-of-variable argument, the pdf of Y satisfies

$$h_Y\big(\varphi(x)\big)=\frac{h(x)}{\varphi'(x)}\le h(x),$$

because φ′(x) ≥ 1.

As in the proofs of Theorems 2.1 and 2.2, the IGF admits a representation through the hazard rate evaluated at corresponding quantiles. Since $\bar H_Y(\varphi(x))=\bar H(x)$, the hazard rate of Y satisfies

$$\Lambda_Y\big(\varphi(x)\big)=\frac{h(x)}{\varphi'(x)\,\bar H(x)}\le\frac{h(x)}{\bar H(x)}=\Lambda_{X^*}(x).$$

When δ ≥ 1, the function u ↦ u^{δ−1} is increasing, which implies

$$\Lambda_Y^{\delta-1}\big(\varphi(x)\big)\le\Lambda_{X^*}^{\delta-1}(x).$$

For 0 < δ ≤ 1, the same function is decreasing, hence

$$\Lambda_Y^{\delta-1}\big(\varphi(x)\big)\ge\Lambda_{X^*}^{\delta-1}(x).$$

Lemma 3.1 together with Theorem 3.1 ensures that these inequalities carry over to the IGF evaluated at the order statistic Xi,m*. Consequently:

- If δ ≥ 1, then

$$GEn_\delta\big(\varphi(X_{i,m}^*)\big)\le GEn_\delta(X_{i,m}^*).$$

- If 0 < δ ≤ 1, then

$$GEn_\delta\big(\varphi(X_{i,m}^*)\big)\ge GEn_\delta(X_{i,m}^*).$$

This completes the proof.

Remark 3.1. The additional requirement φ′(0) ≥ 1 is not intended to restrict the class of admissible convex transformations. Its role is to ensure that the map φ does not locally contract the distribution near the origin. Since the IGF involves powers of the hazard function, such a contraction would reverse the direction of the inequalities in Theorem 3.2. The condition φ′(0) ≥ 1 is therefore a convenient and sufficient way to guarantee that

$$\Lambda_{\varphi(X^*)}\big(\varphi(x)\big)=\frac{h(x)}{\varphi'(x)\,\bar H(x)}\le\Lambda_{X^*}(x),$$

which is the key step in applying Lemma 3.1 and Theorem 3.1. We note that this assumption may be relaxed to φ′(x) ≥ 1 on a neighborhood of the origin, without altering the main results. In this sense, the condition is mild and does not significantly reduce the applicability of the theorem.

We now compare the information-generating function measures associated with the ith order statistics of two continuous random variables. Theorem 3.B.26 of Shaked and Shanthikumar [18] states that if $X_1^*\le_{disp}X_2^*$, then $X_{1;i,m}^*\le_{disp}X_{2;i,m}^*$ for i = 1, 2, ..., m. Therefore, using Lemma 2.1, we easily obtain the following conclusion.

Proposition 3.1. Assume that $X_1^*\le_{disp}X_2^*$. Then, it holds that $GEn_\delta(X_{1;i,m}^*)\ge(\le)\,GEn_\delta(X_{2;i,m}^*)$ for δ ≥ 1 (respectively, 0 < δ ≤ 1).

Proof. From Lemma 2.1 and Equation 13, let $X_1^*\le_{disp}X_2^*$. Then, for any δ ≥ 1 (respectively, 0 < δ ≤ 1), we have

$$\mathbb{E}\big[h_1^{\delta-1}(H_1^{-1}(V^*))\big]\ge(\le)\,\mathbb{E}\big[h_2^{\delta-1}(H_2^{-1}(V^*))\big],$$

and the result follows.
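Proposition 3.1 can be illustrated with the dispersive pair Exp(2) ≤disp Exp(1). The sketch below (our code, using the v = H(x) representation; `gen_order_exp` is our helper name) checks both directions at the order-statistic level:

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def gen_order_exp(rate, i, m, delta, n=20000):
    # GEn_delta(X*_{i,m}) for an Exp(rate) parent via v = H(x):
    # h(H^{-1}(v)) = rate * (1 - v).
    lb = log_beta(i, m - i + 1)
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * step
        total += math.exp(delta * ((i - 1) * math.log(v)
                                   + (m - i) * math.log(1.0 - v) - lb)
                          + (delta - 1) * math.log(rate * (1.0 - v)))
    return total * step

# Exp(2) <=_disp Exp(1), so Proposition 3.1 orders the order-statistic IGFs.
i, m = 2, 5
assert gen_order_exp(2.0, i, m, 2.0) >= gen_order_exp(1.0, i, m, 2.0)  # delta >= 1
assert gen_order_exp(2.0, i, m, 0.5) <= gen_order_exp(1.0, i, m, 0.5)  # 0 < delta <= 1
```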

The following theorem compares the information-generating functions of the corresponding ith order statistics of two variables in terms of the information-generating functions of the variables themselves.

Theorem 3.3. Consider two continuous random variables, $X_1^*$ and $X_2^*$, associated with cdfs $H_1$ and $H_2$, and corresponding pdfs $h_1$ and $h_2$. Suppose that the condition $\inf\Psi_1^*\ge\sup\Psi_2^*$ holds, where

$$\Psi_1^*=\bigg\{v^*>0\ \bigg|\ \frac{h_2(H_2^{-1}(v^*))}{h_1(H_1^{-1}(v^*))}\le 1\bigg\},\qquad\Psi_2^*=\bigg\{v^*>0\ \bigg|\ \frac{h_2(H_2^{-1}(v^*))}{h_1(H_1^{-1}(v^*))}>1\bigg\}.$$

Then, the following statements are true:

(1) If 0 < δ ≤ 1 and $GEn_\delta(X_1^*)\le GEn_\delta(X_2^*)$, then it follows that $GEn_\delta(X_{1;i,m}^*)\le GEn_\delta(X_{2;i,m}^*)$.

(2) If δ ≥ 1 and $GEn_\delta(X_1^*)\ge GEn_\delta(X_2^*)$, then it follows that $GEn_\delta(X_{1;i,m}^*)\ge GEn_\delta(X_{2;i,m}^*)$.

Proof. When either of the sets $\Psi_1^*$ or $\Psi_2^*$ is empty, the conclusion holds trivially. Therefore, we assume both sets are non-empty. For part (1), the assumption $GEn_\delta(X_1^*)\le GEn_\delta(X_2^*)$ gives

$$\int_{-\infty}^{\infty}h_1^{\delta}(x)\,dx-\int_{-\infty}^{\infty}h_2^{\delta}(x)\,dx=\int_0^1\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv\le 0.$$

Since 0 ≤ (1−v) ≤ 1 for v ∈ [0, 1], it follows that

$$\int_0^1(1-v)^{\delta(m-i)}\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv\le 0,\qquad(15)$$

where mi ≥ 0 for i = 1, 2, …, m.

Now, applying the definition of the information-generating function of the ith order statistic, we obtain

$$GEn_\delta(X_{1;i,m}^*)-GEn_\delta(X_{2;i,m}^*)=\int_{-\infty}^{\infty}h_{1;i,m}^{\delta}(x)\,dx-\int_{-\infty}^{\infty}h_{2;i,m}^{\delta}(x)\,dx=:\Omega.$$

To verify the first part of the theorem, it suffices to show that Ω ≤ 0. Using the substitution $v=H_k(x)$ for k = 1, 2, we rewrite Ω as follows:

$$(\Delta_h(i,m-i+1))^{\delta}\,\Omega=\int_0^1 v^{\delta i-\delta}(1-v)^{\delta m-\delta i}\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv=\int_{\Psi_1^*}v^{\delta i-\delta}(1-v)^{\delta m-\delta i}\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv+\int_{\Psi_2^*}v^{\delta i-\delta}(1-v)^{\delta m-\delta i}\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv.$$

From the given condition $\inf\Psi_1^*\ge\sup\Psi_2^*$ and the boundedness of $v^{\delta(i-1)}$ on [0, 1], we obtain:

$$(\Delta_h(i,m-i+1))^{\delta}\,\Omega\le(\inf\Psi_1^*)^{\delta(i-1)}\int_{\Psi_1^*}(1-v)^{\delta(m-i)}\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv+(\sup\Psi_2^*)^{\delta(i-1)}\int_{\Psi_2^*}(1-v)^{\delta(m-i)}\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv\le(\inf\Psi_1^*)^{\delta(i-1)}\int_0^1(1-v)^{\delta(m-i)}\big[h_1^{\delta-1}(H_1^{-1}(v))-h_2^{\delta-1}(H_2^{-1}(v))\big]\,dv\le 0.$$

The last inequality follows directly from Equation 15 and the assumption that $\inf\Psi_1^*\ge\sup\Psi_2^*$. A similar argument can be applied to prove the second part.

3.1 Bounds for information-generating function measure of order statistics

Theorem 3.4. Let X* be a random variable with cdf H and pdf h, and let $m_d^*$ denote the mode of X*. If $M_d^*=h(m_d^*)<\infty$, then, for δ ≥ 1,

$$GEn_\delta(X_{i,m}^*)\le\max\bigg\{\frac{(M_d^*)^{\delta-1}}{(\Delta_h(i,m-i+1))^{\delta}}\,D^*(\delta i;\delta m),\ \frac{(i-1)^{\delta i-\delta}(m-i)^{\delta m-\delta i}}{(m-1)^{\delta m-\delta}(\Delta_h(i,m-i+1))^{\delta}}\,GEn_\delta(X^*)\bigg\},\qquad(16)$$

where $D^*(\delta i;\delta m)=\int_0^1 v^{\delta i-\delta}(1-v)^{\delta m-\delta i}\,dv$.

Proof. From Equations 2, 3, and under the condition δ ≥ 1, we can use the transformation v = H(x) to express the information-generating function measure for the ith order statistic, $X_{i,m}^*$, as

$$GEn_\delta(X_{i,m}^*)=\frac{1}{(\Delta_h(i,m-i+1))^{\delta}}\int_0^1 v^{\delta i-\delta}(1-v)^{\delta m-\delta i}\,h^{\delta-1}(H^{-1}(v))\,dv.$$

Given $h(x)\le M_d^*$, it follows that

$$GEn_\delta(X_{i,m}^*)\le\frac{(M_d^*)^{\delta-1}}{(\Delta_h(i,m-i+1))^{\delta}}\,D^*(\delta i;\delta m).$$

On the other hand, since the beta distribution with pdf $\frac{1}{\Delta_h(i,m-i+1)}\,v^{i-1}(1-v)^{m-i}$, v ∈ [0, 1], has mode $\frac{i-1}{m-1}$, we can write

$$GEn_\delta(X_{i,m}^*)\le\frac{1}{(\Delta_h(i,m-i+1))^{\delta}}\Big(\frac{i-1}{m-1}\Big)^{\delta i-\delta}\Big(1-\frac{i-1}{m-1}\Big)^{\delta m-\delta i}\int_0^1 h^{\delta-1}(H^{-1}(v))\,dv=\frac{(i-1)^{\delta i-\delta}(m-i)^{\delta m-\delta i}}{(m-1)^{\delta m-\delta}(\Delta_h(i,m-i+1))^{\delta}}\,GEn_\delta(X^*).$$

Example 3.1. Suppose X* follows a Pareto distribution with pdf $h(x)=\frac{\alpha s^{\alpha}}{x^{\alpha+1}}$, x ≥ s > 0, α > 0. It can be shown that the transformed density becomes $h(H^{-1}(v))=\frac{\alpha}{s}(1-v)^{\frac{\alpha+1}{\alpha}}$, 0 < v < 1. Taking α = 1 and s = 1, we find $M_d^*=1$, and hence, h(H^{−1}(v)) = (1−v)^2. Furthermore, we compute

$$GEn_\delta(X^*)=\int_1^{\infty}x^{-2\delta}\,dx=\frac{1}{2\delta-1},\qquad\delta>\frac{1}{2}.$$

According to Theorem 3.4, we deduce that

$$GEn_\delta(X_{i,m}^*)\le\max\bigg\{\frac{D^*(\delta i;\delta m)}{(\Delta_h(i,m-i+1))^{\delta}},\ \frac{(i-1)^{\delta i-\delta}(m-i)^{\delta m-\delta i}}{(2\delta-1)(m-1)^{\delta m-\delta}(\Delta_h(i,m-i+1))^{\delta}}\bigg\}.$$

Letting m = 20 and i = 15, for δ = 3, we evaluate

$$GEn_3(X_{15,20}^*)\le\max\{9.83131,\,13.6021\}=13.6021.$$
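All quantities in this example reduce to Beta-function ratios, so the bound can be checked in closed form with log-gamma evaluations. This is our own sketch (helper `log_beta` and the Beta-function reductions in the comments are our working, under the assumptions of the example):

```python
import math

def log_beta(a, b):
    # log B(a, b) via log-gamma, to avoid overflow for large arguments.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

m, i, delta = 20, 15, 3
log_norm = delta * log_beta(i, m - i + 1)   # log of (Delta_h(i, m-i+1))^delta

# First bound of Equation 16 with M_d* = 1:
# D*(delta i; delta m) = B(delta(i-1)+1, delta(m-i)+1).
bound1 = math.exp(log_beta(delta * (i - 1) + 1, delta * (m - i) + 1) - log_norm)

# Second bound: (i-1)^(d(i-1)) (m-i)^(d(m-i)) / ((2d-1)(m-1)^(d(m-1)) Delta^d),
# with GEn_d(X*) = 1/(2d - 1) for this Pareto parent.
bound2 = math.exp(delta * (i - 1) * math.log(i - 1)
                  + delta * (m - i) * math.log(m - i)
                  - math.log(2 * delta - 1)
                  - delta * (m - 1) * math.log(m - 1)
                  - log_norm)

# Exact value: h^(delta-1)(H^{-1}(v)) = (1-v)^(2(delta-1)) just shifts the
# second Beta argument, so GEn_3(X*_{15,20}) = B(43, 20) / Delta^3.
exact = math.exp(log_beta(delta * (i - 1) + 1,
                          delta * (m - i) + 2 * (delta - 1) + 1) - log_norm)

# The exact IGF sits below each of the two upper bounds, hence below the max.
assert exact <= bound1 and exact <= bound2
```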

4 Symmetric features of the information-generating function model of the order statistics

A number of interesting features of the information-generating function of order statistics appear when the pdf of the parent distribution of the independent, identically distributed random variables is symmetric. We begin with two lemmas, whose proofs follow immediately from the symmetry assumption and the definition of $h_{i,m}$ in Equation 3.

Lemma 4.1. (Fashandi and Ahmadi [21]) Let X* be a continuous random variable defined over the support $S_{X^*}$, with pdf h and cdf H. If the following condition holds:

$$h\big(H^{-1}(v)\big)=h\big(H^{-1}(1-v)\big)\quad\text{for all }v\in(0,1),$$

then the cdf H(x) is symmetric with respect to some point $c^*\in S_{X^*}$.

Lemma 4.2. (Balakrishnan and Selvitella [13]) Suppose the order statistic $X_{i,m}^*$, for i = 1, …, m, arises from a distribution whose pdf h satisfies the symmetry condition $h(\mu^*+x)=h(\mu^*-x)$ for x ≥ 0, where μ* denotes the mean of X*. Under this assumption, the following identities are satisfied:

$$H(\mu^*+x)=\bar H(\mu^*-x),\qquad h_{i,m}(\mu^*+x)=h_{m-i+1,m}(\mu^*-x).$$

Theorem 4.1. Assume that X1*,,Xm* are iid samples drawn from a distribution with pdf h that is symmetric about its mean μ*. Then, the following properties hold:

1. If the sample size m is odd, then for every i = 1, …, m,

GEn_δ(X*_{i,m}) = GEn_δ(X*_{m−i+1,m}).

2. The pdf h is symmetric (about some point) if and only if

GEn_δ(X*_{1,m}) = GEn_δ(X*_{m,m}) for all integers m ≥ 1.

Moreover, if the first moment exists, the center of symmetry equals the mean μ*.

Proof. (1) (Symmetry implies equality of GEn for reflected order statistics). By Lemma 4.2, we have the pointwise identity

h_{i,m}(μ* + x) = h_{m−i+1,m}(μ* − x),  for all x ∈ ℝ.

Using this identity and the substitution y = μ*+x (whose Jacobian is dy = dx), we obtain

GEn_δ(X*_{i,m}) = ∫_{−∞}^{∞} h^δ_{i,m}(y) dy = ∫_{−∞}^{∞} h^δ_{i,m}(μ* + x) dx = ∫_{−∞}^{∞} (h_{m−i+1,m}(μ* − x))^δ dx = ∫_{−∞}^{∞} h^δ_{m−i+1,m}(t) dt = GEn_δ(X*_{m−i+1,m}),

where in the penultimate equality we used the change of variable t = μ* − x. This proves (1).

(2) (Necessity). If h is symmetric about μ*, the argument used for part (1), applied with i = 1, does not depend on the parity of m; hence GEn_δ(X*_{1,m}) = GEn_δ(X*_{m,m}) for every m ≥ 1, and necessity is immediate.

(Sufficiency). Assume that

GEn_δ(X*_{1,m}) = GEn_δ(X*_{m,m})  for every m ≥ 1.    (17)

Using the representations of GEn_δ and proceeding exactly as in the proof of Theorem 2.1, Equation 17 yields, after the standard change of variable v = H̄^δ(x) and grouping of factors, an identity of the form

∫_0^1 w(v) [h(H^{−1}(v)) − h(H^{−1}(1−v))] v^ℓ dv = 0  for all integers ℓ ≥ 0,

where w(v) = (1 − v^{1/δ})^{δi−δ} is continuous and strictly positive on (0, 1). Dividing by w(v) and using the continuity of the integrand, we obtain

∫_0^1 v^ℓ [h(H^{−1}(v)) − h(H^{−1}(1−v))] dv = 0  for all integers ℓ ≥ 0.

By Lemma 2.2 (Stone–Weierstrass corollary), the continuous function

ζ(v) := h(H^{−1}(v)) − h(H^{−1}(1−v))

must vanish identically on [0, 1]; hence

h(H^{−1}(v)) = h(H^{−1}(1−v)),  for all v ∈ (0, 1).    (18)

Now Lemma 4.1 (Fashandi and Ahmadi) implies that the cdf H is symmetric about some point c* ∈ ℝ (that is, H(c* + x) = 1 − H(c* − x) for all x). Consequently, h is symmetric about c*.

To identify the center c* with the mean μ*, note that for any distribution symmetric about c* with a finite first moment, we necessarily have

𝔼[X]=c*.

Therefore, when the first moment exists, the center of symmetry equals the mean, and the pdf is symmetric about μ*. This completes the proof of sufficiency and hence of the theorem.

Corollary 4.1. As a direct consequence of Theorem 4.1, let the forward difference operator with respect to i be defined as ΞGEn_δ(X*_{i,m}) = GEn_δ(X*_{i+1,m}) − GEn_δ(X*_{i,m}) for 1 ≤ i ≤ m−1. Then, it follows that ΞGEn_δ(X*_{i,m}) = −ΞGEn_δ(X*_{m−i,m}) for i = 1, …, m−1.

Remark 4.1. Define Θ_m = GEn_δ(X*_{1,m}) − GEn_δ(X*_{m,m}). Then, Θ_m = 0 for every m ≥ 1 if and only if X* is symmetric. Hence, Θ_m serves as a potential measure of symmetry and can be used as a test statistic for assessing symmetry.
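Theorem 4.1(2) and Remark 4.1 lend themselves to a quick numerical check. A minimal Python sketch (function names are ours) uses midpoint quadrature on the representation GEn_δ(X*_{1,m}) = ∫_0^1 [m(1−u)^{m−1}]^δ h^{δ−1}(H^{−1}(u)) du and its mirror image for the maximum:

```python
import statistics

def gen_extremes(delta, m, q_pdf, grid=20000):
    """Midpoint-rule approximation of GEn_delta for the sample minimum and
    maximum, with q_pdf(u) = h(H^{-1}(u)):
    GEn_d(X*_{1,m}) = int_0^1 [m (1-u)^{m-1}]^d h^{d-1}(H^{-1}(u)) du,
    GEn_d(X*_{m,m}) = int_0^1 [m u^{m-1}]^d h^{d-1}(H^{-1}(u)) du."""
    g_min = g_max = 0.0
    for j in range(grid):
        u = (j + 0.5) / grid
        w = q_pdf(u) ** (delta - 1)
        g_min += (m * (1 - u) ** (m - 1)) ** delta * w
        g_max += (m * u ** (m - 1)) ** delta * w
    return g_min / grid, g_max / grid

nd = statistics.NormalDist()                 # symmetric about 0
normal_qpdf = lambda u: nd.pdf(nd.inv_cdf(u))
expo_qpdf = lambda u: 1.0 - u                # standard exponential (asymmetric)

a, b = gen_extremes(2, 5, normal_qpdf)       # equal under symmetry
c, d = gen_extremes(2, 5, expo_qpdf)         # unequal for a skewed parent
print(a, b, c, d)
```

For the symmetric normal parent the two values agree up to rounding error, while the exponential parent gives 2.5 versus roughly 0.278, i.e., a clearly nonzero Θ_m.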

Based on the conditions outlined in Corollary 4.1, the information-generating function GEnδ(Xi,m*) attains either a local maximum or a minimum at the median position. This behavior can be illustrated using the uniform U(−1, 1) and standard normal N(0, 1) distributions. Specifically, for the median case (i = 4) when the sample size is m = 7, we can observe (refer to Figures 3, 4):

(1) Under the U(−1, 1) distribution, the function reaches a minimum value of 3.263403 for δ = 2, 5.940808 for δ = 3, and 11.36502 for δ = 4.

(2) Under the N(0, 1) distribution, the function reaches a maximum value of 0.6147224 for δ = 2, 0.43858655 for δ = 3, and 0.33141763 for δ = 4.


Figure 3. Information-generating function of the ith order statistic of the U(−1, 1) distribution.


Figure 4. Information-generating function of the ith order statistic of the N(0, 1) distribution.

4.1 Symmetry test using nonparametric estimation

Nonparametric approaches to testing symmetry have been extensively explored in the literature; notable contributions include those by Xiong et al. [22], Noughabi and Jarrahiferiz [23], and Mohamed and Almuqrin [24]. In this section, we focus on a nonparametric estimation framework for the information-generating function inspired by the methodology proposed by Vasicek [25]. This formulation is then employed to assess symmetry in a distribution. Consider a random sample X1*,,Xm* drawn from a continuous distribution H(x) with associated density function h(x). The hypothesis under investigation is:

Hy_0: H(μ* − x) = 1 − H(μ* + x) for all x,

where the parameter μ* is unspecified. The alternative hypothesis is expressed as:

Hy_1: H(μ* − x) ≠ 1 − H(μ* + x) for some x.

When the underlying random variables are independent and identically distributed with a symmetric pdf, the information-generating function derived from their order statistics exhibits several notable properties. The Vasicek entropy estimator, originally introduced in Equation 1, has been instrumental in the progression of statistical analysis techniques. Its formulation is given by:

En(h) = −∫_{−∞}^{∞} h(x) ln h(x) dx = −∫_0^1 ln[((d/dχ)H^{−1}(χ))^{−1}] dχ, with sample estimator En(h_m) = (1/m) Σ_{i=1}^{m} ln[(m/(2u*))(X*_{i+u*,m} − X*_{i−u*,m})],    (19)

Here, u* is a positive integer satisfying u* < m/2. For boundary handling, the order statistics are extended such that X*_{i,m} = X*_{1,m} when i < 1, and X*_{i,m} = X*_{m,m} when i > m. The generalized entropy expressions for the smallest and largest order statistics can be reformulated as follows:
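Equations 19 and 20 can be sketched in a few lines. The following Python is a minimal illustration (function names are ours), checked against the known N(0, 1) entropy ½ ln(2πe) ≈ 1.4189:

```python
import math
import random

def vasicek_entropy(sample, u=None):
    """Vasicek spacing estimator of Shannon entropy (Equation 19):
    (1/m) * sum_i log[ (m / (2u)) * (X_{(i+u)} - X_{(i-u)}) ],
    with X_{(i)} = X_{(1)} for i < 1 and X_{(i)} = X_{(m)} for i > m."""
    x = sorted(sample)
    m = len(x)
    if u is None:
        u = int(math.sqrt(m) + 0.5)      # window heuristic of Equation 20
    def o(i):                             # clamped i-th order statistic
        return x[min(max(i, 1), m) - 1]
    return sum(math.log(m / (2 * u) * (o(i + u) - o(i - u)))
               for i in range(1, m + 1)) / m

random.seed(1)
data = [random.gauss(0, 1) for _ in range(5000)]
est = vasicek_entropy(data)
true_h = 0.5 * math.log(2 * math.pi * math.e)  # N(0,1) entropy, about 1.4189
print(est, true_h)
```

The estimator is consistent, so for a sample of this size the printed estimate sits close to the true entropy.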

GEn_δ(X*_{1,m}) = ∫_0^1 [m(1−u)^{m−1}]^δ h^{δ−1}(H^{−1}(u)) du,
GEn_δ(X*_{m,m}) = ∫_0^1 [m u^{m−1}]^δ h^{δ−1}(H^{−1}(u)) du.

Park [26], expanding on the foundation laid by Vasicek [25], proposed a test for symmetry based on entropy derived from order statistics. Following this approach, sample-based estimators of GEnδ(X1,m*) and GEnδ(Xm,m*) for a sample size m and k = 1, 2, …, can be expressed as:

\widehat{GEn_δ}(X*_{1,k}) = (k^δ/m) Σ_{i=1}^{m} (1 − i/(m+1))^{kδ−δ} [2u*/(m(X*_{i+u*,m} − X*_{i−u*,m}))]^{δ−1},
\widehat{GEn_δ}(X*_{k,k}) = (k^δ/m) Σ_{i=1}^{m} (i/(m+1))^{kδ−δ} [2u*/(m(X*_{i+u*,m} − X*_{i−u*,m}))]^{δ−1}.

Accordingly, the expression Θ̂_k = \widehat{GEn_δ}(X*_{1,k}) − \widehat{GEn_δ}(X*_{k,k}), defined for k = 1, 2, …, can be computed through the following empirical estimator:

Θ̂_k = (k^δ/m) Σ_{i=1}^{m} [2u*/(m(X*_{i+u*,m} − X*_{i−u*,m}))]^{δ−1} [(1 − i/(m+1))^{kδ−δ} − (i/(m+1))^{kδ−δ}].

To simplify the analysis, we fix k = 2 in what follows and suggest employing the estimator:

Θ̂_2 = (2^δ/m) Σ_{i=1}^{m} [2u*/(m(X*_{i+u*,m} − X*_{i−u*,m}))]^{δ−1} [(1 − i/(m+1))^δ − (i/(m+1))^δ] = (2^δ/m) Σ_{i=1}^{m} [2u*/(m(X*_{i+u*,m} − X*_{i−u*,m}))]^{δ−1} Φ(i/(m+1)),

in which Φ(v) = (1−v)^δ − v^δ, so that Φ(v) = −Φ(1−v), and Φ is continuous and bounded. This estimator corresponds to Θ_2 = GEn_δ(X*_{1,2}) − GEn_δ(X*_{2,2}) and is utilized to evaluate whether the distribution of the random variable X* is symmetric. Substantial deviations of Θ̂_2 from zero, whether in a positive or negative direction, can be interpreted as evidence of asymmetry in the underlying distribution.
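A direct implementation of Θ̂_2 is straightforward. The sketch below (helper names are ours) applies it to a symmetric and a right-skewed sample; by the discussion above the statistic should sit near zero in the first case and near Θ_2 = 2/3 (the standard-exponential value for δ = 2) in the second:

```python
import math
import random

def theta2_hat(sample, delta=2, u=None):
    """Empirical symmetry statistic:
    (2^d / m) * sum_i [2u / (m (X_{(i+u)} - X_{(i-u)}))]^{d-1}
                    * [(1 - i/(m+1))^d - (i/(m+1))^d]."""
    x = sorted(sample)
    m = len(x)
    if u is None:
        u = int(math.sqrt(m) + 0.5)      # window heuristic of Equation 20
    def o(i):                             # clamped i-th order statistic
        return x[min(max(i, 1), m) - 1]
    total = 0.0
    for i in range(1, m + 1):
        density = (2 * u / (m * (o(i + u) - o(i - u)))) ** (delta - 1)
        p = i / (m + 1)
        total += density * ((1 - p) ** delta - p ** delta)
    return 2 ** delta / m * total

random.seed(7)
t_sym = theta2_hat([random.gauss(0, 1) for _ in range(500)])        # near 0
t_skew = theta2_hat([random.expovariate(1.0) for _ in range(500)])  # positive
print(t_sym, t_skew)
```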

Theorem 4.2. Let X*_1, …, X*_m be independent and identically distributed random variables, and define Y*_i = aX*_i + b* for constants a > 0 and b* ∈ ℝ, for each i = 1, …, m. Denote the estimators of Θ_2 based on the sequences {X*_i} and {Y*_i} by Θ̂_2^{X*} and Θ̂_2^{Y*}, respectively. Then, the following relationships (expectation, variance, and mean square error, respectively) hold:

(1) 𝔼[Θ̂_2^{Y*}] = 𝔼[Θ̂_2^{X*}]/a^{δ−1},

(2) Var[Θ̂_2^{Y*}] = Var[Θ̂_2^{X*}]/a^{2δ−2},

(3) MSE[Θ̂_2^{Y*}] = MSE[Θ̂_2^{X*}]/a^{2δ−2}.

Proof. We begin by expressing the estimator for Θ2 based on the transformed variables:

Θ̂_2^{Y*} = (2^δ/m) Σ_{i=1}^{m} [2u*/(m(Y*_{i+u*,m} − Y*_{i−u*,m}))]^{δ−1} [(1 − i/(m+1))^δ − (i/(m+1))^δ] = (2^δ/m) Σ_{i=1}^{m} [2u*/(m a(X*_{i+u*,m} − X*_{i−u*,m}))]^{δ−1} [(1 − i/(m+1))^δ − (i/(m+1))^δ] = Θ̂_2^{X*}/a^{δ−1}.

Since Y*_{i+u*,m} − Y*_{i−u*,m} = a(X*_{i+u*,m} − X*_{i−u*,m}) and the scale factor a^{1−δ} is deterministic, taking expectations, variances, and mean squared errors on both sides directly yields the stated scaling properties, completing the proof.

However, the estimator Θ̂_2 depends not only on the observed sample but also on the chosen window size u*. Determining its exact distribution under the null hypothesis presents significant analytical challenges. Consequently, Monte Carlo simulation is used to estimate the critical values. Following prior studies (e.g., McWilliams [27] and Corzo and Babativa [28]), the generalized lambda distribution is selected as an alternative model. From this distribution, samples of sizes m = 20, 30, 50, and 100 are generated across nine different parameter settings. The simulated data are defined as

x_i = η_1 + (v_i^{η_3} − (1 − v_i)^{η_4})/η_2,  0 ≤ v_i ≤ 1,  i = 1, 2, …, m.
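Sampling from the generalized lambda distribution follows directly from this quantile formula. A minimal Python sketch (the η values shown are the classic normal-approximating choice, used only for illustration; they are not necessarily the Table 1 settings):

```python
import random

def gld_sample(m, eta1, eta2, eta3, eta4, seed=0):
    """Draw m values from the generalized lambda distribution through its
    quantile function x = eta1 + (v^eta3 - (1 - v)^eta4) / eta2, v ~ U(0,1)."""
    rng = random.Random(seed)
    return [eta1 + (v ** eta3 - (1 - v) ** eta4) / eta2
            for v in (rng.random() for _ in range(m))]

# eta3 = eta4 makes the distribution symmetric about eta1; these parameters
# approximate the standard normal (illustrative assumption).
xs = gld_sample(20000, 0.0, 0.1975, 0.1349, 0.1349)
mean = sum(xs) / len(xs)
print(mean)  # close to 0 by symmetry
```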

Table 1 presents the parameter values η_1, η_2, η_3, and η_4, originally chosen by McWilliams [27]. For each parameter combination, 1,000 samples are produced for each sample size. To determine the optimal u*, we utilize a heuristic formula suggested by Grzegorzewski and Wieczorkowski [29] for entropy estimation, given by

u* = [√m + 0.5],    (20)

where [·] denotes the floor function. Figure 5 illustrates the empirical distributions of the test statistic Θ2^, based on 10,000 replications from the standard normal distribution. These distributions are shown for sample sizes m = 25, 40, 50, 70, and 100, with u* selected via Equation 20. Sample generation and computation of the test statistic were performed using Wolfram Mathematica (version 13), chosen for its efficient random number generation and symbolic computation features. Further statistical analysis and visualization were conducted in R, leveraging its advanced capabilities in statistical computing and graphical presentation. In Figure 5, as the sample size m increases, the empirical pdf of the statistic Θ2^ becomes increasingly concentrated around its central value. Specifically, larger sample sizes yield steeper and more sharply peaked curves, reflecting a reduction in variability due to the greater amount of information contained in the sample. Conversely, smaller sample sizes produce flatter, more dispersed distributions, indicating greater variability. This behavior is consistent with the general principles of asymptotic theory, where statistics based on larger samples tend to exhibit reduced variance and greater stability.


Table 1. Parameter configurations of the generalized lambda distribution used in the Monte Carlo simulations, categorized into nine distinct cases.


Figure 5. Empirical density plots of the test statistic based on 50,000 samples generated under the null distribution for sample sizes m = 25, 40, 50, 70, and 100, for the two values of δ considered (top and bottom panels).

Using a Monte Carlo simulation with 1,000 replications, Table 2 presents the critical values of the examined statistic Θ̂_2 for varying sample sizes at the significance level α* = 0.05. According to Table 2, we observe that the value of zero lies within the critical intervals as both m and δ increase. Furthermore, the length of these intervals decreases significantly, converging closely around zero.


Table 2. Critical intervals of the test statistic Θ̂_2 at the significance level of 0.05.

Furthermore, the power of the test is calculated as the proportion of the 1,000 samples whose test statistic falls in the rejection region, thereby rejecting the null hypothesis of symmetry at the significance level α* = 0.05. The estimated power values for the proposed test are shown in Table 3.


Table 3. Comparative power analysis of the tests at the 0.05 significance threshold.

The determination of the critical values and the power for our proposed symmetry test at a significance level of α* = 0.05 was carried out as follows:

(1) Generate a random sample of size m from the standard normal distribution, and then calculate the corresponding test statistic for the sample;

(2) Repeat Step 1 a total of 1,000 times and define the critical values from the 25th and 975th ordered values of the resulting test statistics, Θ̂_2^{(25)} and Θ̂_2^{(975)}, which correspond for α* = 0.05 to the empirical quantiles α*/2 = 0.025 = 25/1,000 and 1 − α*/2 = 0.975 = 975/1,000. The null hypothesis is rejected if Θ̂_2 falls below Θ̂_2^{(25)} or exceeds Θ̂_2^{(975)}, and is not rejected when Θ̂_2^{(25)} < Θ̂_2 < Θ̂_2^{(975)};

(3) Draw a sample of size m from the alternative distribution under consideration, then check whether the resulting test statistic crosses the critical thresholds;

(4) Estimate the test's power as the proportion of rejections over 1,000 repetitions of Step 3.
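Steps 1–4 above can be sketched as follows. For self-containment the sketch uses moment-based sample skewness as a stand-in symmetry statistic rather than Θ̂_2, and all function names are ours:

```python
import random
import statistics

def sample_skewness(x):
    # Moment-based skewness; a simple stand-in for the Theta_2-hat statistic.
    mu = statistics.mean(x)
    s = statistics.stdev(x)
    return sum((v - mu) ** 3 for v in x) / (len(x) * s ** 3)

def mc_critical_values(stat, m, reps=1000, alpha=0.05, seed=0):
    """Steps 1-2: simulate the statistic under the N(0,1) null and take the
    25th and 975th of 1,000 ordered values as two-sided critical values."""
    rng = random.Random(seed)
    vals = sorted(stat([rng.gauss(0, 1) for _ in range(m)]) for _ in range(reps))
    return vals[round(reps * alpha / 2) - 1], vals[round(reps * (1 - alpha / 2)) - 1]

def mc_rejection_rate(stat, draw, m, lo, hi, reps=1000, seed=1):
    """Steps 3-4: proportion of samples whose statistic leaves (lo, hi)."""
    rng = random.Random(seed)
    hits = sum(not (lo < stat([draw(rng) for _ in range(m)]) < hi)
               for _ in range(reps))
    return hits / reps

lo, hi = mc_critical_values(sample_skewness, m=30)
power = mc_rejection_rate(sample_skewness, lambda r: r.expovariate(1.0),
                          m=30, lo=lo, hi=hi)
size = mc_rejection_rate(sample_skewness, lambda r: r.gauss(0, 1),
                         m=30, lo=lo, hi=hi, seed=2)
print(lo, hi, power, size)
```

Under these assumptions the rejection rate under the null stays near 0.05, while a strongly skewed alternative such as the exponential is rejected most of the time.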

4.1.1 Performance assessment using Monte Carlo methods

To rigorously evaluate the proposed testing methodology, we implement Monte Carlo simulation techniques. The comparative analysis examines statistical power across multiple competing tests, with detailed results presented in Tables 3, 4.


Table 4. Comparison of the tests' power at the significance level 0.05.

4.1.1.1 Comparative test procedures

The study incorporates the following established testing approaches for benchmarking purposes:

1. McWilliams' runs-based examination [27] utilizes the counting measure At(1) as its fundamental test statistic, quantifying total sequence runs.

2. Baklizi's modified runs analysis [30] introduces an adjusted formulation of the runs test, operationalized through statistic At(2).

3. Signed-Rank Wilcoxon procedure [31], developed by Gibbons and Chakraborti, employs the test measure At(3) for distribution-free inference.

4. Tajuddin's rank-sum approach [32] adapts the Wilcoxon two-sample framework using test statistic At(4).

5. Cheng-Balakrishnan rank methodology [33] implements the testing criterion At(5) for nonparametric analysis.

6. Modarres' trimmed statistical measure [34] incorporates a proportional trimming factor q within its test statistic Atq(6).

7. Baklizi's size-adaptive test [35] accounts for both sample dimensionality m and trimming proportion q through statistic Atm;q(7).

8. Baklizi's secondary testing framework [35] presents an alternative formulation based on At(8).

9. Baklizi's extended testing protocol [36] features an enhanced version using evaluation metric At(9).

10. Corzo-Babativa nonparametric technique [28] establishes its testing procedure on the foundation of At(10).

11. Noughabi-Jarrahiferiz extropy-based method [23] develops a novel approach using order statistic extropy measure, formalized as At(11).

Case 1 in Table 3 corresponds to a symmetric distribution, and, as expected, the estimated powers of the test statistic Θ̂_2 are close to 0.05 for all values of δ. The distributions in the next eight cases are asymmetric (Cases 2 and 3 are nearly symmetric). Test statistics with varying δ values, particularly as δ grows, exhibit comparable powers in Cases 5, 7, 8, and 9. The weaker power values in Case 4 may be explained by the fact that η_1 is much larger than 0, whereas it is nearly 0 in the other cases. We may conclude that our suggested test, based on the information-generating function of order statistics, performs well in the simulation study as the values of δ increase, compared with the other tests in Table 4. Therefore, we anticipate that the suggested test will outperform the competing tests across a wide range of real-world applications.

4.2 Real data set

To demonstrate the applicability of our methodology, we used data from the health statistics bulletin published by the General Authority for Statistics in the Kingdom of Saudi Arabia. This comprehensive dataset captures key health indicators, including:

1. Prevalence of chronic diseases,

2. Mental health status.

The statistical population encompasses all households—both Saudi and non-Saudi—permanently residing in the Kingdom of Saudi Arabia. The survey covers 13 administrative regions and 151 governorates, with 2023 as the base year for calculating indicators. Health status among adults (aged 15 years and above) is assessed using the Visual Analog Scale (VAS), with scores ranging from 0 (worst possible health) to 100 (excellent health). The data is stratified by administrative region and age group, enabling detailed demographic and geographic analysis. The complete dataset is publicly available through the General Authority for Statistics portal at: https://www.stats.gov.sa/statistics-tabs?tab=436312&category=417594. Figure 6 shows visualizations of the data set's histogram and the kernel density estimates, while Figure 7 shows the Q–Q diagram.


Figure 6. The kernel density estimate and histogram of the data set.


Figure 7. Q–Q diagram of all the data.

4.2.1 Bootstrap procedure

Since the null distribution of Θ^2 is non-pivotal, we employ a reflection bootstrap:

1. Symmetrize the data by generating Xsym={X1*,,Xm*}{2X~-X1*,,2X~-Xm*}, where X~ is the sample median.

2. For each b = 1, …, B: (i) Resample Xb* uniformly from Xsym. (ii) Compute Θ^2,b*.

3. The p-value is (1/B) Σ_{b=1}^{B} I(|Θ̂*_{2,b}| ≥ |Θ̂_2^{obs}|).
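The reflection bootstrap above can be sketched in a few lines of Python; sample skewness again stands in for Θ̂_2 to keep the sketch self-contained, and the function names are ours:

```python
import random
import statistics

def sample_skewness(x):
    # Moment-based skewness; a simple stand-in for the Theta_2-hat statistic.
    mu = statistics.mean(x)
    s = statistics.stdev(x)
    return sum((v - mu) ** 3 for v in x) / (len(x) * s ** 3)

def reflection_bootstrap_pvalue(data, stat, B=1000, seed=0):
    """Steps 1-3: pool the data with its reflection about the sample median
    (a symmetric set by construction), resample under this surrogate null,
    and compare |stat| of the resamples with the observed |stat|."""
    rng = random.Random(seed)
    med = statistics.median(data)
    pool = list(data) + [2 * med - v for v in data]
    obs = abs(stat(data))
    m = len(data)
    exceed = sum(abs(stat([rng.choice(pool) for _ in range(m)])) >= obs
                 for _ in range(B))
    return exceed / B

random.seed(3)
skewed = [random.expovariate(1.0) for _ in range(200)]  # clearly asymmetric
p = reflection_bootstrap_pvalue(skewed, sample_skewness)
print(p)
```

For a clearly skewed sample the bootstrap p-value comes out small, leading to rejection of the symmetry hypothesis.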

Results. The sample exhibits negative skewness (−1.89) and high kurtosis (9.55), indicating:

• A left-skewed distribution with a longer left tail

• Heavy tails and peakedness relative to a normal distribution

The symmetry test results for different sensitivity parameters δ are given in Table 5.


Table 5. The symmetry test results for different sensitivity parameters δ.

Key findings:

• Strong evidence against symmetry for δ = 2 and 3 (p < 0.05).

• Marginal evidence at δ = 4 (p = 0.051).

• Insufficient evidence to reject symmetry at δ = 5 (p = 0.103).

Interpretation:

• The negative skewness suggests potential outliers in the left tail.

• The increasing p-values with higher δ indicate the test's reduced sensitivity to asymmetry as δ grows in this dataset.

5 Conclusion

This research advances the theoretical understanding of information-generating functions for order statistics through several key contributions. We have systematically investigated monotonicity properties and derived bounds for the proposed measure. The study establishes important stochastic ordering results based on this information-theoretic framework, demonstrating that equality of information-generating function measures for order statistics uniquely determines their parent distributions. Furthermore, we have developed novel characterization theorems for the exponential distribution using this approach. For symmetric distributions, our analysis shows that the information-generating function GEnδ(Xi,m*) exhibits extremal behavior (either a local maximum or a minimum) at the median position. This theoretical finding is substantiated through explicit computations for both uniform and standard normal distributions. Building on these theoretical insights, we have formulated a nonparametric symmetry test based on the proposed measure, whose effectiveness increases with δ. The practical utility of our methodology is validated through comprehensive simulation studies and an application to chronic disease management data. Both theoretical and empirical results consistently show that higher values of δ significantly improve the test's performance, confirming the robustness of our approach.

Future studies will include a comprehensive performance comparison with a broader set of established symmetry tests, such as the Baringhaus–Henze, Ahmad–Li, and Bonett–Seier tests, to further situate our method within the broader literature. While this study provides initial validation of the test's power, a more comprehensive investigation against a wider array of alternatives, including heavy-tailed and bounded-support distributions, is a priority for future research.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

MM: Writing – original draft, Investigation, Software, Formal analysis, Funding acquisition, Visualization, Resources, Supervision, Validation, Project administration, Conceptualization, Writing – review & editing, Data curation, Methodology. MA-L: Writing – review & editing, Methodology, Formal analysis. EA: Writing – review & editing, Methodology, Formal analysis. HS: Funding acquisition, Data curation, Visualization, Resources, Conceptualization, Formal analysis, Validation, Project administration, Methodology, Writing – review & editing, Software, Investigation, Writing – original draft, Supervision.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was funded by Prince Sattam bin Abdulaziz University (PSAU/2025/02/35141).

Acknowledgments

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2025/02/35141).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Shannon C. A mathematical theory of communication. Bell Syst Tech J. (1948) 27:379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x

2. Golomb S. The information generating function of a probability distribution (corresp.). IEEE Trans Inform Theory. (1966) 12:75–7. doi: 10.1109/TIT.1966.1053843

3. Kharazmi O, Balakrishnan N. Cumulative and relative cumulative residual information generating measures and associated properties. Commun Stat Theory Methods. (2021) 52:5260–73. doi: 10.1080/03610926.2021.2005100

4. Kharazmi O, Balakrishnan N. Cumulative residual and relative cumulative residual Fisher information and their properties. IEEE Trans Inf Theory. (2021) 67:6306–12. doi: 10.1109/TIT.2021.3073789

5. Kharazmi O, Balakrishnan N. Jensen-information generating function and its connections to some well-known information measures. Stat Prob Lett. (2021) 170:108995. doi: 10.1016/j.spl.2020.108995

6. Kharazmi O, Balakrishnan N. Information generating function for order statistics and mixed reliability systems. Commu Stat Theory Methods. (2021) 51:7846–55. doi: 10.1080/03610926.2021.1881123

7. Kharazmi O, Balakrishnan N. Generating function for generalized Fisher information measure and its application to finite mixture models. Hacet J Math Stati. (2022) 51:1472–83. doi: 10.15672/hujms.1094273

8. Zamani Z, Kharazmi O, Balakrishnan N. Information generating function of record values. Math Methods Stat. (2022) 31:120–33. doi: 10.3103/S1066530722030036

9. Kharazmi O, Balakrishnan N, Ozonur D. Jensen-discrete information generating function with an application to image processing. Soft Comput. (2023) 27:4543–52. doi: 10.1007/s00500-023-07863-0

10. Kayal S, Balakrishnan N. Quantile-based information generating functions and their properties and uses. Prob Eng Inf Sci. (2024) 38:1–19. doi: 10.1017/S0269964824000068

11. Onicescu O. The Informational Energy, Component of the Statistical Barometer Concerning the Systems. Bucharest: Technical Publishing House (1966).

12. Bhatia PK. On measures of information energy. Inf Sci. (1997) 97:233–40. doi: 10.1016/0020-0255(94)00071-9

13. Balakrishnan N, Selvitella A. Symmetry of a distribution via symmetry of order statistics. Stat Probabil Lett. (2017) 129:367–72. doi: 10.1016/j.spl.2017.06.023

14. Ahmadi J. Characterization results for symmetric continuous distributions based on the properties of k-records and spacings. Stat Probabil Lett. (2020) 162:108764. doi: 10.1016/j.spl.2020.108764

15. Mahdizadeh M, Zamanzade E. Estimation of a symmetric distribution function in multistage ranked set sampling. Stat Papers. (2020) 61:851–67. doi: 10.1007/s00362-017-0965-x

16. Dai XJ, Niu CZ, Guo X. Testing for central symmetry and inference of the unknown center. Comput Stat Data An. (2018) 127:15–31. doi: 10.1016/j.csda.2018.05.007

17. Bozin V, Milosevic B, Nikitin YY, Obradovic M. New characterization-based symmetry tests. Bull Malays Math Sci Soc. (2020) 43:297–320. doi: 10.1007/s40840-018-0680-3

18. Shaked M, Shanthikumar JG. Stochastic Orders and Their Applications. San Diego, CA: Academic Press (1994).

19. Aliprantis CD, Burkinshaw O. Principles of Real Analysis. London: Edward Arnold (1981).

20. Ebrahimi N, Soofi ES, Zahedi H. Information properties of order statistics and spacings. IEEE Trans Inform Theory. (2004) 50:177–83. doi: 10.1109/TIT.2003.821973

21. Fashandi M, Ahmadi J. Characterizations of symmetric distributions based on Renyi entropy. Stat Probabil Lett. (2012) 82:798–804. doi: 10.1016/j.spl.2012.01.004

22. Xiong PH, Zhuang WW, Qiu GX. Testing symmetry based on the extropy of record values. J Nonparametr Stat. (2021) 33:134–55. doi: 10.1080/10485252.2021.1914338

23. Noughabi HA, Jarrahiferiz J. Extropy of order statistics applied to testing symmetry. Commun Stat-Simul C. (2022) 51:3389–99. doi: 10.1080/03610918.2020.1714660

24. Mohamed MS, Almuqrin MA. Properties of fractional generalized entropy in ordered variables and symmetry testing. AIMS Math. (2025) 10:1116–41. doi: 10.3934/math.2025053

25. Vasicek O. A test for normality based on sample entropy. J R Stat Soc B. (1976) 38:54–9. doi: 10.1111/j.2517-6161.1976.tb01566.x

26. Park S. A goodness-of-fit test for normality based on the sample entropy of order statistics. Stat Probabil Lett. (1999) 44:359–63. doi: 10.1016/S0167-7152(99)00027-9

27. McWilliams TP. A distribution-free test for symmetry based on a runs statistic. J Am Stat Assoc. (1990) 85:1130–3. doi: 10.1080/01621459.1990.10474985

28. Corzo J, Babativa G. A modified runs test for symmetry. J Stat Comput Sim. (2013) 83:984–91. doi: 10.1080/00949655.2011.647026

29. Grzegorzewski P, Wieczorkowski R. Entropy-based goodness-of-fit test for exponentiality. Commun Stat-Theor M. (1999) 28:1183–202. doi: 10.1080/03610929908832351

30. Baklizi A. A conditional distribution runs test for symmetry. J Nonparametr Stat. (2003) 15:713–8. doi: 10.1080/10485250310001634737

31. Gibbons JD, Chakraborti SM. Non-Parametric Statistical Inference. New York, NY: Dekker (1992).

32. Tajuddin IH. Distribution-free test for symmetry based on the Wilcoxon two-sample test. J Appl Stat. (1994) 21:409–15. doi: 10.1080/757584017

33. Cheng WH, Balakrishnan N. A modified sign test for symmetry. Commun Stat-Simul C. (2004) 33:703–9. doi: 10.1081/SAC-200033302

34. Modarres R, Gastwirth JL. A modified runs test for symmetry. Stat Probabil Lett. (1996) 31:107–12. doi: 10.1016/S0167-7152(96)00020-X

35. Baklizi A. Testing symmetry using a trimmed longest run statistic. Aust N Z J Stat. (2007) 49:339–47. doi: 10.1111/j.1467-842X.2007.00485.x

36. Baklizi A. Improving the power of the hybrid test. Int J Contemp Math Sciences. (2008) 3:497–9.

Keywords: information-generating function, non-parametric estimation, order statistics, stochastic order comparison, symmetry testing

Citation: Mohamed MS, Al-Labadi M, Almuhur E and Sakr HH (2026) Further aspects of information-generating function of order statistics with health application in symmetry of chronic disease management. Front. Appl. Math. Stat. 12:1733600. doi: 10.3389/fams.2026.1733600

Received: 27 October 2025; Revised: 24 December 2025; Accepted: 05 January 2026;
Published: 02 February 2026.

Edited by:

Han-Ying Liang, Tongji University, China

Reviewed by:

Zakariya Yahya Algamal, University of Mosul, Iraq
Shuji Ando, Tokyo University of Science, Japan

Copyright © 2026 Mohamed, Al-Labadi, Almuhur and Sakr. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Hanan H. Sakr, aC5zYWtyQHBzYXUuZWR1LnNh

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.