- 1Department of Mathematics, College of Science and Humanities, Prince Sattam bin Abdulaziz University, Hawtat Bani Tamim, Saudi Arabia
- 2Department of Mathematics, Faculty of Arts and Sciences, University of Petra, Amman, Jordan
- 3Department of Mathematics, Faculty of Science, Applied Science Private University, Amman, Jordan
- 4Department of Management Information Systems, College of Business Administration in Hawtat Bani Tamim, Prince Sattam Bin Abdulaziz University, Hawtat Bani Tamim, Saudi Arabia
This investigation aimed to explore novel theoretical aspects and applications of the information-generating function measure for order statistics. We developed fundamental properties and established stochastic ordering relationships based on this information-theoretic measure. Our analysis demonstrated that when two order statistics share identical information-generating measures, their underlying parent distributions can be uniquely identified. We implemented our proposed measure to characterize the exponential distribution. Moreover, we derived bounds and investigated monotonicity properties for these functional measures. The study further examined how information-generating functions characterize distributional symmetry, with particular applications to uniform and normal distributions for identifying symmetry points of order statistics. Building on these theoretical foundations, we proposed a new symmetry test statistic derived from the information-generating properties of the order statistics. Using comprehensive Monte Carlo simulations, we evaluated the test's statistical power against existing alternatives. The present results demonstrated superior performance across various asymmetric distributional alternatives. The practical utility of our methodology is illustrated through an empirical analysis of chronic disease prevalence data.
1 Introduction and background
Several criteria have been proposed in information theory to quantify the uncertainty of a probabilistic model. The most important of these, applied in many scientific and technical fields, is the Shannon entropy, which originated in Shannon's groundbreaking work [1] on systems characterized by probability density or mass functions (pdf or pmf). For a continuous random variable X* with pdf h(x), the differential entropy, often called the Shannon entropy, is
$$\mathcal{H}(X^*) = -\int_{-\infty}^{\infty} h(x)\log h(x)\,dx. \quad (1)$$
The moment-generating function is a practical tool for computing the mean, variance, and other moments of a probability distribution: when the moments exist, they are obtained by successively differentiating the moment-generating function at zero. Similarly, generating functions for pdfs have been defined in information theory to compute quantities such as extropy, Kullback–Leibler divergence, and Shannon information. Inspired by moment- and probability-generating functions, Golomb [2] proposed the information-generating function of a random variable X*, defined, whenever the integral exists, as
$$\mathrm{GEn}_\delta(X^*) = \int_{-\infty}^{\infty} h^{\delta}(x)\,dx \quad (2)$$
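for any δ > 0. Golomb [2] then established the following properties of the information-generating function:
1. $\mathrm{GEn}_1(X^*) = 1$;
2. $\left.\frac{\partial}{\partial\delta}\mathrm{GEn}_\delta(X^*)\right|_{\delta=1} = \int_{-\infty}^{\infty} h(x)\log h(x)\,dx$, the negative of Shannon's entropy in Equation 1.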
Owing to their importance in information theory, information-generating functions have recently been studied by several authors. For variants of the measure and their properties and applications, see Kharazmi and Balakrishnan [3–7], Zamani et al. [8], Kharazmi et al. [9], and Kayal and Balakrishnan [10].
When δ = 2, the information-generating function reduces to $\mathrm{GEn}_2(X^*) = \int_{-\infty}^{\infty} h^{2}(x)\,dx$, sometimes called the informative-energy function. By analogy with kinetic energy in mechanics, Onicescu [11] introduced a discrete version of informative energy into information theory; further details are given by Bhatia [12].
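As a numerical sanity check of Equation 2 and of Golomb's two properties, the following Python sketch integrates $h^\delta$ for the standard normal density with a composite Simpson rule. Truncating the real line to [−12, 12] is an assumption of the sketch (the neglected tails are far below double precision), and the helper names are illustrative. For $N(0,\sigma^2)$ a direct computation gives $\mathrm{GEn}_\delta = (2\pi\sigma^2)^{(1-\delta)/2}\delta^{-1/2}$, so the informative energy of $N(0,1)$ is $1/(2\sqrt{\pi}) \approx 0.2821$.

```python
import math

def normal_pdf(x, sigma=1.0):
    """Density of N(0, sigma^2)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number of panels."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0

def igf(pdf, delta, lo=-12.0, hi=12.0):
    """Golomb's IGF (Equation 2): integral of pdf(x)**delta, truncated to [lo, hi]."""
    return simpson(lambda x: pdf(x) ** delta, lo, hi)

def igf_normal_exact(delta, sigma=1.0):
    """Closed form for N(0, sigma^2): (2*pi*sigma^2)**((1-delta)/2) / sqrt(delta)."""
    return (2.0 * math.pi * sigma ** 2) ** ((1.0 - delta) / 2.0) / math.sqrt(delta)

# Property 1: GEn_1 = 1 (the density integrates to one).
print(igf(normal_pdf, 1.0))
# Informative energy (delta = 2) versus its closed form 1 / (2 sqrt(pi)).
print(igf(normal_pdf, 2.0), igf_normal_exact(2.0))
# Property 2: the slope at delta = 1 equals minus the Shannon entropy 0.5*log(2*pi*e).
eps = 1e-4
slope = (igf(normal_pdf, 1.0 + eps) - igf(normal_pdf, 1.0 - eps)) / (2.0 * eps)
print(slope, -0.5 * math.log(2.0 * math.pi * math.e))
```

The finite-difference slope at δ = 1 should agree with $-\tfrac12\log(2\pi e) \approx -1.4189$ to several decimal places.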
Many statistical methods assume that the distribution of the population under study is symmetric; for example, the validity of regression models often hinges on the symmetry of the residuals. It is therefore important to assess rigorously whether this assumption holds in practice. Let $S_{X^*}$ denote the support of the cumulative distribution function (cdf) H of X*. If there exists a constant μ* such that $H(\mu^*-x) + H(\mu^*+x) = 1$ for all $x \in \mathbb{R}$, the distribution of X* is said to be symmetric about μ*.
Symmetry is a concept of substantial theoretical and practical importance in probability and statistics. It underpins many models and inferential procedures and has been studied extensively. Researchers have characterized symmetric distributions through ordered data such as order statistics, record values, and sequential order statistics. For instance, Balakrishnan and Selvitella [13] showed that, for a sample of size m and a fixed i ∈ {1, …, m}, the distributional identity $X_{i:m} \overset{d}{=} -X_{m-i+1:m}$ holds if and only if the underlying distribution H is symmetric about zero, where $\overset{d}{=}$ means that the two random variables have the same distribution.
Furthermore, Ahmadi [14] introduced new characterizations of symmetry for continuous distributions based on the properties of k-record values, and Mahdizadeh and Zamanzade [15] used ranked set sampling to construct nonparametric estimators of symmetric distribution functions. More broadly, assessing symmetry requires criteria tailored to its structural features, a task often carried out with goodness-of-fit tests, as in Dai et al. [16] and Bozin et al. [17].
In this study, we use several stochastic orders to compare random variables. Let $X_1^*$ and $X_2^*$ be continuous random variables with pdfs h1 and h2 and cdfs H1 and H2, respectively. Their generalized inverses (left-continuous quantile functions) are $H_1^{-1}(x) = \inf\{y : H_1(y) \ge x\}$ and $H_2^{-1}(x) = \inf\{y : H_2(y) \ge x\}$ for 0 < x < 1.
Based on these definitions, $X_1^*$ is said to be smaller than $X_2^*$ in the following stochastic orders if the stated conditions hold for all admissible x:
(1) Likelihood Ratio Order ($X_1^* \le_{\mathrm{lr}} X_2^*$): the ratio $h_1(x)/h_2(x)$ is a decreasing function of x.
(2) Hazard Rate Order ($X_1^* \le_{\mathrm{hr}} X_2^*$): the hazard rate of $X_1^*$ is greater than or equal to that of $X_2^*$ for all x, that is, $\lambda_1(x) \ge \lambda_2(x)$.
(3) Usual Stochastic Order ($X_1^* \le_{\mathrm{st}} X_2^*$): the survival function of $X_1^*$ is less than or equal to that of $X_2^*$, that is, $\bar H_1(x) \le \bar H_2(x)$.
(4) Super-Additive Order ($X_1^* \le_{\mathrm{su}} X_2^*$): the composition $H_2^{-1}(H_1(x))$ is a super-additive function.
(5) Dispersive Order ($X_1^* \le_{\mathrm{disp}} X_2^*$): the difference $H_2^{-1}(x) - H_1^{-1}(x)$ increases in x ∈ (0, 1).
Here the hazard rate of $X_i^*$ is $\lambda_i(v) = h_i(v)/\bar H_i(v)$ for v ≥ 0, where $\bar H_i(v) = 1 - H_i(v)$ is its survival function, i = 1, 2. For a comprehensive treatment of these stochastic orders and their properties, see Shaked and Shanthikumar [18].
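The five orders can be checked on grids for a simple pair. The sketch below assumes $X_1^* \sim \mathrm{Exp}(2)$ and $X_2^* \sim \mathrm{Exp}(1)$ (an illustrative choice, not one used in the paper), for which all five relations hold: $h_1/h_2 = 2e^{-x}$ decreases, the hazard rates are the constants 2 and 1, and $H_2^{-1}(H_1(x)) = 2x$ is additive, hence super-additive. A finite grid can only illustrate the definitions; it does not prove an ordering.

```python
import math

# Illustrative pair (an assumption, not taken from the text): X1 ~ Exp(rate 2), X2 ~ Exp(rate 1).
R1, R2 = 2.0, 1.0

def pdf(x, r):
    return r * math.exp(-r * x)

def cdf(x, r):
    return 1.0 - math.exp(-r * x)

def survival(x, r):
    return math.exp(-r * x)

def hazard(x, r):
    return pdf(x, r) / survival(x, r)

def quantile(v, r):
    return -math.log1p(-v) / r              # inverse of cdf on (0, 1)

def nonincreasing(seq, tol=1e-12):
    return all(b <= a + tol for a, b in zip(seq, seq[1:]))

def nondecreasing(seq, tol=1e-12):
    return all(b >= a - tol for a, b in zip(seq, seq[1:]))

xs = [0.05 * k for k in range(1, 100)]      # grid on (0, 5)
small = [0.05 * k for k in range(1, 41)]    # grid on (0, 2] for the super-additivity check
vs = [k / 100.0 for k in range(1, 100)]     # grid on (0, 1)

lr = nonincreasing([pdf(x, R1) / pdf(x, R2) for x in xs])            # (1) h1/h2 decreasing
hr = all(hazard(x, R1) >= hazard(x, R2) for x in xs)                 # (2) lambda1 >= lambda2
st = all(survival(x, R1) <= survival(x, R2) for x in xs)             # (3) survival ordering
comp = lambda x: quantile(cdf(x, R1), R2)                            # H2^{-1}(H1(x)) = 2x here
su = all(comp(x + y) >= comp(x) + comp(y) - 1e-9 for x in small for y in small)  # (4)
disp = nondecreasing([quantile(v, R2) - quantile(v, R1) for v in vs])            # (5)
print(lr, hr, st, su, disp)
```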
Kharazmi and Balakrishnan [6] explored the information-generating function for ordered random variables, specifically order statistics. In their study, they derived several properties of mixed systems built from independent and identically distributed components. Building on this foundation, we present a comparative analysis of mixed systems using these information metrics.
In a separate study of record values, Zamani et al. [8] obtained comparison results for the information-generating (IG) measure. A key finding was that if two sequences of upper record values have identical IG functions, their parent distributions must be the same. They also gave a characterization of the exponential distribution, showing that the IG function of its record values is maximized or minimized under specific constraints.
This study aims to further explore the properties of the information-generating function for order statistics and to demonstrate its application in testing for symmetry. The remainder of the paper is structured as follows: Section 2 develops characterizations and examines monotonicity properties using ordered variables. Section 3 investigates stochastic ordering results based on the information-generating function of order statistics and establishes bounds for this measure. Section 4 analyzes the symmetric properties of the information-generating function model for order statistics, proposes a nonparametric test for symmetry, and illustrates the methodology using chronic disease management data.
2 Properties of information-generating function
This section studies stochastic comparisons based on the information-generating function. We first recall two implications from Theorem 4.B.2 of Shaked and Shanthikumar [18]:
1. If , then implies .
2. If , then implies .
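Lemma 2.1. Assume that $X_1^* \le_{\mathrm{disp}} X_2^*$. Then $\mathrm{GEn}_\delta(X_1^*) \ge \mathrm{GEn}_\delta(X_2^*)$ for δ ≥ 1 (respectively, $\mathrm{GEn}_\delta(X_1^*) \le \mathrm{GEn}_\delta(X_2^*)$ for 0 < δ ≤ 1).
Proof. Substituting v = H(x) in Equation 2, we can write the information-generating function as
$$\mathrm{GEn}_\delta(X^*) = \int_0^1 h^{\delta-1}\big(H^{-1}(v)\big)\,dv.$$
Since $X_1^* \le_{\mathrm{disp}} X_2^*$, we have $h_1(H_1^{-1}(v)) \ge h_2(H_2^{-1}(v))$ for every v in (0, 1). Because $u \mapsto u^{\delta-1}$ is non-decreasing for δ ≥ 1 and non-increasing for 0 < δ ≤ 1, it follows that
$$\int_0^1 h_1^{\delta-1}\big(H_1^{-1}(v)\big)\,dv \ge \int_0^1 h_2^{\delta-1}\big(H_2^{-1}(v)\big)\,dv \quad (\delta \ge 1),$$
with the inequality reversed for 0 < δ ≤ 1, which proves the result.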
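The quantile form $\mathrm{GEn}_\delta(X^*) = \int_0^1 h^{\delta-1}(H^{-1}(v))\,dv$ used in the proof of Lemma 2.1 is easy to evaluate numerically. For an exponential parent with rate θ, $h(H^{-1}(v)) = \theta(1-v)$ and $\mathrm{GEn}_\delta = \theta^{\delta-1}/\delta$; larger rates give less dispersed variables, so with the illustrative choice Exp(3) versus Exp(1) the sketch should show the inequality of Lemma 2.1 for δ ≥ 1 and its reversal for δ < 1. The midpoint rule and the function names are convenience choices; the rule is only approximate for δ < 1, where the integrand is singular at v = 1.

```python
import math

def igf_quantile_form(dq, delta, n=20000):
    """GEn_delta(X) = int_0^1 dq(v)**(delta - 1) dv with dq(v) = h(H^{-1}(v)) (midpoint rule)."""
    step = 1.0 / n
    return step * sum(dq((k + 0.5) * step) ** (delta - 1.0) for k in range(n))

def exp_density_quantile(theta):
    """For Exp(theta): h(H^{-1}(v)) = theta * (1 - v)."""
    return lambda v: theta * (1.0 - v)

# Exp(3) is smaller than Exp(1) in the dispersive order (illustrative choice of parents).
dq1, dq2 = exp_density_quantile(3.0), exp_density_quantile(1.0)
for delta in (0.5, 1.5, 2.0, 3.0):
    g1, g2 = igf_quantile_form(dq1, delta), igf_quantile_form(dq2, delta)
    # Closed form theta**(delta - 1) / delta for theta = 3 is printed next to the numerical value.
    print(delta, round(g1, 5), round(3.0 ** (delta - 1) / delta, 5), round(g2, 5))
```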
2.1 Characterizations based on order statistics
Let $X_1^*, \ldots, X_m^*$ be independent and identically distributed (iid) random variables with cdf H and pdf h, and let $X_{1:m} \le X_{2:m} \le \cdots \le X_{m:m}$ be the order statistics of the sample. The pdf of the ith order statistic, 1 ≤ i ≤ m, is
$$h_{i:m}(x) = \frac{1}{B(i, m-i+1)}\,H^{i-1}(x)\,[1-H(x)]^{m-i}\,h(x), \quad (3)$$
where the normalizing constant is the beta function $B(i, m-i+1) = \Gamma(i)\Gamma(m-i+1)/\Gamma(m+1)$. Therefore, from Equation 2, the information-generating function of the ith order statistic is
$$\mathrm{GEn}_\delta(X_{i:m}) = \int_{-\infty}^{\infty} h_{i:m}^{\delta}(x)\,dx \quad (4)$$
for any δ > 0, 1 ≤ i ≤ m.
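Equations 3 and 4 translate directly into code. The sketch below (standard library only; the midpoint rule and the function names are illustrative choices) checks the numerical integral against the uniform parent on [0, 1], for which $h_{i:m}$ is the Beta(i, m−i+1) density and Equation 4 reduces to $B(\delta(i-1)+1,\ \delta(m-i)+1)/B^{\delta}(i, m-i+1)$. For a parent with unbounded support, [lo, hi] must be a truncation chosen by the user.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def order_stat_pdf(x, i, m, pdf, cdf):
    """Equation 3: density of the i-th order statistic of an iid sample of size m."""
    u = cdf(x)
    return u ** (i - 1) * (1.0 - u) ** (m - i) * pdf(x) / math.exp(log_beta(i, m - i + 1))

def igf_order_stat(i, m, delta, pdf, cdf, lo, hi, n=20000):
    """Equation 4 by the midpoint rule on [lo, hi] (the support, or a truncation of it)."""
    step = (hi - lo) / n
    return step * sum(order_stat_pdf(lo + (k + 0.5) * step, i, m, pdf, cdf) ** delta
                      for k in range(n))

def igf_uniform_exact(i, m, delta):
    """Uniform(0,1) parent: B(delta(i-1)+1, delta(m-i)+1) / B(i, m-i+1)**delta."""
    return math.exp(log_beta(delta * (i - 1) + 1, delta * (m - i) + 1)
                    - delta * log_beta(i, m - i + 1))

unif_pdf = lambda x: 1.0
unif_cdf = lambda x: x
for i in range(1, 6):
    print(i, igf_order_stat(i, 5, 2.0, unif_pdf, unif_cdf, 0.0, 1.0), igf_uniform_exact(i, 5, 2.0))
```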
To support the main conclusions of this section, we refer to a corollary derived from the Stone–Weierstrass Theorem, as presented by Aliprantis and Burkinshaw [19]. This yields the following lemma:
Lemma 2.2. Let ζ* be a continuous function on the interval [0, 1]. If $\int_0^1 z^{m}\,\zeta^*(z)\,dz = 0$ for all integers m ≥ 0, then ζ*(z) = 0 for every z ∈ [0, 1].
The next theorem shows that the information-generating function of the ith order statistic characterizes the parent distribution.
Theorem 2.1. Assume that h1 and h2 are two pdfs, with corresponding cdfs H1 and H2, of the random variables $X_1^*$ and $X_2^*$, respectively. Fix a value of i with 1 ≤ i ≤ m, and let δ > 0. Then the following equivalence holds:
Proof. We only need to establish sufficiency, since necessity is immediate. Assume that
Using Equations 2, 3, 4, this is equivalent to
Step 1: Change of variables. Note that . Rewriting Equation 5 yields
where .
Let
Since is continuous and strictly decreasing, the mapping is bijective and sends x ∈ (−∞, ∞) to v ∈ [0, 1]. The identity becomes
Step 2: Application of Lemma 2.2. Let
Equation 6 implies
The prefactor $(1-v^{1/\delta})^{\delta i-\delta}$ is continuous and strictly positive for v ∈ (0, 1); hence the above is equivalent to
Since ζ* is continuous, Lemma 2.2 implies
Therefore,
Step 3: Deduction of equality of the densities at corresponding quantiles. Recall that
Since for the argument we have , Equation 7 gives
Step 4: Equality of derivatives of inverse cdfs. Using the identity
we obtain
Integrating over [0, p] yields
for some constant C.
Step 5: Determination of the constant. Both inverse cdfs satisfy
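which we assume to be finite and common to the two distributions. This assumption cannot be dropped: $\mathrm{GEn}_\delta$ is invariant under location shifts, so equal information-generating functions determine the parent distribution only up to a shift, and C is exactly that shift. When the lower support endpoints coincide (or, more generally, when the two distributions share any quantile), the limit of the difference is zero, implying C = 0. Thus,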
Therefore, H1 = H2, which completes the proof.
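Step 5 turns on how $\mathrm{GEn}_\delta$ reacts to location and scale changes. In the sketch below (standard library only; the truncation ranges and function names are assumptions), $\mathrm{GEn}_2(X_{2:5})$ is computed from Equations 3 and 4 for N(0, 1), N(3, 1), and N(0, 4): shifting the parent leaves the value unchanged, while doubling the scale multiplies it by $2^{1-\delta}$. A location shift is therefore invisible to the information-generating function, and a common reference point, such as a shared lower support endpoint, is what fixes the constant C.

```python
import math
from statistics import NormalDist

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def igf_order_stat(i, m, delta, dist, lo, hi, n=20000):
    """Equations 3 and 4 for a statistics.NormalDist parent, midpoint rule on [lo, hi]."""
    b = math.exp(log_beta(i, m - i + 1))
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        u = dist.cdf(x)
        total += (u ** (i - 1) * (1.0 - u) ** (m - i) * dist.pdf(x) / b) ** delta
    return total * step

delta = 2.0
g_std = igf_order_stat(2, 5, delta, NormalDist(0.0, 1.0), -10.0, 10.0)
g_shift = igf_order_stat(2, 5, delta, NormalDist(3.0, 1.0), -7.0, 13.0)    # location shift by 3
g_scale = igf_order_stat(2, 5, delta, NormalDist(0.0, 2.0), -20.0, 20.0)   # scale multiplied by 2
print(g_std, g_shift, g_scale, 2.0 ** (1.0 - delta) * g_std)
```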
Remark 2.1. By taking i = 1 in Theorem 2.1, we have
It is well established that the exponential distribution plays a significant role in reliability theory. In what follows, we present a novel characterization of this distribution.
Theorem 2.2. Let the exponential distribution be defined by $H(x) = 1 - e^{-\theta x}$, where θ > 0 and x > 0. This distribution is uniquely identified by the condition
where δ > 0.
Proof. We first verify the forward implication, then prove the converse.
(i) If X* is exponential, then the IGF identity holds. If $X^* \sim \mathrm{Exp}(\theta)$ with θ > 0, a direct computation using Equations 2 and 3 (the expression for $\mathrm{GEn}_\delta$ of an order statistic and the definition of $h_{i:m}$) yields
for every integer m ≥ 1. Thus, the displayed identity holds for the exponential distribution.
(ii) Converse: the IGF identity implies an exponential parent.
Assume
Using the integral representations in Equations 2, 3, this equality can be written as
Bring all terms to one side and perform the change of variable
As in the proof of Theorem 2.1, this substitution is admissible because is continuous and monotone on the support, and it yields, for every integer m ≥ 1,
Define the continuous function on [0, 1]
The previous displayed family of equalities says that for every integer m ≥ 1. Reindex by letting l = m−1 (so l ≥ 0) and apply Lemma 2.2; we conclude ζ(v)≡0 on [0, 1]. Hence
Equivalently, with $p := 1 - v^{1/\delta} \in [0, 1]$,
where is a positive constant. Thus the composed function is constant on [0, 1], and therefore
(iii) From constant to constant hazard (and hence exponential).
We now use the relation, given by Equation 3, between this quantity and the parent density and hazard rate. In the form needed below, it expresses the quantity as a continuously differentiable function of the hazard rate
Write this relation as
where Φ is an explicit, continuously differentiable function determined by Equation 3. A direct computation from Equation 3 shows that Φ is one-to-one on (0, ∞); hence, if the quantity equals C for all x, then $\lambda(x) = \Phi^{-1}(C)$ for all x. Denote $\theta := \Phi^{-1}(C) > 0$. Therefore, the hazard is constant:
A distribution with constant hazard λ(x)≡θ has survival function
Thus, H is the exponential distribution with rate θ. Substituting and tracing back $\theta = \Phi^{-1}(C)$ yields the relation between θ and C stated in the theorem. This completes the proof.
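For part (i), the exponential case can be made explicit. Substituting v = H(x) in Equation 4 and using $h(H^{-1}(v)) = \theta(1-v)$ gives $\mathrm{GEn}_\delta(X_{i:m}) = \theta^{\delta-1}B(\delta(i-1)+1,\ \delta(m-i)+\delta)/B^{\delta}(i, m-i+1)$, which for i = 1 reduces to $(m\theta)^{\delta-1}/\delta$, the IGF of $\mathrm{Exp}(m\theta)$; this reflects the fact that the minimum of m iid Exp(θ) variables is Exp(mθ). The sketch below evaluates the closed form and cross-checks it by numerical integration; the function names and the rate θ = 1.5 are illustrative.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def igf_exp_order_stat(i, m, delta, theta):
    """Closed form of GEn_delta(X_{i:m}) for an Exp(theta) parent."""
    return theta ** (delta - 1) * math.exp(log_beta(delta * (i - 1) + 1, delta * (m - i) + delta)
                                           - delta * log_beta(i, m - i + 1))

def igf_exp_order_stat_numeric(i, m, delta, theta, n=20000):
    """Midpoint rule for int_0^1 [v^(i-1)(1-v)^(m-i)/B]^delta (theta(1-v))^(delta-1) dv."""
    b = math.exp(log_beta(i, m - i + 1))
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * step
        total += (v ** (i - 1) * (1.0 - v) ** (m - i) / b) ** delta * (theta * (1.0 - v)) ** (delta - 1)
    return total * step

theta, delta = 1.5, 2.0
for m in range(1, 6):
    # For i = 1 the closed form should equal (m * theta)**(delta - 1) / delta, the IGF of Exp(m * theta).
    print(m, igf_exp_order_stat(1, m, delta, theta), (m * theta) ** (delta - 1) / delta)
print(igf_exp_order_stat(3, 6, delta, theta), igf_exp_order_stat_numeric(3, 6, delta, theta))
```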
2.2 Monotonicity properties
Ebrahimi et al. [20], Zamani et al. [8], and other related studies have examined the monotonic behavior of information measures of ordered variables. This section studies the monotonicity of the information-generating function of order δ of order statistics.
Lemma 2.3. (Adapted from Shaked and Shanthikumar [18]) Let $X_{i:m_1}$ and $X_{j:m_2}$ be the ith and jth order statistics of independent samples of sizes m1 and m2, respectively, from a distribution H with a non-increasing failure rate. Then,
An immediate consequence of Lemma 2.3 is that if $X_1^*, \ldots, X_m^*$ are iid observations from a distribution with a non-increasing failure rate, then for any i = 1, …, m,
Utilizing this result alongside Lemma 2.3, we can now establish the following theorem.
Theorem 2.3. (1) Suppose X* follows a distribution with a non-increasing failure rate. Then, for a fixed index i with 1 ≤ i ≤ m, $\mathrm{GEn}_\delta(X_{i:m})$ increases with m.
(2) Under the same assumption, for a fixed sample size m with m ≥ i ≥ 1, $\mathrm{GEn}_\delta(X_{i:m})$ decreases as i increases.
Here δ ∈ ℕ+.
Proof. The proof follows from Lemma 2.3 and Equation 8.
Recall that a random variable X* is said to have an increasing reversed hazard rate if $h(x)/H(x)$ is non-decreasing in x. Under this alternative assumption, we now present counterparts of Theorem 2.3 with the monotonicity reversed.
Theorem 2.4. (1) If X* has an increasing reversed hazard rate, then for a fixed i with 1 ≤ i ≤ m, $\mathrm{GEn}_\delta(X_{i:m})$ decreases as m increases.
(2) Under the same condition, if m is fixed and m ≥ i ≥ 1, then $\mathrm{GEn}_\delta(X_{i:m})$ increases as i increases.
Here δ ∈ ℕ+.
Proof. According to Equations 2, 3, we have
where
We introduce t = m−i to simplify the notation J(m; i; δ). The gamma function terms can be rewritten using the property of the gamma function for shifted arguments:
This can be expressed as a product of terms:
Combining these products with the initial term , we get a product over k from 1 to δ:
where δ ∈ ℕ+, 1 ≤ i ≤ m. Substituting from Equation 10 in Equation 9, we obtain
where $U_{i:m}$ represents the ith order statistic of a sample of size m from the uniform distribution on [0, 1], whose pdf is $g_{i:m}(w) = w^{i-1}(1-w)^{m-i}/B(i, m-i+1)$ for w ∈ [0, 1] and i = 1, 2, …, m. Theorem 1.B.28 of Shaked and Shanthikumar [18] states that , which also implies . Given that δ ∈ ℕ+, the assumption leads to the inequality:
which in turn implies that . Similarly, for Part (2), we have
where
where δ ∈ ℕ+, m≥i ≥ 1. Thus, , with noting that .
Theorem 2.5. (1) If X* has a decreasing reversed hazard rate, then for a fixed i with 1 ≤ i ≤ m, $\mathrm{GEn}_\delta(X_{i:m})$ increases as m increases.
(2) Under the same condition, if m is fixed and m ≥ i ≥ 1, then $\mathrm{GEn}_\delta(X_{i:m})$ decreases as i increases.
Here δ ∈ ℕ+.
Proof. The steps are similar to those in the proof of the previous theorem.
Recall that the Pareto distribution with cdf $H(x) = 1 - x^{-\alpha}$, x ≥ 1, α > 0, has a decreasing reversed hazard rate. Figures 1 and 2 plot $\mathrm{GEn}_\delta(X_{i:m})$ for this distribution with δ = 2, 3 as m increases and as i increases, respectively; the curves illustrate the monotonicity stated in Theorem 2.5 for δ ∈ ℕ+.
Figure 1. Information-generating function of $X_{i:m}$ for the Pareto distribution (α = 2) as m increases, for δ = 2, 3.
Figure 2. Information-generating function of $X_{i:m}$ for the Pareto distribution (α = 2) as i increases, for δ = 2, 3.
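The curves in Figures 1 and 2 can be recomputed in closed form. For the Pareto cdf $1 - x^{-\alpha}$, $h(H^{-1}(v)) = \alpha(1-v)^{(\alpha+1)/\alpha}$, so substituting v = H(x) in Equation 4 gives $\mathrm{GEn}_\delta(X_{i:m}) = \alpha^{\delta-1}B\big(\delta(i-1)+1,\ \delta(m-i)+(\delta-1)(\alpha+1)/\alpha+1\big)/B^{\delta}(i, m-i+1)$. The sketch evaluates this with log-gamma functions for α = 2; the grids of m and i and the function name are illustrative choices, and the printed sequences are meant for comparison with the figures, not as a proof of Theorem 2.5.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def igf_pareto_order_stat(i, m, delta, alpha):
    """GEn_delta(X_{i:m}) for the Pareto cdf 1 - x**(-alpha), x >= 1.

    Uses h(H^{-1}(v)) = alpha * (1 - v)**((alpha + 1) / alpha) and v = H(x) in Equation 4.
    """
    extra = (delta - 1) * (alpha + 1) / alpha
    log_val = ((delta - 1) * math.log(alpha)
               + log_beta(delta * (i - 1) + 1, delta * (m - i) + extra + 1)
               - delta * log_beta(i, m - i + 1))
    return math.exp(log_val)

alpha = 2.0
for delta in (2, 3):
    by_m = [igf_pareto_order_stat(2, m, delta, alpha) for m in range(2, 11)]   # fixed i = 2
    by_i = [igf_pareto_order_stat(i, 10, delta, alpha) for i in range(1, 11)]  # fixed m = 10
    print(delta, [round(g, 4) for g in by_m])
    print(delta, [round(g, 4) for g in by_i])
```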
3 Ordering outcomes based on the information-generating function of order statistics
In this section, we present stochastic comparison results for the information-generating function of order statistics. We first rewrite this measure as in the following lemma.
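Lemma 3.1. The information-generating function of the ith order statistic, $\mathrm{GEn}_\delta(X_{i:m})$, can be written as
$$\mathrm{GEn}_\delta(X_{i:m}) = \frac{B(\delta(i-1)+1,\ \delta(m-i)+1)}{B^{\delta}(i, m-i+1)}\; E\!\left[h^{\delta-1}\big(H^{-1}(V^*)\big)\right], \quad (13)$$
where the random variable V* has the pdf
$$g_{V^*}(v) = \frac{v^{\delta(i-1)}(1-v)^{\delta(m-i)}}{B(\delta(i-1)+1,\ \delta(m-i)+1)}, \quad (14)$$
v ∈ [0, 1].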
Proof. From Equations 2 and 3, using the transformation v = H(x), we can express the information-generating function of the ith order statistic, $\mathrm{GEn}_\delta(X_{i:m})$, as
and the result follows.
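Lemma 3.1 can be checked numerically by computing the same quantity twice: directly from Equations 3 and 4, and as the beta constant times $E[h^{\delta-1}(H^{-1}(V^*))]$. The sketch assumes a standard logistic parent (an illustrative choice), for which $h = H(1-H)$ and hence $h(H^{-1}(v)) = v(1-v)$; both routes should then match $B(\delta i,\ \delta(m-i+1))/B^{\delta}(i, m-i+1)$. The truncation to [−40, 40] and the helper names are assumptions of the sketch.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def midpoint(f, lo, hi, n=40000):
    step = (hi - lo) / n
    return step * sum(f(lo + (k + 0.5) * step) for k in range(n))

# Illustrative parent (an assumption): standard logistic, H(x) = 1/(1+e^-x), h = H(1-H).
def cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def pdf(x):
    u = cdf(x)
    return u * (1.0 - u)

def density_quantile(v):
    return v * (1.0 - v)        # h(H^{-1}(v)) for the logistic law

def igf_direct(i, m, delta):
    """Equation 4: integrate h_{i:m}(x)**delta over a truncated real line."""
    b = math.exp(log_beta(i, m - i + 1))
    def integrand(x):
        u = cdf(x)
        return (u ** (i - 1) * (1.0 - u) ** (m - i) * pdf(x) / b) ** delta
    return midpoint(integrand, -40.0, 40.0)

def igf_lemma31(i, m, delta):
    """Lemma 3.1: constant factor times E[h^{delta-1}(H^{-1}(V*))], V* ~ Beta(a, b)."""
    a, b = delta * (i - 1) + 1, delta * (m - i) + 1
    const = math.exp(log_beta(a, b) - delta * log_beta(i, m - i + 1))
    beta_pdf = lambda v: math.exp((a - 1) * math.log(v) + (b - 1) * math.log1p(-v) - log_beta(a, b))
    expectation = midpoint(lambda v: beta_pdf(v) * density_quantile(v) ** (delta - 1), 0.0, 1.0)
    return const * expectation

print(igf_direct(2, 5, 2.0), igf_lemma31(2, 5, 2.0))
```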
The impact of monotonic transformations on the information-generating function measure of order statistics is examined in the following theorem.
Theorem 3.1. Assume that φ is a strictly increasing function satisfying φ(−∞) = 0 and φ(∞) = ∞. Then, for the ith order statistic of the transformed random variable Y* = φ(X*), the information-generating function measure is expressed as
where V* denotes a random variable whose pdf is defined in Equation 14.
Proof. Given the transformation Y* = φ(X*), the cdf and pdf of Y* are $F(y) = H(\varphi^{-1}(y))$ and $f(y) = h(\varphi^{-1}(y))/\varphi'(\varphi^{-1}(y))$, respectively. Using the definition of the information-generating function of the ith order statistic, along with the substitutions $x = \varphi^{-1}(y)$ and v = H(x), we derive
Next, using the change of variables $x = \varphi^{-1}(y)$, we obtain
Theorem 3.2. Let X* be a random variable with pdf h, and let φ be a strictly increasing and convex function satisfying φ(0) = 0 and φ(x) → ∞ as x → ∞. Assume further that φ′(x) exists, is non-decreasing, and fulfills the condition φ′(0) ≥ 1. Then:
(1) If δ ≥ 1, then
(2) If 0 < δ ≤ 1, then
Proof. Since φ is convex and strictly increasing, its derivative φ′(x) is non-decreasing and satisfies
Let Y = φ(X*). By a standard change-of-variable argument, the pdf of Y is given by
because φ′(x) ≥ 1.
Equation 2, after the substitution v = H(x), gives the IGF representation
and therefore
When δ ≥ 1, the function u↦uδ−1 is increasing, which implies
For 0 < δ ≤ 1, the same function is decreasing, hence
Lemma 3.1 together with Theorem 3.1 ensures that these inequalities carry over to the IGF evaluated at the order statistic . Consequently:
- If δ ≥ 1, then
- If 0 < δ ≤ 1, then
This completes the proof.
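The inequalities of Theorem 3.2 can be illustrated with the quantile form behind Lemma 3.1: for Y = φ(X*), $h_Y(H_Y^{-1}(v)) = h(H^{-1}(v))/\varphi'(H^{-1}(v))$, which is at most $h(H^{-1}(v))$ when φ′ ≥ 1. The sketch assumes an Exp(1) parent and φ(x) = x + x², which is increasing and convex with φ(0) = 0 and φ′(0) = 1; both choices and the helper names are illustrative. For (i, m) = (3, 7), the transformed order statistic should have the smaller IGF when δ ≥ 1 and the larger one when δ < 1.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def igf_order_stat_from_dq(dq, i, m, delta, n=20000):
    """GEn_delta(X_{i:m}) = int_0^1 [v^(i-1)(1-v)^(m-i)/B(i,m-i+1)]^delta dq(v)^(delta-1) dv."""
    logb = log_beta(i, m - i + 1)
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * step
        w = math.exp((i - 1) * math.log(v) + (m - i) * math.log1p(-v) - logb)
        total += w ** delta * dq(v) ** (delta - 1)
    return total * step

def dq_x(v):
    return 1.0 - v                          # Exp(1): h(H^{-1}(v)) = 1 - v

def dq_y(v):
    x = -math.log1p(-v)                     # x_v = H^{-1}(v)
    return dq_x(v) / (1.0 + 2.0 * x)        # divide by phi'(x_v) for phi(x) = x + x^2

for delta in (0.5, 2.0, 3.0):
    gx = igf_order_stat_from_dq(dq_x, 3, 7, delta)
    gy = igf_order_stat_from_dq(dq_y, 3, 7, delta)
    print(delta, round(gx, 5), round(gy, 5))
```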
Remark 3.1. The additional requirement φ′(0) ≥ 1 is not intended to restrict the class of admissible convex transformations. Its role is to ensure that φ does not locally contract the distribution near the origin: since the IGF involves powers of the density, such a contraction could reverse the direction of the inequalities in Theorem 3.2. The condition φ′(0) ≥ 1 is therefore a convenient sufficient condition guaranteeing that
which is the key step in applying Lemma 3.1 and Theorem 3.1. We note that this assumption may be relaxed to φ′(x) ≥ 1 on a neighborhood of the origin, without altering the main results. In this sense, the condition is mild and does not significantly reduce the applicability of the theorem.
We next compare the information-generating functions of the ith order statistics of two continuous random variables, denoting by $X_{1,i:m}$ and $X_{2,i:m}$ the ith order statistics of samples of size m from $X_1^*$ and $X_2^*$. By Theorem 3.B.26 of Shaked and Shanthikumar [18], $X_1^* \le_{\mathrm{disp}} X_2^*$ implies $X_{1,i:m} \le_{\mathrm{disp}} X_{2,i:m}$ for i = 1, 2, ..., m. Combining this with Lemma 2.1 gives the following result.
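Proposition 3.1. Assume that $X_1^* \le_{\mathrm{disp}} X_2^*$, and let $X_{1,i:m}$ and $X_{2,i:m}$ denote the ith order statistics of samples of size m from $X_1^*$ and $X_2^*$. Then $\mathrm{GEn}_\delta(X_{1,i:m}) \ge \mathrm{GEn}_\delta(X_{2,i:m})$ for δ ≥ 1 (respectively, $\mathrm{GEn}_\delta(X_{1,i:m}) \le \mathrm{GEn}_\delta(X_{2,i:m})$ for 0 < δ ≤ 1).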
Proof. From Lemma 2.1 and Equation 13, let . Then, for any δ ≥ 1 (0 < δ ≤ 1), we have
and the result follows.
The following theorem compares the information-generating functions of the ith order statistics of two variables in terms of the information-generating functions of the parent variables.
Theorem 3.3. Consider two continuous random variables, and , associated with cdfs H1 and H2, and corresponding pdfs h1 and h2. Suppose that the condition holds, where
Then, the following statements are true:
(1) If 0 < δ ≤ 1 and , then it follows that .
(2) If δ ≥ 1 and , then it follows that .
Proof. When either of the sets or is empty, the conclusion holds trivially. Therefore, we assume both sets are non-empty. Given the assumption that , we can write
Since 0 ≤ (1−v) ≤ 1 for v ∈ [0, 1], it follows that
where m−i ≥ 0 for i = 1, 2, …, m.
Now, applying the definition of the information-generating function of the i-th order statistic, we obtain
where we define
To verify the first part of the theorem, it suffices to show that Ω(x) ≤ 0. Using the substitution v = Hi(x) for i = 1, 2, we rewrite Ω(x) as follows:
From the given condition and the boundedness of vδ(i−1) on [0, 1], we obtain:
The last inequality follows directly from Equation 15 and the assumption that . A similar argument can be applied to prove the second part.
3.1 Bounds for the information-generating function of order statistics
Theorem 3.4. Let X* be a random variable with cdf H and pdf h. If , where is the mode of X*, then
where , and under the condition δ ≥ 1.
Proof. From Equations 2, 3, and under the condition δ ≥ 1, we can use the transformation v = H(x) to express the information-generating function measure for the ith order statistics, , as
Given , it follows that
Moreover, since the beta density with the above parameters has its mode at (i−1)/(m−1) (for m ≥ 2), we can say that
Example 3.1. Suppose X* follows a Pareto distribution with pdf $h(x) = \alpha s^{\alpha}/x^{\alpha+1}$, x ≥ s > 0, α > 0. It can be shown that $h(H^{-1}(v)) = (\alpha/s)(1-v)^{(\alpha+1)/\alpha}$. Taking α = 1 and s = 1, we find $H^{-1}(v) = (1-v)^{-1}$, and hence $h(H^{-1}(v)) = (1-v)^{2}$. Furthermore, we compute
According to Theorem 3.4, we deduce that
Letting m = 20 and i = 15, for δ = 3, we evaluate
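For Example 3.1, the exact value is also available in closed form: with $h(H^{-1}(v)) = (1-v)^2$, Equation 13 gives $\mathrm{GEn}_3(X_{15:20}) = B(43, 20)/B^{3}(15, 6)$. The sketch below compares it with the elementary upper bound obtained by replacing $h(H^{-1}(v))$ in Equation 13 by its maximum M = h(1) = 1 (the Pareto mode). This simple bound is shown only for orientation and is not necessarily the bound of Theorem 3.4.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

m, i, delta = 20, 15, 3
a, b = delta * (i - 1) + 1, delta * (m - i) + 1        # Beta(a, b) law of V* in Equation 14: (43, 16)
log_const = log_beta(a, b) - delta * log_beta(i, m - i + 1)

# Pareto with alpha = s = 1: h(H^{-1}(v)) = (1 - v)**2, so
# E[h^{delta-1}(H^{-1}(V*))] = E[(1 - V*)**(2*(delta - 1))] = B(a, b + 2*(delta - 1)) / B(a, b).
exact = math.exp(log_const + log_beta(a, b + 2 * (delta - 1)) - log_beta(a, b))

# Elementary bound: replace h(H^{-1}(v)) by its maximum M = h(1) = 1 at the Pareto mode x = 1.
M = 1.0
bound = M ** (delta - 1) * math.exp(log_const)
print(exact, bound)
```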
4 Symmetry properties of the information-generating function of order statistics
Several interesting properties of the information-generating function of order statistics emerge when the underlying iid random variables have a symmetric pdf. We begin with two lemmas, whose proofs follow immediately from the symmetry assumption and the definition of $h_{i:m}$ in Equation 3.
Lemma 4.1. (Fashandi and Ahmadi [21]) Let X* be a continuous random variable with support $S_{X^*}$, pdf h, and cdf H. If the following condition holds:
then the cdf H(x) is symmetric with respect to some point .
Lemma 4.2. (Balakrishnan and Selvitella [13]) Suppose the order statistic , for i = 1, …, m, arises from a distribution whose pdf h satisfies the symmetry condition h(μ*+x) = h(μ*−x) for x ≥ 0, where μ* denotes the mean of X*. Under this assumption, the following identities are satisfied:
Theorem 4.1. Assume that $X_1^*, \ldots, X_m^*$ are iid with a pdf h that is symmetric about its mean μ*. Then the following properties hold:
1. If the sample size m is odd, then for every i = 1, …, m,
2. The pdf h is symmetric (about some point) if and only if
Moreover, if the first moment exists, the center of symmetry equals the mean μ*.
Proof. (1) (Symmetry implies equality of GEn for reflected order statistics). By Lemma 4.2, we have the pointwise identity
Using this identity and the substitution y = μ*+x (whose Jacobian is dy = dx), we obtain
where in the penultimate equality we used the change of variable t = μ*−x. This proves (1).
(2) (Necessity). The argument in Part (1) does not use the parity of m; with i = 1 it gives $\mathrm{GEn}_\delta(X_{1:m}) = \mathrm{GEn}_\delta(X_{m:m})$ for every m ≥ 1, which is the required identity.
(Sufficiency). Assume that
Using the representations of $\mathrm{GEn}_\delta$ and proceeding exactly as in the proof of Theorem 2.1, Equation 17 yields, after the standard change of variable and grouping of factors, an identity of the form
where $w(v) = (1-v^{1/\delta})^{\delta i-\delta}$ is continuous and strictly positive on (0, 1). Dividing by w(v) and using the continuity of the integrand, we obtain
By Lemma 2.2 (Stone–Weierstrass corollary), the continuous function
must vanish identically on [0, 1]; hence
Now Lemma 4.1 (Fashandi and Ahmadi) implies that the cdf H is symmetric about some point c* ∈ ℝ (that is, H(c*+x) = 1−H(c*−x) for all x). Consequently h is symmetric about c*.
To identify the center c* with the mean μ*, note that for any distribution symmetric about c* with a finite first moment, we necessarily have
Therefore, when the first moment exists, the center of symmetry equals the mean, and the pdf is symmetric about μ*. This completes the proof of sufficiency and hence of the theorem.
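Corollary 4.1. As a direct consequence of Theorem 4.1, define the forward difference operator with respect to i by $\Delta_i \mathrm{GEn}_\delta(X_{i:m}) = \mathrm{GEn}_\delta(X_{i+1:m}) - \mathrm{GEn}_\delta(X_{i:m})$ for 1 ≤ i ≤ m−1. Then $\Delta_i \mathrm{GEn}_\delta(X_{i:m}) = -\Delta_{m-i}\mathrm{GEn}_\delta(X_{m-i:m})$ for i = 1, …, m−1.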
Remark 4.1. Define $\Theta_m = \mathrm{GEn}_\delta(X_{1:m}) - \mathrm{GEn}_\delta(X_{m:m})$. By Theorem 4.1, Θm = 0 for all m ≥ 1 if and only if X* is symmetric. Hence, Θm is a natural measure of asymmetry and can serve as a test statistic for symmetry.
Based on the conditions outlined in Corollary 4.1, the information-generating function attains either a local maximum or a minimum at the median position. This behavior can be illustrated using the uniform U(−1, 1) and standard normal N(0, 1) distributions. Specifically, for the median case (i = 4) when the sample size is m = 7, we can observe (refer to Figures 3, 4):
(1) Under the U(−1, 1) distribution, the function reaches a minimum value of 3.263403 for δ = 2, 5.940808 for δ = 3, and 11.36502 for δ = 4.
(2) Under the N(0, 1) distribution the function reaches a maximum value of 0.6147224 for δ = 2, 0.43858655 for δ = 3, and 0.33141763 for δ = 4.
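The values in items (1) and (2) can be recomputed from Equations 3 and 4 with a Simpson rule; truncating the normal integral to [−10, 10] and the helper names are assumptions of the sketch. For U(−1, 1), $h(H^{-1}(v)) = 1/2$, so Equation 13 gives $\mathrm{GEn}_\delta(X_{4:7}) = 2^{1-\delta}B(3\delta+1, 3\delta+1)/B^{\delta}(4, 4)$, approximately 0.8159, 0.7426, and 0.7103 for δ = 2, 3, 4. The numbers quoted in item (1) equal $2B(3\delta+1, 3\delta+1)/B^{\delta}(4, 4)$, which is what the integral gives when the factor h(x) is left out of Equation 3, so that scaling may be worth rechecking; the minimum at the median is unaffected. The sketch also checks the reflection identity of Theorem 4.1 numerically.

```python
import math
from statistics import NormalDist

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def simpson(f, a, b, n=8000):
    """Composite Simpson rule with an even number n of panels."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0

def igf_order_stat(i, m, delta, pdf, cdf, lo, hi):
    """Equation 4 with h_{i:m} from Equation 3, integrated over [lo, hi]."""
    b = math.exp(log_beta(i, m - i + 1))
    def integrand(x):
        u = cdf(x)
        return (u ** (i - 1) * (1.0 - u) ** (m - i) * pdf(x) / b) ** delta
    return simpson(integrand, lo, hi)

# U(-1, 1): h = 1/2 on [-1, 1]; N(0, 1) via statistics.NormalDist.
u_pdf = lambda x: 0.5
u_cdf = lambda x: min(max((x + 1.0) / 2.0, 0.0), 1.0)
std = NormalDist()
m = 7
for delta in (2, 3, 4):
    uni = [igf_order_stat(i, m, delta, u_pdf, u_cdf, -1.0, 1.0) for i in range(1, m + 1)]
    nor = [igf_order_stat(i, m, delta, std.pdf, std.cdf, -10.0, 10.0) for i in range(1, m + 1)]
    print(delta, [round(g, 4) for g in uni], [round(g, 4) for g in nor])
```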
4.1 Symmetry test using nonparametric estimation
Nonparametric tests of symmetry have been studied extensively; notable contributions include Xiong et al. [22], Noughabi and Jarrahiferiz [23], and Mohamed and Almuqrin [24]. In this section, we estimate the information-generating function nonparametrically, following the methodology of Vasicek [25], and use the estimate to test for symmetry. Consider a random sample $X_1^*, \ldots, X_m^*$ from a continuous distribution H(x) with density h(x). The null hypothesis is
$$H_0:\ H(\mu^*-x) + H(\mu^*+x) = 1 \quad \text{for all } x \in \mathbb{R},$$
where the parameter μ* is unspecified. The alternative hypothesis is
$$H_1:\ H(\mu^*-x) + H(\mu^*+x) \neq 1 \quad \text{for some } x \in \mathbb{R}, \text{ whatever the value of } \mu^*.$$
When the underlying iid random variables have a symmetric pdf, the information-generating functions of their order statistics have the properties established above. To estimate them, we use the spacings on which Vasicek's estimator of the entropy in Equation 1 is built; this widely used estimator is
$$HV_{m,u^*} = \frac{1}{m}\sum_{i=1}^{m}\log\left\{\frac{m}{2u^*}\left(X_{(i+u^*)} - X_{(i-u^*)}\right)\right\}.$$
Here $X_{(1)} \le \cdots \le X_{(m)}$ are the ordered sample values and u* is a positive integer (the window size) with u* < m/2. At the boundaries, $X_{(i)} = X_{(1)}$ for i < 1 and $X_{(i)} = X_{(m)}$ for i > m. The information-generating functions of the smallest and largest order statistics can be rewritten as follows:
Park [26], building on Vasicek [25], proposed a test of symmetry based on the entropy of order statistics. Following this approach, sample-based estimators of $\mathrm{GEn}_\delta(X_{1:k})$ and $\mathrm{GEn}_\delta(X_{k:k})$, for a sample of size m and k = 1, 2, …, can be expressed as:
Accordingly, $\Theta_k = \mathrm{GEn}_\delta(X_{1:k}) - \mathrm{GEn}_\delta(X_{k:k})$, k = 1, 2, …, can be approximated by the following empirical estimator:
For simplicity, we fix k = 2 in what follows and propose the estimator:
in which Φ(v) = −Φ(1−v) and Φ is continuous and bounded. This estimator corresponds to Θ2 and is used to assess whether the distribution of X* is symmetric: substantial deviations of the estimate from zero, in either direction, indicate asymmetry of the underlying distribution.
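Vasicek's estimator is built from spacings: $2u^*/\{m(X_{(i+u^*)} - X_{(i-u^*)})\}$ acts as a local estimate of $h(X_{(i)})$. Since $\mathrm{GEn}_\delta(X^*) = E[h^{\delta-1}(X^*)]$, averaging the (δ−1)th power of these local estimates gives a natural plug-in IGF estimate. The sketch below implements Vasicek's estimator with the boundary convention described above, together with this plug-in estimate; the plug-in form and the function names are illustrative assumptions and not necessarily the estimators used in the paper.

```python
import math

def vasicek_entropy(sample, u):
    """Vasicek's spacing estimator of Shannon entropy with window u (1 <= u < m/2).

    Order statistics outside 1..m are replaced by X_(1) or X_(m), as in the text.
    """
    x = sorted(sample)
    m = len(x)
    total = 0.0
    for i in range(m):                                  # 0-based i corresponds to X_(i+1)
        gap = x[min(i + u, m - 1)] - x[max(i - u, 0)]
        total += math.log(m / (2.0 * u) * gap)
    return total / m

def spacing_igf(sample, u, delta):
    """Illustrative plug-in IGF estimate: mean of h_hat(X_(i))**(delta - 1),
    where h_hat(X_(i)) = 2u / (m (X_(i+u) - X_(i-u))) is the density estimate implied by Vasicek's spacings."""
    x = sorted(sample)
    m = len(x)
    total = 0.0
    for i in range(m):
        gap = x[min(i + u, m - 1)] - x[max(i - u, 0)]
        total += (2.0 * u / (m * gap)) ** (delta - 1)
    return total / m

grid = [(k + 0.5) / 1000 for k in range(1000)]          # evenly spaced "sample" on (0, 1)
print(vasicek_entropy(grid, 10), spacing_igf(grid, 10, 2.0))
```

For this evenly spaced grid, every interior term of the entropy estimate vanishes and only the clamped boundary terms contribute, so the result lies slightly below the true entropy 0 of U(0, 1), and the plug-in IGF lies slightly above the true value 1.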
Theorem 4.2. Let $X_1^*, \ldots, X_m^*$ be iid random variables, and define $Y_i^* = aX_i^* + b^*$ for constants a > 0 and b* ∈ ℝ, i = 1, …, m. Denote the estimators of Θ2 based on the samples $X_1^*, \ldots, X_m^*$ and $Y_1^*, \ldots, Y_m^*$ by $\hat\Theta_2^{X}$ and $\hat\Theta_2^{Y}$, respectively. Then the following relationships (for the expectation, variance, and mean squared error, respectively) hold:
(1) ,
(2) ,
(3) .
Proof. We begin by expressing the estimator for Θ2 based on the transformed variables:
This transformation directly leads to the stated scaling properties, completing the proof.
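One plug-in estimator with the structure described above can be sketched under explicit assumptions. Using $h_{1:2} = 2(1-H)h$ and $h_{2:2} = 2Hh$, one gets $\Theta_2 = 2^{\delta}E[\Phi(H(X^*))\,h^{\delta-1}(X^*)]$ with $\Phi(v) = (1-v)^{\delta} - v^{\delta}$, which is continuous, bounded, and satisfies Φ(v) = −Φ(1−v). Replacing $H(X_{(i)})$ by i/(m+1) and $h(X_{(i)})$ by the spacing estimate gives the hypothetical `theta2_hat` below, which is not necessarily the paper's estimator. For this form, an affine change $Y_i^* = aX_i^* + b^*$ multiplies every spacing by a and the statistic by $a^{1-\delta}$, the kind of scaling one would expect behind Theorem 4.2; the sketch checks this numerically, and also checks that a sample symmetric about zero gives zero.

```python
import math
import random

def spacing_density(x_sorted, u):
    """Spacing estimates h_hat(X_(i)) = 2u / (m (X_(i+u) - X_(i-u))), indices clamped to [1, m]."""
    m = len(x_sorted)
    return [2.0 * u / (m * (x_sorted[min(i + u, m - 1)] - x_sorted[max(i - u, 0)]))
            for i in range(m)]

def theta2_hat(sample, u, delta):
    """Hypothetical plug-in estimate of Theta_2 = GEn_delta(X_{1:2}) - GEn_delta(X_{2:2})."""
    x = sorted(sample)
    m = len(x)
    phi = lambda v: (1.0 - v) ** delta - v ** delta    # Phi(v) = -Phi(1 - v), bounded on [0, 1]
    dens = spacing_density(x, u)
    return 2.0 ** delta * sum(phi((i + 1) / (m + 1)) * d ** (delta - 1)
                              for i, d in enumerate(dens)) / m

rng = random.Random(11)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
u = int(math.floor(math.sqrt(len(xs)) + 0.5))          # window size, here 7
a, b = 2.5, -1.0
for delta in (2.0, 3.0):
    t_x = theta2_hat(xs, u, delta)
    t_y = theta2_hat([a * x + b for x in xs], u, delta)
    print(delta, t_x, t_y, a ** (1.0 - delta) * t_x)    # t_y should equal a**(1-delta) * t_x
```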
However, the estimator depends not only on the observed sample, but also varies with the chosen window size u*. Determining its exact distribution under the null hypothesis presents significant analytical challenges. Consequently, Monte Carlo simulation is used to estimate the critical values. Following prior studies (e.g., McWilliams [27] and Corzo and Babativa [28]), the generalized lambda distribution is selected as an alternative model. From this distribution, samples of sizes m = 20, 30, 50, and 100 are generated across nine different parameter settings. The simulated data are defined as
Table 1 presents the parameter values η1, η2, η3, and η4, originally chosen by McWilliams [27]. For each parameter combination, 1,000 samples are generated for each sample size. To select the window size u*, we use the heuristic rule proposed by Grzegorzewski and Wieczorkowski [29] for entropy estimation,
$$u^* = \left[\sqrt{m} + 0.5\right], \quad (20)$$
where [·] denotes the floor function. Figure 5 illustrates the empirical distributions of the test statistic , based on 10,000 replications from the standard normal distribution. These distributions are shown for sample sizes m = 25, 40, 50, 70, and 100, with u* selected via Equation 20. Sample generation and computation of the test statistic were performed using Wolfram Mathematica (version 13), chosen for its efficient random number generation and symbolic computation features. Further statistical analysis and visualization were conducted in R, leveraging its advanced capabilities in statistical computing and graphical presentation. In Figure 5, as the sample size m increases, the empirical pdf of the statistic becomes increasingly concentrated around its central value. Specifically, larger sample sizes yield steeper and more sharply peaked curves, reflecting a reduction in variability due to the greater amount of information contained in the sample. Conversely, smaller sample sizes produce flatter, more dispersed distributions, indicating greater variability. This behavior is consistent with the general principles of asymptotic theory, where statistics based on larger samples tend to exhibit reduced variance and greater stability.
Table 1. Parameter configurations of the generalized lambda distribution used in the Monte Carlo simulations, categorized into nine distinct cases.
Figure 5. Empirical density plots of the test statistic based on 50,000 samples generated under the null distribution for sample sizes m = 25, 40, 50, 70, and 100, with δ = 2 (top panel) and δ = 2 (bottom panel).
Table 2 reports the simulated critical values of the test statistic at significance level α* = 0.05 for various sample sizes, obtained from a Monte Carlo simulation with 1,000 replications. Table 2 shows that zero lies within the critical intervals as both m and δ increase. Furthermore, the length of these intervals decreases markedly, so that the intervals concentrate closely around zero.
The power of the test is estimated as the proportion of the 1,000 samples for which the test statistic falls in the critical (rejection) region, that is, for which the null hypothesis of symmetry is rejected at significance level α* = 0.05. The estimated power values of the proposed test are shown in Table 3.
Table 3. Comparative power analysis of the test at the 0.05 significance level.
The critical values and the power of the proposed symmetry test at significance level α* = 0.05 were determined as follows:
(1) Generate a random sample of size m from the standard normal distribution, and then calculate the corresponding test statistic for the sample;
(2) Repeat Step 1 a total of 1,000 times and use the 25th and 975th ordered values of the resulting test statistics (that is, their empirical 2.5th and 97.5th percentiles) as the lower and upper critical values, since for α* = 0.05 we have 1,000 × α*/2 = 25 and 1,000 × (1 − α*/2) = 975. Hence, the null hypothesis is rejected if the test statistic falls below the lower critical value or exceeds the upper critical value, and is not rejected otherwise;
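(3) Draw a sample of size m from the alternative distribution under study (for example, a generalized lambda distribution with one of the parameter configurations in Table 1), compute its test statistic, and check whether it falls outside the critical values;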
(4) Estimate the test's power as the proportion of rejections over 1,000 repetitions of Step 3.
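Steps (1)–(4) can be sketched as a generic Monte Carlo routine. Because the information-generating-function statistic is not written out in this section, the sketch uses a clearly hypothetical placeholder statistic (mean minus median, which is near zero under symmetry); any statistic with the same calling convention can be substituted. The names `critical_values`, `estimate_power`, and `sampler` are illustrative assumptions, not the authors' code.

```python
import random
import statistics


def placeholder_statistic(sample):
    # HYPOTHETICAL stand-in for the IGF-based statistic: mean minus median,
    # which is close to zero for samples from a symmetric distribution.
    return statistics.fmean(sample) - statistics.median(sample)


def critical_values(stat, m, reps=1000, alpha=0.05, rng=random):
    # Steps (1)-(2): simulate the statistic under N(0, 1) and sort the results.
    values = sorted(
        stat([rng.gauss(0.0, 1.0) for _ in range(m)]) for _ in range(reps)
    )
    lo_rank = round(reps * alpha / 2)        # 25 when reps = 1000, alpha = 0.05
    hi_rank = round(reps * (1 - alpha / 2))  # 975 when reps = 1000, alpha = 0.05
    # Ranks are 1-based order statistics; Python lists are 0-based.
    return values[lo_rank - 1], values[hi_rank - 1]


def estimate_power(stat, sampler, m, lower, upper, reps=1000):
    # Steps (3)-(4): share of samples whose statistic falls outside [lower, upper].
    rejections = 0
    for _ in range(reps):
        t = stat(sampler(m))
        if t < lower or t > upper:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    rng = random.Random(2026)
    lower, upper = critical_values(placeholder_statistic, m=30, rng=rng)
    # Sampling from the null itself should give a rejection rate near alpha.
    size = estimate_power(
        placeholder_statistic,
        lambda m: [rng.gauss(0.0, 1.0) for _ in range(m)],
        30, lower, upper,
    )
    print(lower, upper, size)
```

For power, `sampler` would draw from an asymmetric alternative such as one of the generalized lambda configurations; passing a symmetric sampler instead estimates the size of the test, which is how a row like Case 1 can be read.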
4.1.1 Performance assessment using Monte Carlo methods
To evaluate the proposed test rigorously, we use Monte Carlo simulation. The comparative analysis examines the statistical power of several competing tests, with detailed results presented in Tables 3 and 4.
4.1.1.1 Comparative test procedures
The study incorporates the following established testing approaches for benchmarking purposes:
1. McWilliams' runs test [27] uses the total number of runs, At(1), as its test statistic.
2. Baklizi's modified runs test [30] uses an adjusted runs statistic, At(2).
3. The Wilcoxon signed-rank test [31], as presented by Gibbons and Chakraborti, uses the distribution-free statistic At(3).
4. Tajuddin's test [32] adapts the Wilcoxon two-sample (rank-sum) test, using the statistic At(4).
5. Cheng and Balakrishnan's modified sign test [33] uses the nonparametric statistic At(5).
6. Modarres and Gastwirth's trimmed test [34] incorporates a trimming proportion q into its test statistic, At(6).
7. Baklizi's trimmed longest-run test [35] depends on both the sample size m and the trimming proportion q through the statistic At(7).
8. Baklizi's second test [35] is based on an alternative statistic, At(8).
9. Baklizi's extended test [36] uses the statistic At(9).
10. Corzo and Babativa's modified runs test [28] is based on the statistic At(10).
11. Noughabi and Jarrahiferiz's extropy-based test [23] uses the extropy of order statistics, through the statistic At(11).
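Two of the simpler competitors can be sketched directly for reference. The code below follows common textbook descriptions of a McWilliams-type runs count (At(1)) and the Wilcoxon signed-rank statistic (At(3)), assuming the center of symmetry is known and taken as 0; the exact forms, normalizations, and rejection rules used in [27] and [31] may differ, and the function names are hypothetical.

```python
def _deviations(sample, center):
    # Drop observations equal to the center and order the rest by |x - center|.
    return sorted((x - center for x in sample if x != center), key=abs)


def runs_statistic(sample, center=0.0):
    # McWilliams-style count of sign runs after ordering by absolute deviation;
    # few runs suggest asymmetry about `center`.
    signs = [d > 0 for d in _deviations(sample, center)]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def signed_rank_statistic(sample, center=0.0):
    # Wilcoxon W+: sum of the ranks of |x - center| over positive deviations.
    # Assumes no ties among |x - center| (continuous data).
    devs = _deviations(sample, center)
    return sum(r for r, d in enumerate(devs, start=1) if d > 0)


print(runs_statistic([-3, 1, -2, 4]))         # 3
print(signed_rank_statistic([-3, 1, -2, 4]))  # 5
```

For the sample (−3, 1, −2, 4), ordering by absolute value gives 1, −2, −3, 4 with signs +, −, −, +, hence three runs, and the positive values occupy ranks 1 and 4, so W+ = 5.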
Case 1 in Table 3 corresponds to a symmetric distribution; as expected, the rejection rates for all values of δ are close to the nominal level 0.05. In the remaining eight cases the distribution is asymmetric, although Cases 2 and 3 are nearly symmetric. In Cases 5, 7, 8, and 9, the test statistics with different values of δ have comparable power, particularly as δ increases. The lower power in Case 4 may be explained by the fact that η1 is much larger than 0 there, whereas it is close to 0 in the other cases. Compared with the other tests in Table 4, the proposed test, based on the information-generating function of order statistics, performs well in the simulation study as δ increases. These results suggest that the proposed test can be competitive with, and often more powerful than, the competing tests in a range of practical applications.
4.2 Real data set
To demonstrate the applicability of our methodology, we used data from the health statistics bulletin published by the General Authority for Statistics in the Kingdom of Saudi Arabia. This comprehensive dataset captures key health indicators, including:
1. Prevalence of chronic diseases;
2. Mental health status.
The statistical population comprises all households, both Saudi and non-Saudi, permanently residing in the Kingdom of Saudi Arabia. The survey covers 13 administrative regions and 151 governorates, with 2023 as the base year for calculating indicators. Health status among adults (aged 15 years and above) is assessed using the Visual Analog Scale (VAS), with scores ranging from 0 (worst possible health) to 100 (excellent health). The data are stratified by administrative region and age group, enabling detailed demographic and geographic analysis. The complete dataset is publicly available through the General Authority for Statistics portal at: https://www.stats.gov.sa/statistics-tabs?tab=436312&category=417594. Figure 6 shows the histogram and kernel density estimate of the data, and Figure 7 shows the Q–Q plot.
4.2.1 Bootstrap procedure
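Since the test statistic is not pivotal (its null distribution depends on the unknown parent distribution), we employ a reflection bootstrap:
1. Symmetrize the data by augmenting the sample with its reflection about the sample median M, that is, with the values 2M − x_i.
2. For each b = 1, …, B: (i) resample m values uniformly, with replacement, from the symmetrized data; (ii) compute the test statistic for the bootstrap sample.
3. The p-value is the proportion of the B bootstrap statistics that are at least as extreme as the statistic computed from the observed sample.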
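The reflection bootstrap can be sketched as below. The sketch assumes a statistic centered at zero under symmetry, interprets "at least as extreme" in absolute value, and uses the common (count + 1)/(B + 1) p-value convention; these are assumptions rather than details given in the text. The mean-minus-median statistic is a hypothetical placeholder for the information-generating-function statistic.

```python
import random
import statistics


def mean_minus_median(sample):
    # HYPOTHETICAL placeholder statistic; near zero for symmetric data.
    return statistics.fmean(sample) - statistics.median(sample)


def reflection_bootstrap_pvalue(sample, stat, B=2000, rng=random):
    m = len(sample)
    med = statistics.median(sample)
    # Step 1: augment the data with its reflection about the sample median,
    # giving an empirical distribution that is exactly symmetric about `med`.
    symmetrized = list(sample) + [2 * med - x for x in sample]
    observed = abs(stat(sample))
    exceed = 0
    for _ in range(B):
        # Step 2: resample m values with replacement and recompute the statistic.
        boot = rng.choices(symmetrized, k=m)
        if abs(stat(boot)) >= observed:
            exceed += 1
    # Step 3: share of bootstrap statistics at least as extreme as the observed one.
    return (exceed + 1) / (B + 1)
```

Passing a seeded generator, e.g. `rng=random.Random(1)`, makes the p-value reproducible; larger B reduces its Monte Carlo error.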
Results. The sample exhibits negative skewness (−1.89) and high kurtosis (9.55), indicating:
• A left-skewed distribution with a longer left tail
• Heavy tails and peakedness relative to a normal distribution
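Summaries like these are typically moment-based. A minimal sketch of the moment estimators follows; which estimator produced the reported values (for example, whether 9.55 is the raw or the excess kurtosis) is not stated, so the function, named hypothetically here, returns the raw ratio b2, from which the excess kurtosis is b2 − 3.

```python
def moment_skewness_kurtosis(x):
    # Moment estimators: g1 = m3 / m2**1.5 (skewness), b2 = m4 / m2**2 (kurtosis).
    # A normal distribution has g1 = 0 and b2 = 3; excess kurtosis is b2 - 3.
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


g1, b2 = moment_skewness_kurtosis([0, 9, 10, 10, 10])
print(g1 < 0)  # True: a long left tail gives negative skewness
```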
The symmetry test results for different sensitivity parameters δ are given in Table 5.
Key findings:
• Strong evidence against symmetry for δ = 2 and 3 (p < 0.05).
• Marginal evidence at δ = 4 (p = 0.051).
• Insufficient evidence to reject symmetry at δ = 5 (p = 0.103).
Interpretation:
• The negative skewness suggests potential outliers in the left tail.
• The increasing p-values at higher δ indicate that the test becomes less sensitive to this asymmetry as δ grows.
5 Conclusion
This research advances the theoretical understanding of information-generating functions for order statistics through several key contributions. We have systematically investigated monotonicity properties and derived bounds for the proposed measure. The study establishes stochastic ordering results based on this information-theoretic framework, showing that equality of the information-generating function measures of order statistics uniquely determines their parent distributions. Furthermore, we have developed new characterization theorems for the exponential distribution using this approach. For symmetric distributions, our analysis shows that the information-generating function attains an extremum (either a local maximum or a local minimum) at the median position, which is confirmed by explicit computations for the uniform and standard normal distributions. Building on these theoretical results, we have formulated a nonparametric symmetry test based on the proposed measure. The practical utility of our methodology is illustrated through comprehensive simulation studies and an application to chronic disease management data. In the simulation study, larger values of δ markedly improved the power of the test, supporting the robustness of our approach.
Future studies will include a comprehensive performance comparison with a broader set of established symmetry tests, such as the Baringhaus–Henze, Ahmad–Li, and Bonett–Seier tests, to further situate our method within the broader literature. While this study provides initial validation of the test's power, a more comprehensive investigation against a wider array of alternatives, including heavy-tailed and bounded-support distributions, is a priority for future research.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
MM: Writing – original draft, Investigation, Software, Formal analysis, Funding acquisition, Visualization, Resources, Supervision, Validation, Project administration, Conceptualization, Writing – review & editing, Data curation, Methodology. MA-L: Writing – review & editing, Methodology, Formal analysis. EA: Writing – review & editing, Methodology, Formal analysis. HS: Funding acquisition, Data curation, Visualization, Resources, Conceptualization, Formal analysis, Validation, Project administration, Methodology, Writing – review & editing, Software, Investigation, Writing – original draft, Supervision.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was funded by Prince Sattam bin Abdulaziz University (PSAU/2025/02/35141).
Acknowledgments
The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2025/02/35141).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Shannon C. A mathematical theory of communication. Bell Syst Tech J. (1948) 27:379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x
2. Golomb S. The information generating function of a probability distribution (corresp.). IEEE Trans Inform Theory. (1966) 12:75–7. doi: 10.1109/TIT.1966.1053843
3. Kharazmi O, Balakrishnan N. Cumulative and relative cumulative residual information generating measures and associated properties. Commun Stat Theory Methods. (2021) 52:5260–73. doi: 10.1080/03610926.2021.2005100
4. Kharazmi O, Balakrishnan N. Cumulative residual and relative cumulative residual Fisher information and their properties. IEEE Trans Inf Theory. (2021) 67:6306–12. doi: 10.1109/TIT.2021.3073789
5. Kharazmi O, Balakrishnan N. Jensen-information generating function and its connections to some well-known information measures. Stat Prob Lett. (2021) 170:108995. doi: 10.1016/j.spl.2020.108995
6. Kharazmi O, Balakrishnan N. Information generating function for order statistics and mixed reliability systems. Commun Stat Theory Methods. (2021) 51:7846–55. doi: 10.1080/03610926.2021.1881123
7. Kharazmi O, Balakrishnan N. Generating function for generalized Fisher information measure and its application to finite mixture models. Hacet J Math Stat. (2022) 51:1472–83. doi: 10.15672/hujms.1094273
8. Zamani Z, Kharazmi O, Balakrishnan N. Information generating function of record values. Math Methods Stat. (2022) 31:120–33. doi: 10.3103/S1066530722030036
9. Kharazmi O, Balakrishnan N, Ozonur D. Jensen-discrete information generating function with an application to image processing. Soft Comput. (2023) 27:4543–52. doi: 10.1007/s00500-023-07863-0
10. Kayal S, Balakrishnan N. Quantile-based information generating functions and their properties and uses. Prob Eng Inf Sci. (2024) 38:1–19. doi: 10.1017/S0269964824000068
11. Onicescu O. The Informational Energy, Component of the Statistical Barometer Concerning the Systems. Bucharest: Technical Publishing House (1966).
12. Bhatia PK. On measures of information energy. Inf Sci. (1997) 97:233–40. doi: 10.1016/0020-0255(94)00071-9
13. Balakrishnan N, Selvitella A. Symmetry of a distribution via symmetry of order statistics. Stat Probabil Lett. (2017) 129:367–72. doi: 10.1016/j.spl.2017.06.023
14. Ahmadi J. Characterization results for symmetric continuous distributions based on the properties of k-records and spacings. Stat Probabil Lett. (2020) 162:108764. doi: 10.1016/j.spl.2020.108764
15. Mahdizadeh M, Zamanzade E. Estimation of a symmetric distribution function in multistage ranked set sampling. Stat Papers. (2020) 61:851–67. doi: 10.1007/s00362-017-0965-x
16. Dai XJ, Niu CZ, Guo X. Testing for central symmetry and inference of the unknown center. Comput Stat Data An. (2018) 127:15–31. doi: 10.1016/j.csda.2018.05.007
17. Bozin V, Milosevic B, Nikitin YY, Obradovic M. New characterization-based symmetry tests. Bull Malays Math Sci Soc. (2020) 43:297–320. doi: 10.1007/s40840-018-0680-3
18. Shaked M, Shanthikumar JG. Stochastic Orders and Their Applications. San Diego, CA: Academic Press (1994).
20. Ebrahimi N, Soofi ES, Zahedi H. Information properties of order statistics and spacings. IEEE Trans Inform Theory. (2004) 50:177–83. doi: 10.1109/TIT.2003.821973
21. Fashandi M, Ahmadi J. Characterizations of symmetric distributions based on Renyi entropy. Stat Probabil Lett. (2012) 82:798–804. doi: 10.1016/j.spl.2012.01.004
22. Xiong PH, Zhuang WW, Qiu GX. Testing symmetry based on the extropy of record values. J Nonparametr Stat. (2021) 33:134–55. doi: 10.1080/10485252.2021.1914338
23. Noughabi HA, Jarrahiferiz J. Extropy of order statistics applied to testing symmetry. Commun Stat-Simul C. (2022) 51:3389–99. doi: 10.1080/03610918.2020.1714660
24. Mohamed MS, Almuqrin MA. Properties of fractional generalized entropy in ordered variables and symmetry testing. AIMS Math. (2025) 10:1116–41. doi: 10.3934/math.2025053
25. Vasicek O. A test for normality based on sample entropy. J R Stat Soc B. (1976) 38:54–9. doi: 10.1111/j.2517-6161.1976.tb01566.x
26. Park S. A goodness-of-fit test for normality based on the sample entropy of order statistics. Stat Probabil Lett. (1999) 44:359–63. doi: 10.1016/S0167-7152(99)00027-9
27. McWilliams TP. A distribution-free test for symmetry based on a runs statistic. J Am Stat Assoc. (1990) 85:1130–3. doi: 10.1080/01621459.1990.10474985
28. Corzo J, Babativa G. A modified runs test for symmetry. J Stat Comput Sim. (2013) 83:984–91. doi: 10.1080/00949655.2011.647026
29. Grzegorzewski P, Wieczorkowski R. Entropy-based goodness-of-fit test for exponentiality. Commun Stat-Theor M. (1999) 28:1183–202. doi: 10.1080/03610929908832351
30. Baklizi A. A conditional distribution runs test for symmetry. J Nonparametr Stat. (2003) 15:713–8. doi: 10.1080/10485250310001634737
32. Tajuddin IH. Distribution-free test for symmetry based on the Wilcoxon two-sample test. J Appl Stat. (1994) 21:409–15. doi: 10.1080/757584017
33. Cheng WH, Balakrishnan N. A modified sign test for symmetry. Commun Stat-Simul C. (2004) 33:703–9. doi: 10.1081/SAC-200033302
34. Modarres R, Gastwirth JL. A modified runs test for symmetry. Stat Probabil Lett. (1996) 31:107–12. doi: 10.1016/S0167-7152(96)00020-X
35. Baklizi A. Testing symmetry using a trimmed longest run statistic. Aust N Z J Stat. (2007) 49:339–47. doi: 10.1111/j.1467-842X.2007.00485.x
Keywords: information-generating function, non-parametric estimation, order statistics, stochastic order comparison, symmetry testing
Citation: Mohamed MS, Al-Labadi M, Almuhur E and Sakr HH (2026) Further aspects of information-generating function of order statistics with health application in symmetry of chronic disease management. Front. Appl. Math. Stat. 12:1733600. doi: 10.3389/fams.2026.1733600
Received: 27 October 2025; Revised: 24 December 2025; Accepted: 05 January 2026;
Published: 02 February 2026.
Edited by:
Han-Ying Liang, Tongji University, China
Reviewed by:
Zakariya Yahya Algamal, University of Mosul, Iraq
Shuji Ando, Tokyo University of Science, Japan
Copyright © 2026 Mohamed, Al-Labadi, Almuhur and Sakr. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Hanan H. Sakr, h.sakr@psau.edu.sa
Mohamed Said Mohamed1