Developing a two-parameter Liu estimator for the COM–Poisson regression model: Application and simulation

The Conway–Maxwell–Poisson (COMP) model is a flexible count regression model for both over- and under-dispersed data. In regression analysis, highly correlated explanatory variables cause a multicollinearity problem, which inflates the standard errors of the maximum likelihood estimates. To manage the effects of multicollinearity in the COMP model, we propose a new modified Liu estimator based on two shrinkage parameters (k, d). The mean squared error (MSE) criterion is used to assess the performance of the proposed estimator. We compare the proposed estimator theoretically with the ridge, Liu, and modified one-parameter Liu estimators. A Monte Carlo simulation and a real data application are used to examine the efficiency of the proposed estimator and to compare it with the ridge, Liu, and modified one-parameter Liu estimators. The results show the superiority of the proposed estimator, as it has the smallest MSE value.


1. Introduction
Count data modeling is used in several areas of research, and count data often suffer from over- or under-dispersion. Count data regression models include the Poisson model, the negative binomial (NB) model, the Bell model, and the Conway-Maxwell-Poisson model. In many areas of research, the most commonly used model is the Poisson model. However, the Poisson model assumes that the mean and variance of the response variable are equal, whereas in many cases the response variable is over- or under-dispersed. For over-dispersed data, the NB regression model is used because it is more flexible than the Poisson regression model. The Conway-Maxwell-Poisson model is more flexible still, because it can be used in both over- and under-dispersion cases (see Cancho et al. [1] and Anan et al. [2]).
The Conway-Maxwell-Poisson (COM-Poisson) distribution was proposed by Conway and Maxwell [3]. This distribution is applicable to real count data that exhibit over- or under-dispersion, so COM-Poisson regression is a flexible model for relating a discrete count response variable to the explanatory variables (covariates).
The COM-Poisson distribution is a two-parameter generalization of the Poisson distribution whose additional dispersion parameter ($\gamma$) makes it flexible enough to handle both over- and under-dispersion in count data. The probability mass function (PMF) of the COMP distribution is given as follows:

$$P(Y = y; \pi, \gamma) = \frac{\pi^{y}}{f(\pi, \gamma)\,(y!)^{\gamma}}, \quad y = 0, 1, \ldots;\ \gamma > 0, \tag{1}$$

where $f(\pi, \gamma) = \sum_{n=0}^{\infty} \pi^{n}/(n!)^{\gamma}$ is the normalizing constant, an infinite series, and $\pi$ is the location parameter. The mean and variance of the COMP distribution are given approximately as follows (see Shmueli et al. [4]):

$$E(Y) \approx \pi^{1/\gamma} - \frac{\gamma - 1}{2\gamma}, \qquad \mathrm{Var}(Y) \approx \frac{\pi^{1/\gamma}}{\gamma}.$$

The COMP distribution generalizes several well-known count distributions: if $\gamma = 1$, it reduces to the Poisson distribution; if $\gamma = 0$ and $\pi < 1$, it reduces to the geometric distribution; and as $\gamma \to \infty$, it approaches the Bernoulli distribution with probability $\pi/(1 + \pi)$. Figure 1 presents the PMF for different simulated data from the COMP distribution. COMP(3, 1) is equivalent to the Poisson distribution with parameter $\pi = 3$, and COMP(0.55, 0) is equivalent to the geometric distribution with parameter $\pi = 0.55$. The two remaining cases, COMP(3, 1.5) and COMP(3, 0.85), correspond to under- and over-dispersion, respectively. For more details on the COMP distribution and its properties and applications, see, e.g., Shmueli et al. [4], Nadarajah [5], Borges et al. [6], and Gillispie and Green [7].
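As a numerical illustration of equation (1), the following minimal Python sketch evaluates the COMP PMF by truncating the normalizing series at a fixed number of terms. The function names (`comp_probs`, `mean_var`) and the truncation point `terms=200` are assumptions made for this illustration and are not part of the original study.

```python
import math

def comp_probs(pi, gamma, terms=200):
    """Return [P(Y=0), ..., P(Y=terms-1)] for the COMP(pi, gamma) pmf.

    The infinite normalizing series f(pi, gamma) is truncated at `terms`
    (an assumption of this sketch) and evaluated on the log scale.
    """
    # log of the unnormalized weight pi^n / (n!)^gamma
    logs = [n * math.log(pi) - gamma * math.lgamma(n + 1) for n in range(terms)]
    m = max(logs)
    log_f = m + math.log(sum(math.exp(v - m) for v in logs))
    return [math.exp(v - log_f) for v in logs]

def mean_var(probs):
    """Mean and variance of a discrete distribution on 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean ** 2

poisson_like = comp_probs(3.0, 1.0)      # gamma = 1: Poisson(3)
geometric_like = comp_probs(0.55, 0.0)   # gamma = 0, pi < 1: geometric
under = mean_var(comp_probs(3.0, 1.5))   # gamma > 1: under-dispersion
over = mean_var(comp_probs(3.0, 0.85))   # gamma < 1: over-dispersion
```

Under these assumptions, `poisson_like` matches the Poisson(3) probabilities, `geometric_like` matches $(1-\pi)\pi^{y}$ with $\pi = 0.55$, and the variance is below the mean for $\gamma = 1.5$ and above it for $\gamma = 0.85$.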
If the explanatory variables in the COM-Poisson (COMP) regression model are highly correlated, the model suffers from a multicollinearity problem. In the presence of multicollinearity, the standard errors (SEs) of the estimates are large, so the maximum likelihood estimator does not give efficient estimates.
To handle the multicollinearity problem, Hoerl and Kennard [8] introduced the ridge regression (RR) estimation class for the linear regression model. It is one of the most popular methods for solving the multicollinearity problem in linear regression, and it is based on adding a biasing constant k to the ordinary least squares (OLS) estimation. Månsson and Shukur [9] introduced the RR estimator for the Poisson model. The RR estimator has also been extended to several count data regression models to overcome the effect of multicollinearity. For example, Månsson [10] proposed the RR estimator for the NB model, Türkan and Özel [11] suggested the modified jackknifed RR estimator for the Poisson model, Kaçiranlar and Dawoud [12] introduced some ridge parameters for the Poisson model, Zaldivar [13] considered the performance of some RR estimators for the Poisson model, Rashad and Algamal [14] developed a new RR estimator, and Yehia [15] suggested the restricted RR estimator for the Poisson model. Algamal et al. [16] introduced the ridge and Liu estimators for the zero-inflated Bell regression model. Akram et al. [17] introduced some new ridge parameters for the zero-inflated NB regression model.
Liu [18] suggested a new biased estimator to combat the multicollinearity issue in the linear regression model. Moreover, he showed that the proposed (Liu) estimator is better than the ridge and OLS estimators in the presence of multicollinearity. The Liu estimator is also extended for the Poisson and NB models by Månsson et al. [19] [28], and Algamal and Abonazel [29].
The purpose of this article is to provide a new modified two-parameter Liu estimator for the COMP model and to propose some methods for choosing its parameters. We also compare the proposed estimator with the maximum likelihood, ridge, and Liu estimators.
This article is organized as follows. Section 2 presents the COMP model and the proposed estimator. Section 3 provides theoretical comparisons between the proposed estimator and other estimators. Section 4 provides the suggested biasing parameters for each estimator. Section 5 presents the simulation study. Section 6 applies the proposed estimator to the real data application. Section 7 concludes the study.

2.1. COM-Poisson maximum likelihood estimator
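For the COMP regression model, Guikema and Goffelt [30] suggested a re-parameterization of the COMP distribution that provides a clear centering parameter, $\mu = \pi^{1/\gamma}$, as follows:

$$P(Y_i = y_i; \mu_i, \gamma) = \frac{1}{S(\mu_i, \gamma)}\left(\frac{\mu_i^{y_i}}{y_i!}\right)^{\gamma}, \qquad S(\mu_i, \gamma) = \sum_{s=0}^{\infty}\left(\frac{\mu_i^{s}}{s!}\right)^{\gamma}, \qquad \mu_i = \exp(x_i'\beta),$$

where $x_i$ is the vector of explanatory variables for the $i$th observation and $\beta$ is the regression coefficient vector (including the intercept). The log-likelihood function of the COMP model is then given as follows:

$$\ell(\beta, \gamma) = \gamma\sum_{i=1}^{n} y_i\log\mu_i - \gamma\sum_{i=1}^{n}\log(y_i!) - \sum_{i=1}^{n}\log S(\mu_i, \gamma). \tag{4}$$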
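To estimate the parameters $\beta$ and $\gamma$ of this model, we differentiate equation (4) with respect to $\beta$ and $\gamma$ and set the derivatives equal to zero [22,23]:

$$\frac{\partial\ell}{\partial\beta} = \gamma\sum_{i=1}^{n}\left[y_i - \frac{1}{S(\mu_i,\gamma)}\sum_{s=0}^{\infty} s\left(\frac{\mu_i^{s}}{s!}\right)^{\gamma}\right]x_i = 0, \tag{5}$$

$$\frac{\partial\ell}{\partial\gamma} = \sum_{i=1}^{n}\left[y_i\log\mu_i - \log(y_i!) - \frac{1}{S(\mu_i,\gamma)}\sum_{s=0}^{\infty}\left(\frac{\mu_i^{s}}{s!}\right)^{\gamma}\log\frac{\mu_i^{s}}{s!}\right] = 0, \tag{6}$$

where $\mu_i = \exp(x_i'\beta)$ and $S(\mu_i,\gamma) = \sum_{s=0}^{\infty}(\mu_i^{s}/s!)^{\gamma}$ is the normalizing constant of the re-parameterized distribution. We can use the iterative reweighted least squares (IRLS) estimation method to solve equations (5) and (6). Therefore, the maximum likelihood (ML) estimator of the $\beta$ vector is given as follows:

$$\hat\beta_{ML} = Z^{-1}X'\hat{W}\hat{z},$$

where $Z = X'\hat{W}X$, and $\hat{W}$ and $\hat{z}$ are the weight matrix and the adjusted response vector at convergence of the IRLS algorithm [23,31]. The MSE of $\hat\beta_{ML}$ is given as follows:

$$MSE(\hat\beta_{ML}) = E\left[(\hat\beta_{ML}-\beta)'(\hat\beta_{ML}-\beta)\right] = \hat\gamma\,\mathrm{tr}(Z^{-1}) = \hat\gamma\sum_{j=1}^{p}\frac{1}{\lambda_j}, \tag{8}$$

where $\mathrm{tr}(\cdot)$ is the trace of the matrix, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p) = \psi'Z\psi$, $\psi$ is the orthogonal matrix whose columns are the eigenvectors of $Z$, $\lambda_j$ is the $j$th eigenvalue of $Z$, and $\hat\gamma$ is the ML estimate of $\gamma$.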
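The log-likelihood that the ML procedure maximizes can be sketched numerically as follows. This is a minimal illustration only, assuming the Guikema and Goffelt form of the PMF, proportional to $(\mu_i^{y}/y!)^{\gamma}$, with the log link $\mu_i = \exp(x_i'\beta)$; the function names and the series truncation at `terms=200` are hypothetical choices, and the sketch does not implement the IRLS algorithm itself.

```python
import math

def log_normalizer(mu, gamma, terms=200):
    """log S(mu, gamma), with the infinite series truncated at `terms`."""
    logs = [gamma * (s * math.log(mu) - math.lgamma(s + 1)) for s in range(terms)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def comp_loglik(beta, gamma, X, y):
    """COMP log-likelihood under the re-parameterized pmf and a log link.

    X is a list of rows (each including a leading 1 for the intercept),
    y is a list of non-negative integer counts.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))  # eta = log(mu_i)
        mu = math.exp(eta)
        total += gamma * (y_i * eta - math.lgamma(y_i + 1))
        total -= log_normalizer(mu, gamma)
    return total

# Toy data, for illustration only.
X_toy = [[1.0, 0.5], [1.0, -0.2], [1.0, 1.0]]
y_toy = [2, 0, 3]
ll = comp_loglik([0.3, 0.4], 1.0, X_toy, y_toy)
```

With $\gamma = 1$ the normalizer satisfies $\log S(\mu, 1) = \mu$, so `comp_loglik` coincides with the Poisson log-likelihood, which gives a simple sanity check of the sketch.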
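2.2. COM-Poisson-ridge estimator

Segerstedt [32] proposed the RR estimator for generalized linear models (GLMs), based on the study of Hoerl and Kennard [8], to handle the multicollinearity issue. When the explanatory variables in the COMP model are highly correlated, the MSE of the ML estimator becomes very large and the estimates are inefficient. To solve the multicollinearity problem in the COMP model, Sami et al. [31] developed the RR estimator for the COMP model, named the CPR estimator:

$$\hat\beta_k = (Z + kI_p)^{-1}Z\hat\beta_{ML}, \quad k > 0,$$

where $Z = X'\hat{W}X$ and $I_p$ is the identity matrix of order $p$. The bias vector, the variance-covariance matrix, and the MSE matrix (MMSE) of the CPR estimator are given as follows:

$$\mathrm{Bias}(\hat\beta_k) = -k\,\psi\Lambda_k^{-1}\alpha,$$
$$\mathrm{Cov}(\hat\beta_k) = \hat\gamma\,\psi\Lambda_k^{-1}\Lambda\Lambda_k^{-1}\psi',$$
$$MMSE(\hat\beta_k) = \hat\gamma\,\psi\Lambda_k^{-1}\Lambda\Lambda_k^{-1}\psi' + k^{2}\,\psi\Lambda_k^{-1}\alpha\alpha'\Lambda_k^{-1}\psi',$$

where $\Lambda_k = \mathrm{diag}(\lambda_1 + k, \lambda_2 + k, \ldots, \lambda_p + k)$. Therefore, the MSE of the CPR estimator, obtained by applying the $\mathrm{tr}(\cdot)$ operator to the MMSE matrix, is

$$MSE(\hat\beta_k) = \hat\gamma\sum_{j=1}^{p}\frac{\lambda_j}{(\lambda_j + k)^{2}} + k^{2}\sum_{j=1}^{p}\frac{\alpha_j^{2}}{(\lambda_j + k)^{2}}, \tag{13}$$

where $\alpha_j$ is the $j$th element of $\alpha = \psi'\beta = (\alpha_1, \alpha_2, \ldots, \alpha_p)'$.

2.3. COM-Poisson-Liu estimator

Akram et al. [22] and Rasheed et al. [33] provided the Liu estimator for the COMP model, named the CPL estimator, as follows:

$$\hat\beta_d = (Z + I_p)^{-1}(Z + dI_p)\hat\beta_{ML}, \quad 0 < d < 1.$$

The bias vector, variance-covariance matrix, and MSE matrix of the CPL estimator are given as follows:

$$\mathrm{Bias}(\hat\beta_d) = (d - 1)\,\psi\Lambda_I^{-1}\alpha,$$
$$\mathrm{Cov}(\hat\beta_d) = \hat\gamma\,\psi\Lambda_I^{-1}\Lambda_d\Lambda^{-1}\Lambda_d\Lambda_I^{-1}\psi',$$
$$MMSE(\hat\beta_d) = \hat\gamma\,\psi\Lambda_I^{-1}\Lambda_d\Lambda^{-1}\Lambda_d\Lambda_I^{-1}\psi' + (d - 1)^{2}\,\psi\Lambda_I^{-1}\alpha\alpha'\Lambda_I^{-1}\psi',$$

where $\Lambda_I = (\Lambda + I)$ and $\Lambda_d = (\Lambda + dI)$. Therefore, the MSE of the CPL estimator is

$$MSE(\hat\beta_d) = \hat\gamma\sum_{j=1}^{p}\frac{(\lambda_j + d)^{2}}{\lambda_j(\lambda_j + 1)^{2}} + (d - 1)^{2}\sum_{j=1}^{p}\frac{\alpha_j^{2}}{(\lambda_j + 1)^{2}}. \tag{18}$$

2.4. COM-Poisson-modified one-parameter Liu estimator

Sami et al. [23] proposed a new one-parameter Liu estimator for the COMP model, known as the CPMOPL estimator, which is defined as follows:

$$\hat\beta_{d_0} = (Z + I_p)^{-1}(Z - d_0I_p)\hat\beta_{ML}, \quad 0 < d_0 < 1.$$

The bias vector, variance-covariance matrix, and MSE matrix of the CPMOPL estimator are given as follows:

$$\mathrm{Bias}(\hat\beta_{d_0}) = -(1 + d_0)\,\psi\Lambda_I^{-1}\alpha,$$
$$\mathrm{Cov}(\hat\beta_{d_0}) = \hat\gamma\,\psi\Lambda_I^{-1}\Lambda_{d_0}\Lambda^{-1}\Lambda_{d_0}\Lambda_I^{-1}\psi',$$
$$MMSE(\hat\beta_{d_0}) = \hat\gamma\,\psi\Lambda_I^{-1}\Lambda_{d_0}\Lambda^{-1}\Lambda_{d_0}\Lambda_I^{-1}\psi' + (1 + d_0)^{2}\,\psi\Lambda_I^{-1}\alpha\alpha'\Lambda_I^{-1}\psi',$$

where $\Lambda_{d_0} = (\Lambda - d_0I)$. Therefore, the MSE of the CPMOPL estimator is

$$MSE(\hat\beta_{d_0}) = \hat\gamma\sum_{j=1}^{p}\frac{(\lambda_j - d_0)^{2}}{\lambda_j(\lambda_j + 1)^{2}} + (1 + d_0)^{2}\sum_{j=1}^{p}\frac{\alpha_j^{2}}{(\lambda_j + 1)^{2}}. \tag{23}$$

2.5. Proposed COM-Poisson-new modified two-parameter Liu estimator

In this section, we propose a new modified Liu estimator for the COMP model based on the two parameters $(k, d_0)$. The proposed estimator is obtained by augmenting $-(k + d_0)\hat\beta_k = \beta + \varepsilon$ to the COMP model and then using the CPR estimator.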
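Therefore, the proposed estimator of $\beta$, which is called the CPNMTPL estimator, is given as follows:

$$\hat\beta_{k,d_0} = (Z + I_p)^{-1}\left(Z - (k + d_0)I_p\right)\hat\beta_k, \quad k > 0,\ 0 < d_0 < 1,$$

where $\hat\beta_k = (Z + kI_p)^{-1}Z\hat\beta_{ML}$ is the CPR estimator. The bias vector, the variance-covariance matrix, and the MSE matrix of the CPNMTPL estimator are given as follows:

$$\mathrm{Bias}(\hat\beta_{k,d_0}) = \psi\left(\Lambda_I^{-1}\Lambda_{k,d_0}\Lambda_k^{-1}\Lambda - I_p\right)\alpha,$$
$$\mathrm{Cov}(\hat\beta_{k,d_0}) = \hat\gamma\,\psi\Lambda_I^{-1}\Lambda_{k,d_0}\Lambda_k^{-1}\Lambda\Lambda_k^{-1}\Lambda_{k,d_0}\Lambda_I^{-1}\psi',$$
$$MMSE(\hat\beta_{k,d_0}) = \mathrm{Cov}(\hat\beta_{k,d_0}) + \mathrm{Bias}(\hat\beta_{k,d_0})\,\mathrm{Bias}(\hat\beta_{k,d_0})',$$

where $\Lambda_I = \Lambda + I_p$, $\Lambda_k = \Lambda + kI_p$, $\Lambda_{k,d_0} = \Lambda - (k + d_0)I_p$, and $\alpha = \psi'\beta$. Therefore, the MSE of the CPNMTPL estimator is

$$MSE(\hat\beta_{k,d_0}) = \hat\gamma\sum_{j=1}^{p}\frac{\lambda_j\left(\lambda_j - k - d_0\right)^{2}}{(\lambda_j + 1)^{2}(\lambda_j + k)^{2}} + \sum_{j=1}^{p}\frac{\left[(2k + d_0 + 1)\lambda_j + k\right]^{2}\alpha_j^{2}}{(\lambda_j + 1)^{2}(\lambda_j + k)^{2}}. \tag{28}$$

We can obtain the optimal value of $d_0$ for $\hat\beta_{k,d_0}$ by differentiating equation (28) with respect to $d_0$ and setting the result equal to zero; the solution, $d_{0(opt)}$, is equation (29).

The difference between the MMSE of the CPML and CPNMTPL estimators is as follows: Therefore, we can rewrite the previous equation as follows: The proof is completed.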
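The scalar MSE expressions of this section can be compared numerically. The sketch below is an illustration only: it assumes the canonical forms $\hat\beta_k = (Z+kI)^{-1}Z\hat\beta_{ML}$, $\hat\beta_d = (Z+I)^{-1}(Z+dI)\hat\beta_{ML}$, $\hat\beta_{d_0} = (Z+I)^{-1}(Z-d_0I)\hat\beta_{ML}$, and $\hat\beta_{k,d_0} = (Z+I)^{-1}(Z-(k+d_0)I)\hat\beta_k$, with $\mathrm{Cov}(\hat\beta_{ML}) = \hat\gamma Z^{-1}$. The function names and the numerical inputs are hypothetical.

```python
def mse_ml(lams, g):
    """Scalar MSE of the ML estimator: g * sum(1 / lambda_j)."""
    return g * sum(1.0 / l for l in lams)

def mse_ridge(lams, alphas, g, k):
    """Scalar MSE of the ridge-type (CPR) estimator."""
    var = g * sum(l / (l + k) ** 2 for l in lams)
    bias2 = k ** 2 * sum(a ** 2 / (l + k) ** 2 for l, a in zip(lams, alphas))
    return var + bias2

def mse_liu(lams, alphas, g, d):
    """Scalar MSE of the Liu-type (CPL) estimator."""
    var = g * sum((l + d) ** 2 / (l * (l + 1) ** 2) for l in lams)
    bias2 = (d - 1) ** 2 * sum(a ** 2 / (l + 1) ** 2 for l, a in zip(lams, alphas))
    return var + bias2

def mse_mopl(lams, alphas, g, d0):
    """Scalar MSE of the modified one-parameter Liu (CPMOPL) estimator."""
    var = g * sum((l - d0) ** 2 / (l * (l + 1) ** 2) for l in lams)
    bias2 = (1 + d0) ** 2 * sum(a ** 2 / (l + 1) ** 2 for l, a in zip(lams, alphas))
    return var + bias2

def mse_nmtpl(lams, alphas, g, k, d0):
    """Scalar MSE of the two-parameter (CPNMTPL) estimator, assumed form."""
    total = 0.0
    for l, a in zip(lams, alphas):
        den = (l + 1) ** 2 * (l + k) ** 2
        total += g * l * (l - k - d0) ** 2 / den            # variance part
        total += ((2 * k + d0 + 1) * l + k) ** 2 * a ** 2 / den  # squared bias
    return total

# Hypothetical eigenvalues, canonical coefficients, and dispersion estimate.
lams = [5.0, 0.5, 0.01]
alphas = [1.0, -0.5, 0.3]
g_hat = 1.2
```

A useful consistency check of these assumed forms is that the ridge expression with $k = 0$ and the Liu expression with $d = 1$ both return the ML value, while the two-parameter expression with $k = d_0 = 0$ coincides with the Liu expression at $d = 0$.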
3.2. Comparison between the CPR and CPNMTPL estimators

Theorem 2
The CPNMTPL estimator is better than the CPR estimator if The difference between the MMSE of the CPR and CPNMTPL estimators is as follows: Therefore, we can rewrite the previous equation as follows: The proof is completed.

Theorem 3
The CPNMTPL estimator is better than the CPL estimator if The difference between the MMSE of the CPL and CPNMTPL estimators is as follows: Therefore, we can rewrite the previous equation as follows: The proof is completed.

The difference between the MMSE of the CPMOPL and CPNMTPL estimators is as follows: Therefore, we can rewrite the previous equation as follows:

For the CPMOPL estimator, we suggest using the following estimator for $d_0$ [23]:

For the proposed estimator, we suggest using $d_{0(opt)}$ in equation (29) as an estimator for the $d_0$ parameter in the CPNMTPL estimator. If $\hat{d}_{0(opt)} < 0$ or $\hat{d}_{0(opt)} > 1$, we use $\hat{d}_1$ instead of $\hat{d}_{0(opt)}$. For the $k$ parameter in the CPNMTPL estimator, we suggest using the following four estimators [34, 36, 37]:

5. Monte Carlo simulation study

5.1. Simulation design
The Monte Carlo simulation study was used to examine the performance of the proposed estimator and the other estimators under different conditions. We conducted the simulation experiments using different levels of $n$, $p$, $\rho$, and $\gamma$ as follows:

Step 1: The correlated explanatory variables ($x_{ij}$) are generated as follows [35, 38, 39]:

$$x_{ij} = \left(1 - \rho^{2}\right)^{1/2}\tau_{ij} + \rho\,\tau_{ip}, \quad i = 1, \ldots, n;\ j = 1, \ldots, p, \tag{39}$$

where $\tau_{ij} \sim N(0, 1)$, and $\rho$ denotes the degree of the correlation between the explanatory variables.

Step 2: The response variable ($y_i$) follows a COMP($\mu_i$, $\gamma$) distribution, where $\mu_i$ is generated as $\mu_i = \exp(x_i'\beta)$.

Step 3: The generation of the data ($x_{ij}$, $y_i$) is repeated $L = 1000$ times to calculate the simulated MSE criterion as follows:

$$MSE(\hat\beta) = \frac{1}{L}\sum_{l=1}^{L}\left(\hat\beta_l - \beta\right)'\left(\hat\beta_l - \beta\right), \tag{41}$$

where $(\hat\beta_l - \beta)$ is the difference between the estimated and true parameter vectors at the $l$th replication.

To evaluate the performance of the estimators in different simulated datasets, we repeated the aforementioned three steps at different levels of $n$, $p$, $\rho$, and $\gamma$ as follows:
- Different sample sizes ($n$) were used: $n$ = 50, 75, 100, 150, and 200.
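The three simulation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the shared normal component in Step 1 is drawn as a separate extra variate (as in the usual McDonald and Galarneau scheme), the COMP response is drawn by inverse-CDF sampling from the re-parameterized PMF with the series truncated at `terms=200`, and all function names are hypothetical. No estimator is fitted here; `simulated_mse` only shows how equation (41) is evaluated once the replicated estimates are available.

```python
import math
import random

def correlated_regressors(n, p, rho, rng):
    """Step 1: x_ij = sqrt(1 - rho^2) * tau_ij + rho * (shared normal draw)."""
    X = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # assumed extra standard normal variate
        row = [math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) + rho * shared
               for _ in range(p)]
        X.append(row)
    return X

def sample_comp(mu, gamma, rng, terms=200):
    """Step 2: inverse-CDF draw from the pmf proportional to (mu^y / y!)^gamma."""
    logs = [gamma * (y * math.log(mu) - math.lgamma(y + 1)) for y in range(terms)]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    u = rng.random() * sum(weights)
    acc = 0.0
    for y, w in enumerate(weights):
        acc += w
        if u <= acc:
            return y
    return terms - 1  # reached only through floating-point round-off

def simulated_mse(estimates, beta):
    """Step 3: average of (b_hat - beta)'(b_hat - beta) over the replications."""
    total = 0.0
    for est in estimates:
        total += sum((b_hat - b) ** 2 for b_hat, b in zip(est, beta))
    return total / len(estimates)

rng = random.Random(2023)
X_sim = correlated_regressors(5000, 3, 0.9, rng)
y_sim = [sample_comp(2.0, 1.0, rng) for _ in range(4000)]
```

Under the assumption of a separate shared component, any two generated columns have correlation close to $\rho^{2}$, and with $\gamma = 1$ the sampled counts behave like Poisson draws with mean $\mu$.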

5.2. Simulation results
Tables 1-9 report the simulated MSE for all combinations of $n$, $p$, $\gamma$, and $\rho$. In Tables 1-9, the minimum value of the simulated MSE is highlighted in bold. From the simulation results, we can draw the following conclusions:
1. In terms of MSE, the proposed CPNMTPL estimator has the minimum MSE values, so it outperforms the other estimators, and the CPMOPL estimator ranks second. The CPML estimator has the weakest performance because it is the most affected by multicollinearity.

5.3. Relative efficiency
In this sub-section, we use the relative efficiency (RE) measure to study the efficiency of the biased estimators. The RE is derived using the MSE in equation (41) as follows:

$$RE(\hat\beta^{*}) = \frac{MSE(\hat\beta_{ML})}{MSE(\hat\beta^{*})},$$

where $\hat\beta^{*}$ represents $\hat\beta_k$, $\hat\beta_d$, $\hat\beta_{d_0}$, or $\hat\beta_{k,d_0}$. Moreover, the root mean squared error (RMSE) is used as a standard statistical tool for assessing the performance of the estimators. It is calculated from each estimator's MSE as $RMSE(\hat\beta) = \sqrt{MSE(\hat\beta)}$.

Figures 2-5 present the RMSE of the CPML, CPR, CPL, CPMOPL, and CPNMTPL estimators and the RE of the CPR, CPL, CPMOPL, and CPNMTPL estimators for different levels of the sample size ($n$), the degree of correlation between the explanatory variables ($\rho$), the dispersion parameter ($\gamma$), and the number of explanatory variables ($p$), respectively. Figures 2-5 show that the CPML estimator has the largest RMSE, while the proposed CPNMTPL estimator with $\hat{k}_2$, $\hat{k}_3$, and $\hat{k}_4$ has the smallest RMSE. The CPNMTPL estimator with $\hat{k}_2$, $\hat{k}_3$, and $\hat{k}_4$ also has higher RE values than the other estimators for different levels of $n$, $p$, $\gamma$, and $\rho$.
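The two summary measures can be computed as in the short sketch below. It assumes that RE is the ratio of the CPML MSE to the MSE of the biased estimator, so that values above one favor the biased estimator; the MSE values are hypothetical placeholders and are not results from the study.

```python
import math

def relative_efficiency(mse_ml, mse_biased):
    """RE of a biased estimator relative to ML (assumed ratio form)."""
    return mse_ml / mse_biased

def rmse(mse):
    """Root mean squared error from a simulated MSE value."""
    return math.sqrt(mse)

# Hypothetical simulated MSE values, for illustration only.
mse_values = {"CPML": 4.0, "CPR": 2.0, "CPNMTPL": 1.0}

re_values = {
    name: relative_efficiency(mse_values["CPML"], value)
    for name, value in mse_values.items()
    if name != "CPML"
}
rmse_values = {name: rmse(value) for name, value in mse_values.items()}
```

With these placeholder inputs, the estimator with the smallest MSE has both the smallest RMSE and the largest RE, which is the pattern the two measures are meant to expose.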

6. Application
In this section, we use a real dataset from Cameron and Trivedi [43] to estimate a recreation demand function. The dataset was obtained from a survey on the number of recreational boating trips to Somerville Lake, East Texas, in 1980. The response (dependent) variable is the number of recreational boating trips to Somerville Lake in 1980. There are three explanatory variables: X1: Lake Conroe visit fee, X2: Somerville Lake visit fee, and X3: Houston Lake visit fee. These data were also used by Abonazel and Dawoud [44]; the sample size is 179 observations (after removing the outlier values). To check for the multicollinearity problem, the correlation coefficients between the explanatory variables, the variance inflation factor (VIF), and the condition number (CN) can be used. All correlation coefficients are greater than 0.90: $\rho_{X1,X2} = 0.97$, $\rho_{X1,X3} = 0.98$, $\rho_{X2,X3} = 0.94$. The VIF values are 157.75, 52.34, and 90.10. The VIF is computed from determinants of correlation matrices, where $|\cdot|$ is the matrix determinant, $R_1$ is the correlation matrix of a specific set of explanatory variables, $R_2$ is the correlation matrix of a particular set of explanatory variables, and $R_X$ is the correlation matrix of all the explanatory variables in the model [46]. The correlation matrix is obtained from $V$, the estimated variance-covariance matrix of $\hat\beta_{ML}$, and $\mathrm{diag}(V)$, the matrix of the diagonal elements of $V$. The CN value is calculated following [47] from $\lambda_{max}$ and $\lambda_{min}$, the largest and smallest eigenvalues of the $X'\hat{W}X$ matrix.
First, we fitted three count data models: the Poisson, the negative binomial, and the COMP models. The Akaike information criterion (AIC) is used to select the best model. The AIC values of these models are 60.50 for the Poisson, 50.61 for the negative binomial, and 45.09 for the COMP. The COMP model has the minimum AIC value, which indicates that it fits these data best. Table 10 presents the estimates of the regression coefficients, the estimated MSE, and the R-squared ($R^2$) for the different estimators. The estimated MSE values of the CPML, CPR, CPL, CPMOPL, and CPNMTPL estimators were obtained from equations (8), (13), (18), (23), and (28), respectively, based on $\hat\alpha = \psi'\hat\beta_{ML}$ and the estimated values of the biasing parameters ($\hat{k}$, $\hat{d}_1$, $\hat{d}_2$, $\hat{d}_0$, $\hat{k}_1$, $\hat{k}_2$, $\hat{k}_3$, and $\hat{k}_4$). The R-squared ($R^2$) is calculated following [48], where $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$, $\hat\mu_i = \exp(x_i'\hat\beta)$, and $\hat\beta$ represents $\hat\beta_{ML}$, $\hat\beta_k$, $\hat\beta_d$, $\hat\beta_{d_0}$, and $\hat\beta_{k,d_0}$ to obtain $R^2$ for the CPML, CPR, CPL, CPMOPL, and CPNMTPL estimators, respectively.
From Table 10, we note the following:
1. The estimated coefficients for the biased estimators [CPR, CPL, CPMOPL, and CPNMTPL($\hat{k}_3$)] have the same sign; this suggests that the direction of the relationship between each explanatory variable and the response variable remains unchanged from the CPML estimator.
2. The MSE values of the CPR, CPL, CPMOPL, and CPNMTPL estimators are lower than that of the CPML estimator, and the CPNMTPL estimator based on $\hat{k}_3$ has the lowest MSE value.
3. In terms of prediction, the $R^2$ value of the proposed CPNMTPL estimator is the greatest among all the estimators used.

7. Conclusion
In this article, we proposed a new modified two-parameter Liu estimator (the CPNMTPL estimator) for the COMP model to deal with the multicollinearity issue. We proved that the proposed CPNMTPL estimator is more efficient than the biased estimators previously proposed in the literature (the CPR, CPL, and CPMOPL estimators). A simulation study and a real data application were conducted to examine the performance of the CPNMTPL estimator and to compare it with the CPR, CPL, and CPMOPL estimators. The results of the simulation study and the empirical application indicated that the CPNMTPL estimator outperforms these estimators in terms of mean squared error (MSE) reduction. Under three values of dispersion, the CPNMTPL estimator, with all biasing parameters, performs better than the CPR, CPL, and CPMOPL estimators when the COMP model suffers from multicollinearity. In a future study, we will use the generalized cross-validation (GCV) criterion to select the biasing parameters of the proposed estimator, as an extension of Roozbeh [49], to achieve more efficiency.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.