ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 20 April 2022

Sec. Statistics and Probability

Volume 8 - 2022 | https://doi.org/10.3389/fams.2022.880086

Generalized Kibria-Lukman Estimator: Method, Simulation, and Application

  • 1. Department of Mathematics, Al-Aqsa University, Gaza, Palestine

  • 2. Department of Applied Statistics and Econometrics, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, Egypt

  • 3. Department of Quantitative Analysis, College of Business Administration, King Saud University, Riyadh, Saudi Arabia


Abstract

In the linear regression model, multicollinearity degrades the performance of the ordinary least squares (OLS) estimator and makes it inefficient. Several estimators have been proposed to address this problem; the Kibria-Lukman (KL) estimator is a recent one. In this paper, a generalized version of the KL estimator is proposed, and the optimal biasing parameters of the proposed estimator are derived by minimizing the scalar mean squared error. Theoretically, the performance of the proposed estimator is compared with the OLS, generalized ridge, generalized Liu, and KL estimators in terms of the matrix mean squared error. Furthermore, a simulation study and a numerical example were performed to compare the performance of the proposed estimator with the OLS and KL estimators. The results indicate that the proposed estimator outperforms the other estimators, especially when the standard deviation of the errors is large and the correlation between the explanatory variables is very high.

Introduction

The statistical consequences of multicollinearity are well-known in statistics for a linear regression model. Multicollinearity refers to an approximate linear dependency among the columns of the matrix X in the following linear model:

y = Xβ + ε,     (1)

where y is an n × 1 vector of observations on the dependent variable, X is a known n × p matrix of observations on the explanatory variables, β is a p × 1 vector of unknown regression parameters, and ε is an n × 1 vector of disturbances. Then, the ordinary least squares (OLS) estimator of β for model (1) is given as:

β̂ = (X′X)⁻¹X′y.
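As a concrete illustration, the following sketch fits model (1) by OLS via the normal equations. The data and all variable names here are made up for the example, not taken from the paper:

```python
import numpy as np

# Minimal sketch: OLS via the normal equations, beta_hat = (X'X)^(-1) X'y.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                 # well-conditioned design
beta = np.array([1.0, -2.0, 0.5])           # "true" coefficients for this toy example
y = X @ beta + rng.normal(scale=0.1, size=n)

# Solve (X'X) beta_hat = X'y rather than forming the explicit inverse.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

When X is well-conditioned, as here, `beta_ols` recovers `beta` closely; the rest of the paper concerns the opposite situation, where the columns of X are nearly linearly dependent.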

The multicollinearity problem affects the behavior of the OLS estimator and makes it inefficient; sometimes it even produces coefficient estimates with the wrong signs [1, 2]. Many studies have been conducted to handle this problem. For example, Hoerl and Kennard [2] proposed the ordinary ridge (OR) and the generalized ridge (GR) estimators, Liu [3] introduced the popular Liu and generalized Liu (GL) estimators, and, very recently, Kibria and Lukman [1] proposed a ridge-type estimator called the Kibria–Lukman (KL) estimator, which is defined by

β̂_KL = (X′X + kIp)⁻¹(X′X − kIp)β̂,  k > 0.
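The KL estimator in its commonly stated form, β̂_KL = (X′X + kI)⁻¹(X′X − kI)β̂ with k > 0 (a form taken from the cited literature, used here as an assumption), can be sketched as follows on a deliberately near-collinear design:

```python
import numpy as np

# Sketch of the KL ridge-type estimator on near-collinear data.
rng = np.random.default_rng(1)
n, p, k = 100, 3, 0.5
A = rng.normal(size=(n, p))
# Second column is almost a copy of the first -> severe multicollinearity.
X = np.column_stack([A[:, 0], A[:, 0] + 0.01 * A[:, 1], A[:, 2]])
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

S = X.T @ X
beta_ols = np.linalg.solve(S, X.T @ y)
I = np.eye(p)
# KL estimator: (S + kI)^(-1) (S - kI) beta_ols, k > 0.
beta_kl = np.linalg.solve(S + k * I, (S - k * I) @ beta_ols)
```

Because the matrix (S + kI)⁻¹(S − kI) has eigenvalues (λ − k)/(λ + k), all smaller than 1 in absolute value, the KL estimate is a shrunken version of the OLS estimate.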

This estimator has been extended to different generalized linear models; see, for example, Lukman et al. [4, 5], Akram et al. [6], and Abonazel et al. [7].

According to recent papers [8–10], the efficiency of a biased estimator increases if the estimator is modified or generalized using biasing parameters that vary from component to component (ki and/or di) rather than fixed biasing parameters (k and/or d). Hence, the main purpose of this paper is to develop a general form of the KL estimator to combat multicollinearity in the linear regression model.

The rest of this paper is structured as follows: Section Statistical Methodology presents the statistical methodology. In Section Superiority of the Proposed GKL Estimator, we theoretically compare the proposed general form of the KL estimator with each of the mentioned estimators. In Section The Biasing Parameter Estimator of the GKL Estimator, we give the estimation of the biasing parameters of the proposed estimator. Different scenarios of a Monte Carlo simulation are examined in Section A Monte Carlo Simulation Study. A real dataset is analyzed in Section Empirical Application. Finally, Section Conclusion presents some conclusions.

Statistical Methodology

Canonical Form

The canonical form of the model in equation (1) is written as follows:

y = Zα + ε,

where Z = XR, α = R′β, and R is the orthogonal matrix whose columns are the eigenvectors of X′X, such that R′X′XR = Λ = diag(λ1, λ2, ..., λp), with λi > 0 the eigenvalues of X′X. Then, the OLS estimator of α is:

α̂ = Λ⁻¹Z′y,

and its matrix mean squared error (MMSE) is given as:

MMSE(α̂) = σ²Λ⁻¹.
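The canonical transformation can be sketched numerically as follows (an illustration with synthetic data; the variable names are mine). Note that back-transforming the canonical OLS estimate recovers the ordinary OLS estimate of β:

```python
import numpy as np

# Sketch of the canonical form: R'X'X R = Lambda, Z = X R, alpha_hat = Lambda^(-1) Z'y.
rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

lam, R = np.linalg.eigh(X.T @ X)   # lam holds the eigenvalues, R is orthogonal
Z = X @ R
# Lambda is diagonal, so Lambda^(-1) Z'y is an elementwise division.
alpha_hat = (Z.T @ y) / lam

# Back-transformation: beta_hat = R alpha_hat equals the OLS estimator of beta.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

Working in the canonical coordinates is convenient because every estimator below acts coordinate-wise on α̂ through the eigenvalues λi.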

Ridge Regression Estimators

The OR and GR estimators of α are, respectively, defined as follows [2]:

α̂_OR = W1Z′y and α̂_GR = W2Z′y,

where W1 = (Λ + kIp)⁻¹, k > 0, and W2 = (Λ + K)⁻¹, with K = diag(k1, k2, ..., kp), ki > 0, i = 1, 2, ..., p.

The MMSE of the OR and GR estimators are, respectively, given as:

MMSE(α̂_OR) = σ²W1ΛW1′ + (W1Λ − Ip)αα′(W1Λ − Ip)′,

MMSE(α̂_GR) = σ²W2ΛW2′ + (W2Λ − Ip)αα′(W2Λ − Ip)′.
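In canonical coordinates both ridge estimators act as coordinate-wise shrinkage of α̂ by the factor λi/(λi + k) (OR) or λi/(λi + ki) (GR). A minimal sketch of this standard form, with illustrative values I chose for k and ki:

```python
import numpy as np

# Sketch of ordinary ridge (OR, one k) and generalized ridge (GR, per-coordinate k_i)
# in canonical form: each coordinate of alpha_hat is multiplied by lam/(lam + k).
rng = np.random.default_rng(3)
n, p = 80, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

lam, R = np.linalg.eigh(X.T @ X)
alpha_hat = (R.T @ X.T @ y) / lam

k = 1.0
alpha_or = lam / (lam + k) * alpha_hat    # OR: common shrinkage parameter
ki = np.array([0.5, 1.0, 2.0])            # GR: illustrative per-coordinate parameters
alpha_gr = lam / (lam + ki) * alpha_hat   # GR shrinks each coordinate separately
```

Since 0 < λi/(λi + ki) < 1 for ki > 0, every coordinate of the ridge estimates is pulled toward zero relative to OLS.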

Liu Regression Estimators

The Liu and GL estimators of α are, respectively, defined as follows [3]:

α̂_Liu = F1α̂ and α̂_GL = F2α̂,

where F1 = (Λ + Ip)⁻¹(Λ + dIp), 0 < d < 1, and F2 = (Λ + Ip)⁻¹(Λ + D), with D = diag(d1, d2, ..., dp), 0 < di < 1, i = 1, 2, ..., p.

The MMSE of the Liu and GL estimators are, respectively, given as:

MMSE(α̂_Liu) = σ²F1Λ⁻¹F1′ + (F1 − Ip)αα′(F1 − Ip)′,

MMSE(α̂_GL) = σ²F2Λ⁻¹F2′ + (F2 − Ip)αα′(F2 − Ip)′.
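The Liu-type estimators also act coordinate-wise, with shrinkage factors (λi + d)/(λi + 1). A sketch of this standard form (the values of d and di are illustrative assumptions, not from the paper):

```python
import numpy as np

# Sketch of the Liu and generalized Liu (GL) estimators in canonical form:
# each coordinate of alpha_hat is multiplied by (lam + d)/(lam + 1).
rng = np.random.default_rng(4)
n, p = 80, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

lam, R = np.linalg.eigh(X.T @ X)
alpha_hat = (R.T @ X.T @ y) / lam

d = 0.5
alpha_liu = (lam + d) / (lam + 1.0) * alpha_hat   # Liu: common d in (0, 1)
di = np.array([0.2, 0.5, 0.8])                    # GL: illustrative per-coordinate d_i
alpha_gl = (lam + di) / (lam + 1.0) * alpha_hat
```

For 0 < d < 1 the factor (λi + d)/(λi + 1) lies in (0, 1), so the Liu-type estimates, like the ridge-type ones, shrink each coordinate toward zero.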

Kibria–Lukman Estimator

The KL estimator of α is given by Kibria and Lukman [1] as:

α̂_KL = W1M1α̂,

where W1 = (G + kIp)⁻¹, M1 = G − kIp, and G = Λ = diag(g1, g2, ..., gp), so that gi = λi. The MMSE of this estimator is given as:

MMSE(α̂_KL) = σ²W1M1Λ⁻¹M1′W1′ + (W1M1 − Ip)αα′(W1M1 − Ip)′.

The Proposed GKL Estimator

Now, by replacing W1 with W2 = (G + K)⁻¹ and M1 with M2 = G − K in the KL estimator, we obtain the general form, the GKL estimator, as follows:

α̂_GKL = W2M2α̂,

then, the MMSE of the proposed GKL estimator is computed by:

MMSE(α̂_GKL) = σ²W2M2Λ⁻¹M2′W2′ + (W2M2 − Ip)αα′(W2M2 − Ip)′.
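In canonical coordinates the GKL estimator multiplies each coordinate of α̂ by (λi − ki)/(λi + ki), reducing to the KL estimator when all ki are equal. A sketch of this form (the ki values are illustrative assumptions):

```python
import numpy as np

# Sketch of the proposed GKL estimator: per-coordinate KL-type shrinkage
# alpha_GKL_i = (lam_i - k_i)/(lam_i + k_i) * alpha_hat_i.
rng = np.random.default_rng(5)
n, p = 80, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

lam, R = np.linalg.eigh(X.T @ X)
alpha_hat = (R.T @ X.T @ y) / lam

ki = np.array([0.5, 1.0, 2.0])                  # illustrative varying parameters
alpha_gkl = (lam - ki) / (lam + ki) * alpha_hat

k = 1.0                                          # KL: the special case k_i = k for all i
alpha_kl = (lam - k) / (lam + k) * alpha_hat
```

Since |(λi − ki)/(λi + ki)| < 1 for λi, ki > 0, the GKL estimator shrinks every coordinate of α̂, with a separate amount of shrinkage per coordinate.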

Superiority of the Proposed GKL Estimator

In this section, we compare the proposed GKL estimator with each of the OLS, GR, GL, and KL estimators. First, we present some lemmas that are useful for these comparisons.

Lemma 1: Wang et al. [11]: Suppose M and N are n × n positive definite matrices. Then M > N if and only if (iff) λmax(NM⁻¹) < 1, where λmax(NM⁻¹) is the maximum eigenvalue of the matrix NM⁻¹.

Lemma 2: Farebrother [12]: Let S be an n × n positive definite matrix, that is, S > 0, and let α be some vector. Then, S − αα′ > 0 iff α′S⁻¹α < 1.

Lemma 3: Trenkler and Toutenburg [13]: Let α̂i = Uiy, i = 1, 2 be any two linear estimators of α. Suppose that Δ = Cov(α̂1) − Cov(α̂2) > 0, where Cov(α̂i) is the covariance matrix of α̂i, and let bi = Bias(α̂i) = E(α̂i) − α, i = 1, 2. Then,

MMSE(α̂1) − MMSE(α̂2) = Δ + b1b1′ − b2b2′ > 0

iff b2′(Δ + b1b1′)⁻¹b2 < 1, where MMSE(α̂i) = Cov(α̂i) + bibi′.

Theorem 1: α̂_GKL is superior to α̂ iff

α′(W2M2 − Ip)′[σ²(Λ⁻¹ − W2M2Λ⁻¹M2′W2′)]⁻¹(W2M2 − Ip)α < 1,

where W2 = (Λ + K)⁻¹ and M2 = Λ − K.

Proof: The difference between the covariance matrices is written as

Cov(α̂) − Cov(α̂_GKL) = σ²(Λ⁻¹ − W2M2Λ⁻¹M2′W2′) = σ² diag{[(gi + ki)² − (gi − ki)²]/[gi(gi + ki)²]}, i = 1, 2, ..., p,

with gi = λi, which becomes positive definite iff (gi + ki)² − (gi − ki)² > 0, i.e., iff (gi + ki) − (gi − ki) > 0. It is clear that for ki > 0, i = 1, 2, ..., p, (gi + ki) − (gi − ki) = 2ki > 0. Since α̂ is unbiased, applying Lemma 3 with b1 = 0 completes the proof.

Theorem 2: When λmax(NM⁻¹) < 1, α̂_GKL is superior to α̂_GR iff

b2′(M − N + b1b1′)⁻¹b2 < 1,

where M = Cov(α̂_GR) = σ²W2ΛW2′, N = Cov(α̂_GKL) = σ²W2M2Λ⁻¹M2′W2′, b1 = (W2Λ − Ip)α, and b2 = (W2M2 − Ip)α.

Proof: For ki > 0, it is obvious that M > 0 and N > 0. Then, M − N > 0 iff λmax(NM⁻¹) < 1, where λmax(NM⁻¹) is the maximum eigenvalue of the matrix NM⁻¹. So, this is done using Lemma 1 together with Lemma 3.

Theorem 3: α̂_GKL is superior to α̂_GL iff

b2′(Δ + b1b1′)⁻¹b2 < 1,

where Δ = σ²(F2Λ⁻¹F2′ − W2M2Λ⁻¹M2′W2′), F2 = (Λ + Ip)⁻¹(Λ + D), b1 = (F2 − Ip)α, and b2 = (W2M2 − Ip)α.

Proof: The difference between the covariance matrices is written as

Δ = σ² diag{(gi + di)²/[gi(gi + 1)²] − (gi − ki)²/[gi(gi + ki)²]}, i = 1, 2, ..., p,

which becomes positive definite iff (gi + ki)²(gi + di)² − (gi − ki)²(gi + 1)² > 0, i.e., iff (gi + ki)(gi + di) − (gi − ki)(gi + 1) > 0. So, if ki > 0 and 0 < di < 1, (gi + ki)(gi + di) − (gi − ki)(gi + 1) = ki(2gi + di + 1) + gi(di − 1) > 0. So, this is done by Lemma 3.

Theorem 4: α̂_GKL is superior to α̂_KL iff

b2′(Δ + b1b1′)⁻¹b2 < 1,

where Δ = σ²(W1M1Λ⁻¹M1′W1′ − W2M2Λ⁻¹M2′W2′), W1 = (Λ + kIp)⁻¹, M1 = Λ − kIp, b1 = (W1M1 − Ip)α, and b2 = (W2M2 − Ip)α.

Proof: The difference between the covariance matrices is written as

Δ = σ² diag{(gi − k)²/[gi(gi + k)²] − (gi − ki)²/[gi(gi + ki)²]}, i = 1, 2, ..., p,

which becomes positive definite iff (gi + ki)(gi − k) − (gi − ki)(gi + k) > 0. So, if ki > 0 and ki > k, (gi + ki)(gi − k) − (gi − ki)(gi + k) = 2gi(ki − k) > 0. So, this is done by Lemma 3.

The Biasing Parameter Estimator of the GKL Estimator

The performance of any biased estimator depends on its biasing parameter. Therefore, the determination of the biasing parameter of an estimator is an important issue, and different studies have analyzed it (e.g., [2, 3, 8–10, 14–24]).

Kibria and Lukman [1] proposed the following biasing parameter estimator for the KL estimator:

k̂ = min( σ̂² / (2α̂i² + (σ̂²/λi)) ), i = 1, 2, ..., p.

Here, we derive the estimators of the optimal values of ki for the proposed GKL estimator. The optimal values of ki are obtained by minimizing the scalar mean squared error

m(k1, k2, ..., kp) = Σi [ σ²(λi − ki)²/(λi(λi + ki)²) + 4ki²αi²/(λi + ki)² ].

Differentiating m(k1, k2, ..., kp) with respect to ki, setting ∂m/∂ki = 0, and replacing σ² and αi² by their unbiased estimators, the optimal values of ki become as follows:

k̂i = σ̂² / (2α̂i² + (σ̂²/λi)), i = 1, 2, ..., p.
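The optimality of ki = σ²/(2αi² + σ²/λi) can be checked numerically: the ith term of the scalar MSE of a coordinate-wise KL-type shrinkage, mi(k) = σ²(λi − k)²/(λi(λi + k)²) + 4k²αi²/(λi + k)², is minimized at exactly this value. The numbers below are arbitrary illustrative choices:

```python
import numpy as np

# Numerical check that k_i = sigma^2 / (2 alpha_i^2 + sigma^2 / lambda_i)
# minimizes m_i(k) = sigma^2 (lam-k)^2 / (lam (lam+k)^2) + 4 k^2 alpha^2 / (lam+k)^2.
sigma2, lam_i, alpha_i = 2.0, 5.0, 0.7   # arbitrary test values

def mse_i(k):
    a = sigma2 * (lam_i - k) ** 2 / (lam_i * (lam_i + k) ** 2)  # variance part
    b = 4.0 * k**2 * alpha_i**2 / (lam_i + k) ** 2              # squared-bias part
    return a + b

k_opt = sigma2 / (2.0 * alpha_i**2 + sigma2 / lam_i)
grid = np.linspace(1e-4, 10.0, 200_000)
k_grid = grid[np.argmin(mse_i(grid))]    # grid minimizer should land next to k_opt
```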

A Monte Carlo Simulation Study

Following [25–27], the explanatory variables are generated as:

x_ji = (1 − ρ²)^(1/2) a_ji + ρ a_j,p+1,  j = 1, 2, ..., n, i = 1, 2, ..., p,

where the a_ji are independent standard normal pseudo-random numbers and ρ is specified so that the theoretical correlation between any two explanatory variables is ρ². The dependent variable y is generated by:

y_j = β1x_j1 + β2x_j2 + ... + βp x_jp + ε_j,  j = 1, 2, ..., n,

where the ε_j are i.i.d. N(0, σ²). The values of β are chosen such that β′β = 1, as discussed in Dawoud and Abonazel [28], Algamal and Abonazel [29], Abonazel et al. [7, 30], and Awwad et al. [31]. All factors used in the simulation are given in Table 1.
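One replication of this design can be sketched as follows. The generation scheme is the standard one from refs. [25–27] (assumed here, since the displayed formula was lost in extraction), and the particular choice of β is an illustrative unit vector:

```python
import numpy as np

# One replication of the simulation design:
# x_ji = sqrt(1 - rho^2) a_ji + rho a_j,p+1, so any two regressors
# have theoretical correlation rho^2.
rng = np.random.default_rng(6)
n, p, rho, sigma = 100, 3, 0.9, 5.0
A = rng.normal(size=(n, p + 1))
X = np.sqrt(1.0 - rho**2) * A[:, :p] + rho * A[:, [p]]

beta = np.ones(p) / np.sqrt(p)    # illustrative choice satisfying beta'beta = 1
y = X @ beta + rng.normal(scale=sigma, size=n)
# Empirical pairwise correlations of the columns of X approach rho^2 = 0.81 as n grows.
```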

Table 1

Factor                            Symbol   Levels
Sample size                       n        50, 100, 150
Standard deviation                σ        1, 5, 10
Degree of correlation             ρ        0.80, 0.90, 0.99
Number of explanatory variables   p        3, 7
Number of replications            MCN      5,000

The factors' values of the simulation study.

To assess the performance of the OLS, KL, and proposed GKL estimators, with their biasing parameter estimators presented in Section Statistical Methodology, the estimated mean squared error (EMSE) is calculated for each combination of σ, ρ, n, and p using the following formula:

EMSE(α̃) = (1/5000) Σ_{l=1}^{5000} (α̃_l − α)′(α̃_l − α),

where α̃_l is the estimated vector of α in the lth replication of the simulation.
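The EMSE criterion, averaging the squared estimation error over the Monte Carlo replications, can be sketched as follows (the synthetic "estimates" below are made up purely to exercise the formula):

```python
import numpy as np

# EMSE(alpha_hat) = (1/MCN) * sum_l (alpha_hat_l - alpha)'(alpha_hat_l - alpha)
def emse(estimates, alpha_true):
    diff = estimates - alpha_true           # estimates: (MCN, p) array of replications
    return np.mean(np.sum(diff**2, axis=1))

rng = np.random.default_rng(7)
alpha = np.array([0.6, 0.8])
# Fake replications: truth plus i.i.d. N(0, 0.1^2) noise in each of p = 2 coordinates,
# so the EMSE should be close to p * 0.1^2 = 0.02.
estimates = alpha + rng.normal(scale=0.1, size=(5000, 2))
```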

The EMSE values of the OLS, KL, and GKL estimators are presented in Tables 2, 3. We can conclude the following based on the simulation results:

  • As the standard deviation (σ), the degree of multicollinearity (ρ), or the number of explanatory variables (p) increases, the EMSE values of all estimators increase.

  • The EMSE values of all estimators decrease as the sample size increases.

  • The GKL estimator outperforms the OLS estimator for all considered factor combinations except when σ = 1 and ρ = 0.80 or 0.90, for the considered values of p and n.

  • The GKL estimator outperforms the KL estimator for all considered factor combinations except in the following cases: (i) for n = 50, when σ = 1 and ρ = 0.80 or 0.90 with p = 3 or 7; (ii) for n = 100 or 150, when σ = 1 for all presented values of ρ with p = 3, or when σ = 5 and ρ = 0.80 with p = 3; and (iii) for n = 100 or 150, when σ = 1 and ρ = 0.80 or 0.90 with p = 7.

  • Finally, the proposed GKL estimator is clearly the most efficient when the standard deviation of the errors is large and the correlation among the explanatory variables is very high.

Table 2

n     σ    ρ      OLS        KL         GKL
50    1    0.80   0.1249     0.1094     0.1548
50    1    0.90   0.2260     0.1829     0.2738
50    1    0.99   2.0641     1.1439     1.1208
50    5    0.80   3.1235     1.7550     1.6052
50    5    0.90   5.6491     2.8600     2.4774
50    5    0.99   51.6036    22.2378    17.6275
50    10   0.80   12.4940    6.2898     5.3865
50    10   0.90   22.5965    10.5775    8.7621
50    10   0.99   206.4144   87.8850    69.2762
100   1    0.80   0.0605     0.0557     0.0701
100   1    0.90   0.1107     0.0964     0.1373
100   1    0.99   1.0308     0.6454     0.7558
100   5    0.80   1.5118     0.9306     0.9509
100   5    0.90   2.7663     1.5097     1.4056
100   5    0.99   25.7697    11.3736    8.9376
100   10   0.80   6.0471     3.1436     2.7244
100   10   0.90   11.0651    5.2952     4.3648
100   10   0.99   103.0788   44.4958    34.4270
150   1    0.80   0.0420     0.0393     0.0469
150   1    0.90   0.0768     0.0687     0.0928
150   1    0.99   0.7125     0.4700     0.6113
150   5    0.80   1.0497     0.6763     0.7487
150   5    0.90   1.9189     1.0893     1.0826
150   5    0.99   17.8124    7.7631     6.1352
150   10   0.80   4.1988     2.2214     1.9830
150   10   0.90   7.6756     3.6905     3.1029
150   10   0.99   71.2496    29.9827    23.1604

Estimated mean squared error (EMSE) values of the estimators when p = 3.

For each case, the smallest EMSE value is bolded.

Table 3

n     σ    ρ      OLS        KL         GKL
50    1    0.80   0.4143     0.3129     0.4302
50    1    0.90   0.6792     0.5399     0.6831
50    1    0.99   7.3867     3.9941     3.0983
50    5    0.80   10.3568    5.5139     4.1882
50    5    0.90   19.4796    10.0849    7.4658
50    5    0.99   184.6673   92.8175    66.6994
50    10   0.80   41.4272    21.1839    15.5082
50    10   0.90   77.9186    39.4124    28.5547
50    10   0.99   738.6690   370.3048   265.3667
100   1    0.80   0.1766     0.1529     0.2137
100   1    0.90   0.3322     0.2702     0.3652
100   1    0.99   3.1561     1.9888     1.7020
100   5    0.80   4.4159     2.7275     2.2455
100   5    0.90   8.3060     4.8911     3.8358
100   5    0.99   78.9019    43.6091    32.3890
100   10   0.80   17.6638    10.1544    7.7808
100   10   0.90   33.2240    18.6747    14.0582
100   10   0.99   315.6077   173.4003   128.2151
150   1    0.80   0.1105     0.0992     0.1341
150   1    0.90   0.2081     0.1773     0.2504
150   1    0.99   1.9769     1.3108     1.2036
150   5    0.80   2.7632     1.7804     1.5371
150   5    0.90   5.2014     3.1588     2.5389
150   5    0.99   49.4224    27.3769    20.2601
150   10   0.80   11.0529    6.4542     4.9732
150   10   0.90   20.8054    11.8006    8.8790
150   10   0.99   197.6896   108.306    79.6545

EMSE values of the estimators when p = 7.

For each case, the smallest EMSE value is bolded.

Empirical Application

To illustrate the performance of the proposed GKL estimator, we consider the Portland cement dataset originally due to Woods et al. [32] and also analyzed in Kibria and Lukman [1]. The dependent variable is the heat evolved after 180 days of curing, measured in calories per gram of cement. The first explanatory variable is tricalcium aluminate, the second is tricalcium silicate, the third is tetracalcium aluminoferrite, and the fourth is β-dicalcium silicate. The eigenvalues of the X′X matrix are 44,676.21, 5,965.42, 809.95, and 105.42, so the condition number, √(λmax/λmin), is 20.58. Therefore, multicollinearity exists among the predictors. The estimated error variance is large, which indicates high noise in the data. The estimated values of the optimal parameters in the GKL estimator are calculated as derived in Section Statistical Methodology, and the equation proposed by Kibria and Lukman [1] is used for estimating the biasing parameter of the KL estimator. The resulting mean squared error (MSE) values of the OLS, KL, and GKL estimators are presented in Table 4. From Table 4, we note that the KL estimator is better than the OLS estimator, and the GKL estimator is better than both the OLS and KL estimators.
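The reported condition number can be recomputed directly from the listed eigenvalues as the square root of the ratio of the largest to the smallest eigenvalue:

```python
import numpy as np

# Condition number of X'X from the eigenvalues reported in the text:
# CN = sqrt(lambda_max / lambda_min).
eigenvalues = np.array([44676.21, 5965.42, 809.95, 105.42])
cn = np.sqrt(eigenvalues.max() / eigenvalues.min())
# cn is approximately 20.6, consistent with the reported value of 20.58.
```

A condition number well above the conventional thresholds (10 to 30, depending on the source) signals the multicollinearity problem that motivates the biased estimators compared in Table 4.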

Table 4

Estimator   β̂1       β̂2       β̂3       β̂4       MSE
OLS         2.1930    1.1533    0.7585    0.4863    0.0638
KL          2.1764    1.1572    0.7465    0.4888    0.0629
GKL         2.1653    1.1613    0.7312    0.4904    0.0620

Estimated coefficients and mean squared error (MSE) values of the estimators.

Conclusion

In this paper, we proposed the GKL estimator. The performance of the proposed GKL estimator was theoretically compared with the OLS, GR, GL, and KL estimators in terms of the matrix mean squared error, and the optimal shrinkage parameters of the proposed GKL estimator were derived. A simulation study and a numerical example were used to compare the performance of the proposed GKL estimator with the OLS and KL estimators based on the estimated mean squared error criterion. The results indicate that the proposed estimator is better than the other estimators, in particular when the standard deviation of the errors is large and the correlation between the explanatory variables is very high.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

Author contributions

ID, MA, and FA contributed to conception and structural design of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Acknowledgments

The authors would like to thank the Deanship of Scientific Research at King Saud University, represented by the Research Center, at CBA for supporting this research financially.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1.

Kibria BMG, Lukman AF. A new ridge-type estimator for the linear regression model: simulations and applications. Scientifica. (2020) 2020:9758378. doi: 10.1155/2020/9758378

  • 2.

Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. (1970) 12:55–67.

  • 3.

Liu K. A new class of biased estimate in linear regression. Commun Stat Theory Methods. (1993) 22:393–402.

  • 4.

Lukman AF, Algamal ZY, Kibria BMG, Ayinde K. The KL estimator for the inverse Gaussian regression model. Concurr Comput Pract Exp. (2021) 33:e6222. doi: 10.1002/cpe.6222

  • 5.

Lukman AF, Dawoud I, Kibria BMG, Algamal ZY, Aladeitan B. A new ridge-type estimator for the gamma regression model. Scientifica. (2021) 2021:5545356. doi: 10.1155/2021/5545356

  • 6.

Akram MN, Kibria BMG, Abonazel MR, Afzal N. On the performance of some biased estimators in the gamma regression model: simulation and applications. J Stat Comput Simul. (2022) 1–23. doi: 10.1080/00949655.2022.2032059

  • 7.

Abonazel MR, Dawoud I, Awwad FA, Lukman AF. Dawoud–Kibria estimator for beta regression model: simulation and application. Front Appl Math Stat. (2022) 8:775068. doi: 10.3389/fams.2022.775068

  • 8.

Rashad NK, Hammood NM, Algamal ZY. Generalized ridge estimator in negative binomial regression model. J Phys Conf Ser. (2021) 1897:012019. doi: 10.1088/1742-6596/1897/1

  • 9.

Farghali RA, Qasim M, Kibria BMG, Abonazel MR. Generalized two-parameter estimators in the multinomial logit regression model: methods, simulation and application. Commun Stat Simul Comput. (2021) 1–16. doi: 10.1080/03610918.2021.1934023

  • 10.

Abdulazeez QA, Algamal ZY. Generalized ridge estimator shrinkage estimation based on particle swarm optimization algorithm. Electron J Appl Stat Anal. (2021) 14:254–65. doi: 10.1285/I20705948V14N1P254

  • 11.

Wang SG, Wu MX, Jia ZZ. Matrix Inequalities. Beijing: Chinese Science Press (2006).

  • 12.

Farebrother RW. Further results on the mean square error of ridge regression. J R Stat Soc Ser B. (1976) 38:248–50.

  • 13.

Trenkler G, Toutenburg H. Mean squared error matrix comparisons between biased estimators: an overview of recent results. Stat Pap. (1990) 31:165–79.

  • 14.

Hoerl AE, Kannard RW, Baldwin KF. Ridge regression: some simulations. Commun Stat. (1975) 4:105–23.

  • 15.

Khalaf G, Shukur G. Choosing ridge parameter for regression problems. Commun Stat Theory Methods. (2005) 34:1177–82. doi: 10.1081/STA-200056836

  • 16.

Khalaf G, Månsson K, Shukur G. Modified ridge regression estimators. Commun Stat Theory Methods. (2013) 42:1476–87. doi: 10.1080/03610926.2011.593285

  • 17.

Månsson K, Kibria BMG, Shukur G. Performance of some weighted Liu estimators for logit regression model: an application to Swedish accident data. Commun Stat Theory Methods. (2015) 44:363–75. doi: 10.1080/03610926.2012.745562

  • 18.

Kibria BMG, Banik S. Some ridge regression estimators and their performances. J Mod Appl Stat Methods. (2016) 15:206–38. doi: 10.22237/jmasm/1462075860

  • 19.

Algamal ZY. A new method for choosing the biasing parameter in ridge estimator for generalized linear model. Chemometr Intell Lab Syst. (2018) 183:96–101. doi: 10.1016/j.chemolab.2018.10.014

  • 20.

Abonazel MR, Farghali RA. Liu-type multinomial logistic estimator. Sankhya B. (2019) 81:203–25. doi: 10.1007/s13571-018-0171-4

  • 21.

Qasim M, Amin M, Omer T. Performance of some new Liu parameters for the linear regression model. Commun Stat Theory Methods. (2020) 49:4178–96. doi: 10.1080/03610926.2019.1595654

  • 22.

Suhail M, Chand S, Kibria BMG. Quantile based estimation of biasing parameters in ridge regression model. Commun Stat Simul Comput. (2020) 49:2732–44. doi: 10.1080/03610918.2018.1530782

  • 23.

Babar I, Ayed H, Chand S, Suhail M, Khan YA, Marzouki R. Modified Liu estimators in the linear regression model: an application to tobacco data. PLoS ONE. (2021) 16:e0259991. doi: 10.1371/journal.pone.0259991

  • 24.

Abonazel MR, Taha IM. Beta ridge regression estimators: simulation and application. Commun Stat Simul Comput. (2021) 1–13. doi: 10.1080/03610918.2021.1960373

  • 25.

McDonald GC, Galarneau DI. A Monte Carlo evaluation of some ridge-type estimators. J Am Stat Assoc. (1975) 70:407–16. doi: 10.2307/2285832

  • 26.

Gibbons DG. A simulation study of some ridge estimators. J Am Stat Assoc. (1981) 76:131–9.

  • 27.

Kibria BMG. Performance of some new ridge regression estimators. Commun Stat Simul Comput. (2003) 32:419–35. doi: 10.1081/SAC-120017499

  • 28.

Dawoud I, Abonazel MR. Robust Dawoud–Kibria estimator for handling multicollinearity and outliers in the linear regression model. J Stat Comput Simul. (2021) 91:3678–92. doi: 10.1080/00949655.2021.1945063

  • 29.

Algamal ZY, Abonazel MR. Developing a Liu-type estimator in beta regression model. Concurr Comput Pract Exp. (2022) 34:e6685. doi: 10.1002/cpe.6685

  • 30.

Abonazel MR, Algamal ZY, Awwad FA, Taha IM. A new two-parameter estimator for beta regression model: method, simulation, and application. Front Appl Math Stat. (2022) 7:780322. doi: 10.3389/fams.2021.780322

  • 31.

Awwad FA, Dawoud I, Abonazel MR. Development of robust Özkale–Kaçiranlar and Yang–Chang estimators for regression models in the presence of multicollinearity and outliers. Concurr Comput Pract Exp. (2022) 34:e6779. doi: 10.1002/cpe.6779

  • 32.

Woods H, Steinour HH, Starke HR. Effect of composition of Portland cement on heat evolved during hardening. Indust Eng Chem. (1932) 24:1207–14. doi: 10.1021/ie50275a002


Keywords

generalized Liu estimator, multicollinearity, generalized ridge estimator, biasing parameter, ridge-type estimator

Citation

Dawoud I, Abonazel MR and Awwad FA (2022) Generalized Kibria-Lukman Estimator: Method, Simulation, and Application. Front. Appl. Math. Stat. 8:880086. doi: 10.3389/fams.2022.880086

Received

21 February 2022

Accepted

14 March 2022

Published

20 April 2022

Volume

8 - 2022

Edited by

Min Wang, University of Texas at San Antonio, United States

Reviewed by

Muhammad Suhail, University of Agriculture, Peshawar, Pakistan; Zakariya Yahya Algamal, University of Mosul, Iraq


Copyright

*Correspondence: Mohamed R. Abonazel

This article was submitted to Statistics and Probability, a section of the journal Frontiers in Applied Mathematics and Statistics

