Bayesian regression modeling and inference of energy efficiency data: the effect of collinearity and sensitivity analysis

Al-Essa, Laila A.; Ebrahim, Endris Assen; Mergiaw, Yusuf Ali

doi:10.3389/fenrg.2024.1416126

ORIGINAL RESEARCH article

Front. Energy Res., 30 July 2024

Sec. Energy Efficiency

Volume 12 - 2024 | https://doi.org/10.3389/fenrg.2024.1416126

Laila A. Al-Essa¹

Endris Assen Ebrahim²*†

Yusuf Ali Mergiaw³

¹Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
²Department of Statistics, College of Natural and Computational Sciences, Debre Tabor University, Debre Tabor, Ethiopia
³Department of Mechanical Engineering, Gafat Institute of Technology, Debre Tabor University, Debre Tabor, Ethiopia

Most studies have predicted heating demand using linear regression models without giving building features enough context. Model problems such as multicollinearity need to be checked, and appropriate features must be chosen based on their significance, to produce accurate load predictions and inferences. Numerous building energy efficiency features correlate with each other and with heating load in the energy efficiency dataset. Standard Ordinary Least Squares (OLS) regression performs poorly when the dataset shows multicollinearity. Bayesian supervised machine learning is a popular method for parameter estimation and inference when frequentist statistical assumptions fail. Predicting the heating load as the energy efficiency output with Bayesian inference in multiple regression with a collinearity problem requires careful data analysis. The multicollinearity among the features in the building energy efficiency dataset significantly affected the parameter estimates and hypothesis tests. This study demonstrated several shrinkage and informative priors in the Bayesian framework as remedies to reduce the collinearity problem in multiple regression analysis. The standard OLS regression and four distinct Bayesian regression models with several prior distributions were fitted using the Hamiltonian Monte Carlo algorithm in the Bayesian Regression Models using Stan (BRMS) package and the lm function for linear models. Several model comparison and assessment methods were used to select the best-fit regression model for the dataset. The Bayesian regression model with a weakly informative prior was the best-fitted model compared with the standard OLS regression and the Bayesian regression models with shrinkage priors for collinear energy efficiency data. The numerical findings on collinearity were checked using the variance inflation factor, the estimates of regression coefficients and standard errors, and the sensitivity of priors and likelihoods. Applied research in science, engineering, agriculture, health, and other disciplines should check for multicollinearity effects in regression modeling to obtain better estimation and inference.

1 Introduction

Several studies have used regression models in a frequentist framework to predict the electric energy consumption and efficiency of office or residential buildings without checking for multicollinearity effects (Baranova et al., 2017; Reim et al., 2017; Taskin et al., 2022; Neubauer et al., 2024). Reliable estimation techniques are required to assess the effects of various features, so that investments in energy conservation measures (ECMs) and the development of new energy-efficient buildings deliver the anticipated and promised performance. The standard linear regression approach has limitations in estimation and inference for energy efficiency data with collinear features (Moletsane et al., 2018; Mummolo and Peterson, 2018; Tahmasebinia et al., 2023; Ahmadi, 2024; Kaczmarczyk, 2024).

The conventional method for estimating energy efficiency involves using a linear regression model. However, previous studies only partially addressed the statistical issues of the linear regression approach and the potential multicollinearity arising from the high correlation between building energy efficiency features (Moletsane et al., 2018; Tahmasebinia et al., 2023; Ahmadi, 2024; Kaczmarczyk, 2024). Bayesian inference has various applications in science, engineering, and the social sciences. In traditional frequentist inference, model parameters are assumed to be fixed constants (Nithin, 2023). Bayesian statistics, in contrast, combines the observed data with prior knowledge of the population parameters to generate estimates via the posterior distribution. Bayesian Multiple Regression (BMR) analysis is one of the most widely used statistical techniques in both experimental and applied studies. Nevertheless, correlated predictor variables and their collinearity effects are a frequent concern in the statistical inference of regression estimates (Farrar and Glauber, 1967). Strong correlation among the independent variables in a multiple linear regression model leads to high Standard Errors (SE) of the regression coefficients, which is known as the multicollinearity problem (Willis and Perlack, 1978). Bayesian inference in multiple linear regression analysis provides estimators for testing simple hypotheses concerning the regression coefficients (Wu et al., 2023). Bayesian interval estimates (credible intervals) can be formulated using prior information of various kinds incorporated in the analysis (Assaf and Tsionas, 2021b). Owing to their computational efficiency, estimation accuracy, and capacity for variable selection, Bayesian shrinkage and non-informative priors have attracted much interest recently.

Many characteristics of building energy efficiency are correlated with the heating load as well as with each other (Jammulamadaka et al., 2022). The resulting multicollinearity among energy efficiency features can yield unreliable estimates and misleading inferences. Achieving optimal energy efficiency requires feature selection with appropriate methods of analysis. Biased regression procedures have been developed to reduce the detrimental effects of multicollinearity on estimates of energy efficiency.

The main consequence of multicollinearity in statistical estimation and inference is that it inflates the SE of some or all regression coefficients of the fitted model (Kim, 2019), which widens the confidence intervals and leads to failure to reject the null hypothesis on the significance of a regression coefficient. The inflated SE and confidence intervals of the estimated model parameters increase the type II error rate, and thus lower the power, of the hypothesis tests on the parameters. These inflated standard errors make it challenging to assess individual regression coefficients in hypothesis testing (Assaf and Tsionas, 2021a).

The other consequence is that the posterior distribution may seem to suggest that none of the variables is reliably related to the outcome variable, even if all predictor variables are strongly related to the outcome. In contrast to its effect on statistical inference for the regression coefficients, multicollinearity does not affect the model’s overall fit to the observed response data or its predictions (Alin, 2010). The issues of autocorrelation, multicollinearity, and heteroscedasticity plague the majority of econometric models. The assumptions of the standard regression model are not always met in real-world situations (Youssef, 2022).

The effects of multicollinearity can be either statistical or numerical. The statistical consequences include difficulties in testing individual regression coefficients because of inflated standard errors, which in turn can produce large confidence regions. If the researcher needs to explain the effect of individual regression coefficients on Y, multicollinearity causes trouble because these effects cannot be separated. Therefore, we may be unable to declare a predictor (X) significant even though it has a strong relationship with the outcome (dependent) variable (Y). Moreover, the Ordinary Least Squares Estimates (OLSE) may be sensitive to small changes in the values of the explanatory variables. The numerical consequences of multicollinearity include difficulties in computation due to numerical instability. In extreme cases, the computer may try to divide by zero and thus fail to complete the analysis. Or, even worse, the computer may complete the analysis but then report meaningless, wildly incorrect numbers.

Multicollinearity can be identified using a correlation matrix or the Variance Inflation Factor (VIF) of the features, where a high R-squared value from regressing one feature on the remaining features demonstrates a strong linear relationship among them (Alin, 2010). Regression coefficients in multiple regression models with a VIF of more than 10 are not robustly estimated (Shrestha, 2020).
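The VIF check can be illustrated with a short, standard-library Python sketch. It assumes the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the other features; the three small feature vectors and all function names are hypothetical and are not taken from the energy efficiency dataset.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def r_squared(target, others):
    """R^2 from regressing `target` on the columns in `others` plus an intercept."""
    rows = [[1.0] + [col[i] for col in others] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations X^T X beta = X^T y
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    centre = mean(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - centre) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """Variance inflation factor of every column, given a list of feature columns."""
    factors = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        factors.append(1.0 / (1.0 - r_squared(col, others)))
    return factors

# Hypothetical features: x2 is almost a copy of x1, x3 is only loosely related.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
vifs = vif([x1, x2, x3])
print([round(v, 2) for v in vifs])
```

In this toy example the two nearly identical features exceed the threshold of 10, while the third stays close to 1.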

One study predicted heating and cooling loads using partial least squares for efficient residential building design without checking the assumptions of the classical approaches (Kavaklioglu, 2018). Many researchers and statisticians are reluctant to apply Bayesian statistical approaches because they find it difficult to draw conclusions based on their prior opinions (Sinay and Hsu, 2014). Bayesian inference from the posterior is strongly influenced by the prior information (Oluwadare, 2021). The ability to use prior knowledge in addition to sample data is one of the primary benefits of Bayesian techniques. Adding more prior information can be an alternative strategy to lessen the uncertainty caused by collinearity. Among the several methods to address multicollinearity are shrinkage priors and the associated algorithms, such as ridge, LASSO (Least Absolute Shrinkage and Selection Operator), and elastic net regression. Imposing shrinkage priors mitigates the collinearity problem by sifting the likelihood surface to create a posterior distribution that divides the pertinent likelihood information among the subset modes (Mahajan et al., 1977; Garg, 1984; Ročková and George, 2014; Zhang et al., 2022).

Energy efficiency and power-related datasets contain highly correlated attributes, so it is important to understand which attributes to exclude from a regression to avoid multicollinearity problems. To design buildings that follow certain standards, architects and engineers need to identify which parameters will significantly influence future energy demand (Sekhar Roy et al., 2018).

Multicollinearity decreases the statistical power of the regression model by reducing the precision of the estimated regression coefficients and making the model extremely sensitive to even tiny changes in the observed values and the model specification (Ročková and George, 2014). In this study, a sensitivity analysis evaluates the results of Ordinary Least Squares (OLS) regression and of Bayesian regression in which the shrinkage priors were varied (Piironen and Vehtari, 2017b; Ackermann, 2019; Kim et al., 2019). Sensitivity analysis is used to evaluate how regression estimates respond to changes in the modeling approach (Seltzer, 1993). Strongly correlated predictors can increase the uncertainty of the posterior distributions of the regression coefficients (Van de Schoot and Depaoli, 2014). However, the prior distribution in Bayesian analyses can come to the rescue because it makes it much less likely for the posteriors to have an extraordinarily large posterior mean and standard deviation. Bui et al. (2019) employed Artificial Neural Networks (ANNs) to estimate the Heating Load (HL) and Cooling Load (CL) of energy-efficient residential buildings. To achieve this, they used a dataset comprising the heating load and the cooling load together with the relevant factors: relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution.

In practical Bayesian statistics, multiple regression, Bayesian networks, and artificial neural networks have been used for prediction (Felipe et al., 2015). An artificial neural network with Bayesian regularization was used to assess the performance of electronic components over their lifetimes in four different scenarios. The findings showed a direct relationship between the reliability parameters examined in all scenarios, and the Mean Time Between Failures value increased in each scenario (Çolak et al., 2023). ANNs with Bayesian regularization are an effective and powerful mathematical technique for evaluating the reliability of a lifetime model (Sindhu et al., 2023).

Several studies demonstrated the superiority of the Bayesian approach over the frequentist approach of the multiple linear regression model in identifying the predictors for the outcome variable (Zianis et al., 2016; Gebrie, 2021; Tanoe et al., 2021; Vijayaragunathan et al., 2023). But what distinguishes this study from others is the way it takes into account the Multicollinearity effect and applies multiple prior distributions or beliefs to evaluate sensitivity in Bayesian regularization of regression parameters.

Pesaran and Smith (2019) investigated the asymptotic behavior of the posterior estimates and the precision of the parameters of a linear regression model in scenarios of exactly and highly collinear predictors. In both scenarios, even when the sample size is large, the estimates from the posterior distribution remain sensitive to the choice of prior distribution, and the precision increases more slowly than the sample size.

Assessing how sensitive the posterior is to changes in the prior distribution and the likelihood is a crucial step in the Bayesian workflow. Sensitivity can be diagnosed by power-scaling the prior or the likelihood (Kallioinen et al., 2024). The Ordinary Least Squares (OLS) method is distribution-free in the sense that the estimates do not rely on any distribution of the data. However, without making certain assumptions about the probability model that underlies the data, it is impossible to draw any statistical inferences about the slope, intercept, or prediction from the OLS estimates. Thus, for any dataset, multicollinearity must be mitigated in Bayesian inference and an appropriate predictive model must be selected. This manuscript models the standard OLS regression and four distinct Bayesian regression models, with several shrinkage or regularized prior distributions, for a real dataset that showed collinearity.

The existing and recommended remedies for multicollinearity in the presence of highly correlated independent variables are: increasing the sample size to strengthen statistical power; omitting one or more of the affected variables from the analysis; combining the strongly correlated variables into a single composite score; switching to modeling approaches that can handle correlated variables, such as principal component analysis (PCA) or partial least-squares (PLS) regression; and using regularization methods such as ridge and LASSO or Bayesian regression (Voss, 2004; Jaya et al., 2019). However, omitting a variable or creating a composite score is feasible for bivariate correlation but leads to a different interpretation of the model. Moreover, switching to models that can handle inter-related explanatory variables does not provide statistical hypothesis tests, so hypothesis testing in the regression model has not yet been solved. The method must obtain the parameter estimates with a high level of precision and, at the same time, facilitate hypothesis tests of the regression parameters (Pesaran and Smith, 2019). We propose the Bayesian regression method with weakly informative and shrinkage (regularized) priors as an alternative solution. The Monte Carlo simulation revealed that the Bayesian method handles hypothesis testing in regression analysis effectively, and with interpretability, in the presence of multicollinearity. Therefore, the main purpose of this study is to fit multiple linear regression models using OLS and the Bayesian approach with different shrinkage prior distributions, to analyze the sensitivity of the priors, and to find the best-fitted model for the collinear data.

2 Materials and methods

2.1 Multiple linear regression (classical versus Bayesian)

In classical statistical theory, fixed effect models use regression analysis in which group means are fixed, unlike random effect models, in which group means are treated as a random sample from a population (Mummolo and Peterson, 2018). Here, the fixed effect model is a linear regression model with one outcome or dependent variable $(Y)$ and $p$ input or independent variables $(X_{1}, X_{2}, X_{3}, \dots, X_{p})$ . This multiple regression can be expressed as Eqs 1, 2.

Y = β_{0} + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} + \dots + β_{p} X_{p} + ε (1)

Y = X β + ε (2)

where $Y$ is a target or outcome variable with a dimension of $(n \times 1)$ .

$X$ is a design matrix of input variables with a constant column of dimension $(n \times (p + 1))$ .

$β$ is a vector of regression parameters or coefficients having $((p + 1) \times 1)$ dimension.

$ε$ is a vector of error terms with a dimension of $(n \times 1)$ (Mettle et al., 2016).

In matrix notation, multiple linear regression can be rewritten as Eq. 3.

Y = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n} \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \dots & x_{1 p} \\ 1 & x_{21} & \dots & x_{2 p} \\ 1 & x_{31} & \dots & x_{3 p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n 1} & \dots & x_{n p} \end{bmatrix}, \quad β = \begin{bmatrix} β_{0} \\ β_{1} \\ β_{2} \\ β_{3} \\ \vdots \\ β_{p} \end{bmatrix}, \quad ε = \begin{bmatrix} ε_{1} \\ ε_{2} \\ ε_{3} \\ \vdots \\ ε_{n} \end{bmatrix} (3)

From the general multiple linear regression model in Eq. 2 with $p$ input variables and $n$ observations, the least-squares solution is derived from the normal equations in Eq. 4.

X^{T} X \hat{β} = X^{T} y (4)

Eq. 4 summarizes the $p + 1$ normal equations in the $p + 1$ components of $\hat{β}$ . Two sets of parameters have to be estimated, $β$ and $σ^{2}$ , under the model $y \sim N (X β, σ^{2} I)$ . Solving the normal equations, the vector of regression coefficients can be computed as shown in Eq. 5:

\hat{β} = {(X^{T} X)}^{- 1} X^{T} y (5)

with the vector of residuals shown in Eq. 6;

\hat{e} \equiv (y - X \hat{β}) (6)

However, if there are more parameters to be estimated than observations $(p > n)$ , or if $p \leq n$ but some of the underlying explanatory variables are perfectly correlated, there will be multicollinearity $(|X^{T} X| = 0)$ and the inverse of $X^{T} X$ will not exist (Frost, 2019). In either case, Eq. 4 has an infinite number of solutions, and a unique solution cannot be obtained unless extraneous variables are eliminated by deleting the corresponding rows and columns of $X^{T} X$ (and the corresponding elements of $β$ ), or prior information is brought to bear on the problem in some form to eliminate the ambiguity (Ullah, 2021).
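A minimal Python sketch can make the singular case concrete. It solves the normal equations of Eq. 4 for a two-column design matrix without an intercept, then repeats the calculation with a perfectly collinear second column. The data, the helper names, and the constant `k` added to the diagonal are hypothetical; adding `k` to the diagonal is used here only as one simple way of bringing in prior information (it corresponds to a zero-mean Gaussian prior on the coefficients).

```python
def xtx_and_xty(x1, x2, y):
    """Build X^T X and X^T y for the two-column design matrix X = [x1, x2]."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    xty = [sum(u * t for u, t in zip(x1, y)), sum(v * t for v, t in zip(x2, y))]
    return [[a, b], [b, d]], xty

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def solve2(m, rhs):
    """Solve a 2x2 system with the explicit inverse; fail loudly if singular."""
    det = det2(m)
    if abs(det) < 1e-12:
        raise ZeroDivisionError("X^T X is singular: no unique solution")
    return [(m[1][1] * rhs[0] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0]
y = [2.0 * u + 3.0 * v for u, v in zip(x1, x2)]  # noise-free, true beta = (2, 3)

# Well-conditioned case: Eq. 5 recovers the coefficients.
xtx, xty = xtx_and_xty(x1, x2, y)
beta_hat = solve2(xtx, xty)

# Perfectly collinear case: the second column is exactly twice the first.
x2_collinear = [2.0 * u for u in x1]
xtx_c, xty_c = xtx_and_xty(x1, x2_collinear, y)
det_collinear = det2(xtx_c)  # zero, so Eq. 5 cannot be evaluated

# Prior information restores a unique solution: add k to the diagonal.
k = 1.0
xtx_reg = [[xtx_c[0][0] + k, xtx_c[0][1]], [xtx_c[1][0], xtx_c[1][1] + k]]
det_regularized = det2(xtx_reg)
beta_regularized = solve2(xtx_reg, xty_c)

print(beta_hat, det_collinear, det_regularized, beta_regularized)
```

With the collinear column the determinant is exactly zero, while the regularized system has a positive determinant and a unique solution.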

An alternative approach to estimating and inferring regression model parameters is provided by Bayesian linear regression. The Bayesian method has prior, likelihood, and posterior distributions. The posterior is created by combining the sample data with the prior according to Bayes’ theorem. The linear regression model that uses the OLS estimation method assumes normally distributed errors, $ε \sim N (0, σ^{2})$ . Since the error term is normally distributed, the outcome $Y$ , conditional on $X$ , $β$ , and $σ^{2}$ , is also normally distributed (Miroshnikov et al., 2015; Samira, 2023). In the standard regression approach, $β$ and $σ^{2}$ are the vector of regression coefficients and the residual variance, and the sampling distribution of their estimates follows from the normal error term. The Bayesian approach, however, treats these parameters as random variables that have their own distributions with hyper-parameters. In the Bayesian context, the distribution of $β$ depends on the choice of the prior distribution, so it will not necessarily be a normal distribution. The main difference between classical and Bayesian statistics is that in frequentist approaches, parameters are considered fixed but unknown constants that can be estimated from the data. In contrast, Bayesian approaches treat parameters as random variables with their own distributions, reflecting uncertainty about their values (Nithin, 2023). This allows prior knowledge or beliefs about the parameters to be incorporated in the Bayesian approach and updated with new data through Bayes’ theorem, leading to a posterior distribution that expresses updated beliefs about the parameters’ values. Thus, $(Y | X, β, σ^{2}) \sim N (X β, σ^{2} I)$ and the joint probability density function (pdf) of the observations can be written as:

p (Y | X, β, σ^{2}) = {(2 π σ^{2})}^{- n / 2} \exp [- \frac{1}{2 σ^{2}} {(Y - X β)}^{T} (Y - X β)] (7)

The likelihood function is derived from the above probability density function (pdf) as the product of the densities of the $n$ observations, where $x_{i}^{T}$ denotes the $i$th row of $X$ , and can be expressed as Eqs 8, 9;

p (Y | X, β, σ^{2}) = \prod_{i = 1}^{n} \frac{1}{\sqrt{2 π σ^{2}}} \exp [- \frac{1}{2 σ^{2}} {(y_{i} - x_{i}^{T} β)}^{2}] \propto {(σ^{2})}^{- n / 2} \exp [- \frac{1}{2 σ^{2}} {(Y - X β)}^{T} (Y - X β)] (8)

p (Y | X, β, σ^{2}) \propto {(σ^{2})}^{\frac{- ν}{2}} [\frac{ν s^{2}}{2 σ^{2}}] \times {(σ^{2})}^{- n / 2} \exp [- \frac{1}{2 σ^{2}} {(Y - X β)}^{T} (Y - X β)] (9)

Regression parameter estimates can be obtained using the Bayesian technique by iterating in the marginal posterior. As shown in Eq. 10, the posterior distribution can be obtained by multiplying the likelihood function by the prior information (Gelman et al., 2013).

\text{Posterior} \propto \text{Prior} \times \text{Likelihood}

p (β, σ^{2} | Y, X) \propto p (Y | X, β, σ^{2}) \times p (σ^{2}) \times p (β | σ^{2}) (10)
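The proportionality in Eq. 10 can be illustrated with a grid approximation for a single slope. This is only a sketch under simplifying assumptions that are not part of the study: a no-intercept model, a known error standard deviation `sigma`, a N(0, 1) prior on the slope, and three made-up data points.

```python
import math

# Hypothetical data for the model y_i = beta * x_i + error.
x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]
sigma = 1.0      # error standard deviation, assumed known
prior_sd = 1.0   # assumed N(0, prior_sd^2) prior on beta

def log_likelihood(beta):
    """Gaussian log-likelihood of the data, up to an additive constant."""
    return sum(-0.5 * ((yi - beta * xi) / sigma) ** 2 for xi, yi in zip(x, y))

def log_prior(beta):
    """Gaussian log-prior, up to an additive constant."""
    return -0.5 * (beta / prior_sd) ** 2

# Evaluate prior x likelihood on a grid and normalize so the weights sum to 1.
step = 0.001
grid = [-5.0 + step * k for k in range(10001)]
log_post = [log_prior(b) + log_likelihood(b) for b in grid]
peak = max(log_post)  # subtract the maximum for numerical stability
weights = [math.exp(lp - peak) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

post_mean = sum(b * p for b, p in zip(grid, posterior))
mle = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(round(mle, 4), round(post_mean, 4))
```

The posterior mean lies between the prior mean (zero) and the maximum likelihood estimate, which is the pull toward the prior that the shrinkage priors discussed later exploit.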

Markov Chain Monte Carlo (MCMC) is a technique for estimating regression model parameters in a Bayesian approach. The most common MCMC algorithms used in Bayesian estimation are Gibbs sampling, Metropolis-Hastings, and Hamiltonian Monte Carlo.
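As a rough illustration of how MCMC draws from a posterior, the sketch below implements a random-walk Metropolis sampler, the simplest member of the Metropolis-Hastings family. It is not the Hamiltonian Monte Carlo algorithm used in this study. The single-slope model, the data, the known error standard deviation, the N(0, 1) prior, and the tuning values are all assumptions made for the example.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical data for the model y_i = beta * x_i + error, error sd = 1.
x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]

def log_posterior(beta):
    """Unnormalized log-posterior: N(0, 1) prior plus Gaussian log-likelihood."""
    log_prior = -0.5 * beta ** 2
    log_lik = sum(-0.5 * (yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    return log_prior + log_lik

def metropolis(n_draws, step_sd, seed):
    """Random-walk Metropolis sampler for the scalar parameter beta."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta)
    draws = []
    for _ in range(n_draws):
        proposal = beta + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        draws.append(beta)
    return draws

draws = metropolis(n_draws=20000, step_sd=0.5, seed=42)
kept = draws[1000:]  # discard the first draws as burn-in
post_mean = mean(kept)
post_sd = pstdev(kept)
print(round(post_mean, 3), round(post_sd, 3))
```

For this conjugate toy problem the exact posterior is normal with mean 14.5/15 and variance 1/15, so the sampled mean and standard deviation can be compared against known values.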

2.2 Estimation and inference in Bayesian multiple regression

In the Bayesian framework, prior distributions to $β$ and $σ^{2}$ need to be assigned (Miroshnikov et al., 2015). For convenience, Bayesian models use precision $(τ)$ rather than variance $(σ^{2})$ as specified in Eqs 7, 11. Considering this parameterization, a Bayesian Multiple Regression model assumes,

\begin{array}{c} (Y | β, τ) \sim N_{n} (X β, \frac{1}{τ} I), \\ (β | τ) \sim N_{p + 1} (ϕ, \frac{1}{τ} V), \\ τ \sim D (α, δ) \end{array} (11)

where $α$ and $δ$ are hyper-parameters of the prior $D$ .

A Bayesian point estimate of $β_{i}$ is its posterior mean $ϕ_{* i}$ , and a $100 (1 - ω) \%$ Bayesian credible interval for $β_{i}$ is $ϕ_{* i} \pm t_{\frac{ω}{2}, n + 2 α} w_{* i i}$ , where $ϕ_{* i}$ is the $i$th element of $ϕ_{*} = {(V^{- 1} + X^{T} X)}^{- 1} (V^{- 1} ϕ + X^{T} y)$ and $w_{* i i}$ is the $i$th diagonal element of $W_{*} = (\frac{{(y - X ϕ)}^{T} {(I + X V X^{T})}^{- 1} (y - X ϕ) + 2 δ}{n + 2 α}) {(V^{- 1} + X^{T} X)}^{- 1}$ .

To evaluate a hypothesis such as $H_{0} : β_{i} > β_{i 0}$ , its posterior probability can be calculated as $P (t_{(n + 2 α)} > \frac{β_{i 0} - ϕ_{* i}}{w_{* i i}})$ . The Probability of Direction (PD), an index in Bayesian inference that serves as the numeric counterpart of the p-value in classical statistics, was computed using the BRMS package. The higher the probability of a hypothesis, the more credible it is considered (Kruschke and Liddell, 2018).

From the sampling theory viewpoint, prior information can be introduced by imposing side conditions on the regression parameters and using the formalism of the generalized inverse (Soofi, 1990). For a Bayesian, prior information enters the problem through an informative prior distribution for the regression parameters (Leamer, 1973). However, using a diffuse (non-informative) prior will not extricate the analysis from the multicollinearity problem, since such priors do not add enough information.

The prior distribution is the probability distribution assigned to the parameters that need to be estimated (Consonni et al., 2018). The likelihood is the joint distribution of the observed data given the parameters, and it links the prior distribution to the posterior distribution. The prior is specified before the data are observed, so the likelihood function is often described as updating the prior knowledge. Inference on the Bayesian models and posterior distributions was done using the “bayestestR” package in R.

Andraszewicz et al. (2015) carried out model selection using the Bayes factor and Bayesian hypothesis testing, and demonstrated the use of Bayes factors with an extended example of hierarchical regression based on an experimental study design in management. The fitted models and posterior distributions can be reported and characterized using the Highest Density Interval (HDI), the credible interval, and the Region of Practical Equivalence (ROPE) percentage or equivalence test functions, to check whether the Bayesian regression effects can be considered non-negligible. The credible interval, also known as the Bayesian 95% confidence interval, can be interpreted as follows: given the evidence presented by the observed data, there is a 95% probability that the Bayesian Credible Interval (BCrI) contains the true (unknown) value (Hespanhol et al., 2019).

In this study, the Hamiltonian Monte Carlo (HMC) algorithm of the Bayesian Regression Models using Stan (BRMS) package in R was used to fit the Bayesian Regression Models (BRM), and the “lm” function was used for the classical regression model. Stan uses a variant of HMC, the No-U-Turn Sampler (NUTS), to explore the target parameter space and produce output. After the burn-in period is complete, the subsequent iterations are used to estimate the parameters. The following models were fitted: the classical multiple linear regression with OLS estimation and, from the BRMS package in Stan, Bayesian multiple regression with a ridge prior (Model 1), Bayesian multiple regression with a Horseshoe prior (Model 2), Bayesian multiple regression with an R-Square-Induced Dirichlet Decomposition (R2-D2) prior (Model 3), and Bayesian multiple linear regression with a weakly informative prior (Model 4).

The scale reduction factor (Rhat) is the standard deviation of the relevant scalar quantity of interest from all the chains combined, divided by the root mean square of the separate within-chain standard deviations. A value close to 1 indicates no MCMC convergence issues. Although the Effective Sample Size (ESS) should be as large as feasible, an ESS of more than 1,000 is sufficient for most purposes to generate stable estimates (Bürkner, 2017). The ESS (Bulk_ESS and Tail_ESS) represents the number of independent samples that have the same estimation power as the N auto-correlated samples. It measures “how much independent information there is in auto-correlated chains” (Kruschke and Liddell, 2018).
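The idea behind Rhat can be sketched in a few lines of Python. The function below implements the basic Gelman-Rubin form, which compares a pooled variance estimate with the average within-chain variance. Stan reports a more refined rank-normalized split version, so this sketch is an approximation of the concept, not a reimplementation of the BRMS output. The simulated chains are hypothetical.

```python
import random
from statistics import mean, variance

def rhat(chains):
    """Basic potential scale reduction factor for equally long chains."""
    n = len(chains[0])
    within = mean(variance(c) for c in chains)            # W: mean within-chain variance
    between_over_n = variance([mean(c) for c in chains])  # B / n: variance of chain means
    pooled = (n - 1) / n * within + between_over_n        # pooled variance estimate
    return (pooled / within) ** 0.5

rng = random.Random(1)

# Four chains that sample the same N(0, 1) target: Rhat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# The same chains shifted to different locations: they disagree, so Rhat is large.
stuck = [[v + 3.0 * k for v in chain] for k, chain in enumerate(mixed)]

rhat_mixed = rhat(mixed)
rhat_stuck = rhat(stuck)
print(round(rhat_mixed, 3), round(rhat_stuck, 3))
```

Chains that explore the same distribution give a value near 1, whereas chains centered at different locations give a value well above it.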

2.3 Types of priors

A prior is a statistical distribution that represents the degree of (un)certainty about a population parameter. The posterior, which is used to produce Bayesian inference, is the distribution obtained by weighting the prior by the likelihood in the Bayesian estimation process (Van de Schoot and Depaoli, 2014).

2.3.1 Non-informative prior

This kind of prior is used when little is known about the parameter in advance. The non-informative prior goes back to the work of Laplace, Bayes, Jeffreys, and Gauss (Grzenda, 2016). Although Jeffreys’s prior is frequently criticized in multivariate contexts, it is widely accepted in univariate cases (Lemoine, 2019). From a Bayesian point of view, using an (improper) uniform prior yields results that match the standard OLS estimates, in the sense that posterior quantiles agree with one-sided confidence bounds. For this and several other reasons, the uniform prior is often considered objective or non-informative.

2.3.2 Informative prior

The informative prior is a prior for which information about the parameters is available; it summarizes the evidence about the parameters concerned from many sources (Nasional et al., 2019). Stan considers a Student-t distribution with location 0, user-specified degrees of freedom $d_{t}$ , and a reasonable scale $s_{t}$ , which can be written as $t \sim \text{Student-}t (d_{t}, 0, s_{t})$ . In this manuscript, a weakly informative prior, student_t (3, 0, 2), and a ridge prior, Gaussian (0, 1), were selected for the fixed effect parameters $β$ .

2.3.3 Shrinkage priors

Prior distributions for multidimensional linear regression require a joint distribution for the unobserved regression coefficients (Piironen and Vehtari, 2017a). Shrinkage priors such as the Bayesian lasso prior (Oluwadare, 2021), the spike-and-slab prior (Wu et al., 2023), the R-square-induced Dirichlet Decomposition (R2-D2) prior (Zhang et al., 2022), and the Horseshoe prior aim to shrink the fixed effects of the regression model towards zero (Müller, 2012). In Stan, the ridge prior produces results comparable to those of non-informative priors when the sample size is large, but it performs better in small samples. Ridge regression corresponds to a Bayesian regression with a Gaussian prior, so using a weakly informative normal prior is practically the same. The ridge prior in BRMS can be written as $β \sim N (0, γ^{2} I)$ , where $γ^{2}$ is the prior variance of the coefficients and $I$ is the identity matrix with the same dimension as $β$ . Stronger regularization of the model can be achieved by using a small value for $γ^{2}$ .
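The link between the Gaussian prior and ridge regression can be made explicit with a short derivation. It is a sketch that assumes $σ^{2}$ is known and uses the Gaussian likelihood of Eq. 2; the ratio $k$ is a symbol introduced here for illustration. The first line is the log posterior under the prior $β \sim N (0, γ^{2} I)$, and the second line is its maximizer.

```latex
\log p (β | y, X) = - \frac{1}{2 σ^{2}} {(y - X β)}^{T} (y - X β) - \frac{1}{2 γ^{2}} β^{T} β + \text{const}

{\hat{β}}_{\text{ridge}} = {(X^{T} X + k I)}^{- 1} X^{T} y, \quad k = \frac{σ^{2}}{γ^{2}}
```

A smaller $γ^{2}$ gives a larger $k$ and therefore stronger shrinkage, and $X^{T} X + k I$ is invertible for any $k > 0$ even when $X^{T} X$ itself is singular.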

The derivation of the R-square-induced Dirichlet Decomposition (R2-D2) prior considers a prior for $β$ satisfying the conditions $E (β) = 0$ and $\operatorname{cov} (β) = σ^{2} Λ$ , where $Λ$ is a diagonal matrix with diagonal elements $λ_{1}, λ_{2}, \dots, λ_{p}$ . Then,

\begin{array}{c} \operatorname{Var} (X^{T} β) = E_{X} \{\operatorname{Var}_{β} (X^{T} β | X)\} + \operatorname{Var}_{X} \{E_{β} (X^{T} β | X)\} = E_{X} (σ^{2} X^{T} Λ X) + \operatorname{Var}_{X} (0) \\ = σ^{2} E_{X} \{\operatorname{tr} (X^{T} Λ X)\} = σ^{2} \operatorname{tr} \{Λ E_{X} (X X^{T})\} = σ^{2} \operatorname{tr} (Λ Σ) = σ^{2} \sum_{j = 1}^{p} λ_{j} . \end{array}

Thus, $R^{2}$ is represented as

\begin{array}{c} R^{2} = \frac{\operatorname{Var} (X^{T} β)}{\operatorname{Var} (X^{T} β) + σ^{2}} = \frac{σ^{2} \sum_{j = 1}^{p} λ_{j}}{σ^{2} \sum_{j = 1}^{p} λ_{j} + σ^{2}} = \frac{σ^{2} \sum_{j = 1}^{p} λ_{j}}{σ^{2} (\sum_{j = 1}^{p} λ_{j} + 1)} \\ = \frac{\sum_{j = 1}^{p} λ_{j}}{\sum_{j = 1}^{p} λ_{j} + 1} = \frac{W}{W + 1} \end{array}

where $W = \sum_{j = 1}^{p} λ_{j}$ is the sum of the prior variances scaled by $σ^{2}$ (Zhang et al., 2022).
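As a small worked illustration of this relationship, take three hypothetical prior variances $λ = (0.5, 1, 1.5)$ , which are not values from the study. The implied $R^{2}$ , and the inverse mapping from $R^{2}$ back to $W$ , are then:

```latex
W = 0.5 + 1 + 1.5 = 3, \quad R^{2} = \frac{W}{W + 1} = \frac{3}{4} = 0.75

W = \frac{R^{2}}{1 - R^{2}} = \frac{0.75}{0.25} = 3
```

The inverse mapping shows why a prior placed on $R^{2}$ determines the total prior variance $W$ , which is then split among the individual $λ_{j}$ .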

In general, the shrinkage priors can be written as a global-local scale mixture of the Gaussian family, as summarized in Polson and Scott (2010) and shown in Eq. 12:

\begin{array}{c} Y_{i} = β^{T} X_{i} + ε_{i}, \quad i = 1, 2, 3, \dots, n; \quad ε_{i} \sim N (0, σ^{2}) \\ β_{j} \mid λ_{j}, γ \sim N (0, γ^{2} λ_{j}^{2}), \quad j = 1, 2, 3, \dots, p \end{array} (12)

where $γ$ is the global shrinkage parameter and $λ_{j}$ are the local shrinkage parameters. The distribution assigned to $λ_{j}$ determines the prior: $λ_{j} \sim$ Bernoulli for the spike-and-slab prior, $λ_{j} \sim$ Exponential for the Dirichlet-Laplace prior, and $λ_{j} \sim C^{+} (0, 1)$ (half-Cauchy) for the horseshoe prior.

With normalized covariates, the posterior mean of each regression coefficient is reduced from the maximum likelihood solution by a shrinkage factor $K_{j}$ .

{\bar{β}}_{j} = (1 - K_{j}) {\hat{β}}_{j, ML}, \qquad K_{j} = \frac{1}{1 + n σ^{- 2} γ^{2} λ_{j}^{2}}
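
As a small worked illustration of the shrinkage factor, the sketch below evaluates $K_{j}$ for three hypothetical local scales. The values of `sigma2`, `gamma2`, `lambda2`, and the maximum likelihood coefficients are assumptions chosen only to show the behavior; `n = 768` matches the sample size of the application data.

```python
import numpy as np

def shrinkage_factor(n, sigma2, gamma2, lambda2):
    """K_j = 1 / (1 + n * gamma^2 * lambda_j^2 / sigma^2)."""
    return 1.0 / (1.0 + n * gamma2 * lambda2 / sigma2)

n = 768
lambda2 = np.array([0.01, 1.0, 100.0])      # hypothetical squared local scales
K = shrinkage_factor(n, sigma2=1.0, gamma2=0.001, lambda2=lambda2)

beta_ml = np.array([2.0, 2.0, 2.0])         # hypothetical ML estimates
beta_post = (1.0 - K) * beta_ml             # posterior means after shrinkage

for l2, k, b in zip(lambda2, K, beta_post):
    print(f"lambda^2 = {l2:7.2f}  K = {k:.4f}  posterior mean = {b:.4f}")
```

A value of $K_{j}$ close to 1 means the coefficient is shrunk almost completely to zero, whereas a value close to 0 leaves the maximum likelihood estimate nearly untouched.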

2.4 Model fit and comparison criteria

As suggested by McElreath (2018), the Bayesian regression results of all fitted models were compared to identify the best-fitted model using the Leave-One-Out Information Criteria (LOO-IC), the Watanabe-Akaike Information Criteria (WAIC), and K-fold cross-validation. Furthermore, the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) were used to evaluate predictive precision. The original definitions of all criteria were adapted so that smaller values imply better models. The WAIC and the LOO-IC are more recently developed measures of complexity-penalized fit and are based on averaging over the posterior distribution, rather than using posterior means, $\bar{θ}$, of the parameters or other point estimates of $θ$. For any application data set with no missing values, the WAIC is obtained as

WAIC = - 2 (LPPD (y | θ) - d_{e})

where $d_{e} = - 2 E_{θ} [\log \{p (y | θ)\} | y] + 2 \log [p (y | \hat{θ})]$ is the estimated effective model dimension (complexity), and $LPPD (y | θ) = \sum_{i = 1}^{n} \log \int p (y_{i} | θ) p (θ | y) d θ$ is the Log Posterior Predictive Density (LPPD) for $y$. The LPPD is an estimate, although biased, of the Expected Log Posterior Predictive Density (ELPD) for (unobserved) new data, $\tilde{y}$, generated from the same density as the observed data $y$, and the complexity measure is a measure of that bias.

For observation $i$, the vector of likelihoods evaluated at the posterior samples $k = 1, 2, \dots, K$ can be denoted $L_{i} = (L_{i 1}, L_{i 2}, \dots, L_{i K})$. Then, $LPPD (y_{i} | θ) = \log ({\bar{L}}_{i})$, where ${\bar{L}}_{i}$ is the mean of $L_{i}$ over the samples, and the total of these over observations is the estimate of the LPPD. The estimated complexity for the WAIC is obtained by monitoring the log-likelihoods during MCMC sampling, denoted by ${LL}_{i k} = \log (L_{i k})$. The variance of ${LL}_{i}$ over the samples gives the complexity for that observation as $d_{e i} = Var ({LL}_{i})$, and $d_{e} = \sum d_{e i}$. Then, the estimated pointwise WAIC is computed as $- 2 (\log ({\bar{L}}_{i}) - d_{e i})$, and the total WAIC is the sum of the pointwise WAIC values (Vehtari et al., 2017).
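
The pointwise computation described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the routine used by BRMS: the function name `waic`, the toy normal-mean model with known unit variance, and the flat prior on the mean are all assumptions introduced for the example. The input is a matrix of log-likelihood values with one row per posterior sample and one column per observation.

```python
import numpy as np

def waic(log_lik):
    """WAIC from a (K samples x n observations) matrix of log-likelihoods."""
    # log of the mean likelihood per observation, computed stably
    m = log_lik.max(axis=0)
    lppd_i = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
    # pointwise complexity: variance of the log-likelihood over samples
    d_e_i = log_lik.var(axis=0, ddof=1)
    waic_i = -2.0 * (lppd_i - d_e_i)
    return waic_i.sum(), lppd_i.sum(), d_e_i.sum()

# Toy model (assumption): y_i ~ N(mu, 1) with a flat prior on mu,
# so the posterior of mu is N(mean(y), 1/n).
rng = np.random.default_rng(42)
n, K = 100, 4000
y = rng.normal(0.0, 1.0, size=n)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=K)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

total, lppd, d_e = waic(log_lik)
print(f"WAIC = {total:.2f}, LPPD = {lppd:.2f}, d_e = {d_e:.2f}")
```

For this one-parameter toy model the estimated complexity $d_{e}$ should come out close to 1, which is a convenient sanity check.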

The Pareto-smoothed importance sampling (PSIS) estimate of the LOO-IC uses an estimate of the leave-one-out predictive fit, the expected log pointwise predictive density (ELPD). The ELPD can be estimated as ${ELPD}_{LOO} = \sum_{i = 1}^{n} \log \{p (y_{i} | y_{- i})\}$,

where $p (y_{i} | y_{- i}) = \int p (y_{i} | θ) p (θ | y_{- i}) d θ$. Then, LOO-IC is estimated as $- 2 \times {ELPD}_{LOO}$.
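
To illustrate how $p (y_{i} | y_{- i})$ can be approximated from a single full-data fit, the sketch below uses plain (unsmoothed) importance sampling, in which the leave-one-out density is the harmonic mean of the pointwise likelihoods over posterior samples. This is a simplification: the Pareto smoothing step of PSIS, which stabilizes the importance weights, is deliberately omitted. The function name `is_loo` and the toy normal-mean model are assumptions for the example.

```python
import numpy as np

def is_loo(log_lik):
    """Unsmoothed importance-sampling LOO from a (K samples x n observations) log-likelihood matrix."""
    neg = -log_lik
    m = neg.max(axis=0)
    # log of the mean of 1 / L_ik over samples, computed stably
    log_mean_inv = m + np.log(np.mean(np.exp(neg - m), axis=0))
    elpd_i = -log_mean_inv            # log of the harmonic mean of the likelihoods
    elpd_loo = elpd_i.sum()
    return elpd_loo, -2.0 * elpd_loo

# Toy model (assumption): y_i ~ N(mu, 1), posterior of mu is N(mean(y), 1/n)
rng = np.random.default_rng(42)
n, K = 100, 4000
y = rng.normal(0.0, 1.0, size=n)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=K)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

elpd_loo, looic = is_loo(log_lik)
print(f"ELPD_LOO = {elpd_loo:.2f}, LOO-IC = {looic:.2f}")
```

Because a harmonic mean never exceeds the corresponding arithmetic mean, this estimate of the ELPD is never larger than the in-sample LPPD, which reflects the penalty for predicting each observation without having seen it.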

On the other hand, according to Chicco et al. (2021), the root mean squared error (RMSE) and the mean absolute error (MAE) can be computed as

\begin{array}{c} RMSE = \sqrt{\text{Mean Squared Error}} = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{n}} \\ MAE = \frac{\sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|}{n} \end{array}
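
The two error measures translate directly into code. The following sketch uses four made-up observed and predicted values, which are assumptions for illustration and not results from the study.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

y = np.array([10.0, 20.0, 30.0, 40.0])        # hypothetical observed values
y_hat = np.array([12.0, 18.0, 33.0, 40.0])    # hypothetical predictions

print("RMSE:", rmse(y, y_hat))
print("MAE :", mae(y, y_hat))
```

Here the errors are −2, 2, −3, and 0, so the MAE is 7/4 = 1.75 and the RMSE is the square root of 17/4. The RMSE is the larger of the two because squaring weights the largest error more heavily.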

2.5 Application data set

A secondary data set from the UCI machine learning repository was used in this study. The application data were gathered from twelve distinct building shapes and contain 768 samples with eight distinct features. The dataset has two responses (or outcomes, denoted by $Y_{1}$ and $Y_{2}$) and eight attributes (or features, denoted by $X_{1}, X_{2}, \dots, X_{8}$) based on energy efficiency instances. The objective is to predict each of the two target variables (responses) using the eight attributes (Tsanas and Xifara, 2012). In this application dataset, the five features $(X_{1}, X_{2}, \dots, X_{5})$ that exhibit at least moderate correlations with the two dependent variables ($Y_{1}$ and $Y_{2}$) were used. Thus, the chosen variables are relative compactness $(X_{1})$, surface area $(X_{2})$, wall area $(X_{3})$, roof area $(X_{4})$, and overall height $(X_{5})$, whereas orientation $(X_{6})$, glazing area $(X_{7})$, and glazing area distribution $(X_{8})$ are excluded. Related studies on energy prediction, such as Bui et al. (2019), Guo et al. (2023), Jitkongchuen and Pacharawongsakda (2019), Kim and Suh (2021), and Abdou et al. (2022), did not address the collinearity problem. The two response variables were heating load $(Y_{1})$ and cooling load $(Y_{2})$. This manuscript used heating load $(Y_{1})$ as the outcome variable associated with the five selected features. Various multiple linear regression models were fitted with the Ordinary Least Squares (OLS) method and the Bayesian approach to assess the effect of Multicollinearity on estimates and parameter inferences.

3 Results and discussion

3.1 Correlation analysis of selected variables

According to Ullah (2021), when the correlation coefficient between two features is greater than 0.75, the two features are highly correlated, which leads to a collinearity problem. Because $X_{6}$, $X_{7}$, and $X_{8}$ have only a very weak correlation with the target variable $Y_{1}$, this interdisciplinary study focused on the selected collinear characteristics. Figure 1 shows that relative compactness $(X_{1})$ has almost perfect collinearity $(r = - 0.99)$ with surface area $(X_{2})$ and strong negative collinearity $(r = - 0.87)$ with roof area $(X_{4})$. Moreover, relative compactness $(X_{1})$ had strong positive collinearity $(r = 0.83)$ with overall height $(X_{5})$. The second feature, surface area $(X_{2})$, has strong collinearity $(r = 0.88)$ with roof area $(X_{4})$ and strong negative collinearity $(r = - 0.86)$ with overall height $(X_{5})$. Likewise, roof area $(X_{4})$ has a strong negative correlation $(r = - 0.97)$ with overall height $(X_{5})$. On the other hand, the three excluded variables, orientation $(X_{6})$, glazing area $(X_{7})$, and glazing area distribution $(X_{8})$, had only very weak correlations with the other predictors and the dependent variable. The near-exact collinearity $(|r| \approx 1)$ between $X_{1}$ and $X_{2}$, and between $X_{4}$ and $X_{5}$, might lead to wider confidence intervals and inflated Standard Errors (SE) in regression. Moreover, inspection of the determinant of the correlation matrix (D) gives an idea of the degree of Multicollinearity: a determinant close to zero indicates strong Multicollinearity. Therefore, with these strong or near-perfect correlations between the input characteristics, the accuracy of the estimated regression coefficients in linear models decreases relative to the case where the predictors are uncorrelated.
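
Both diagnostics mentioned here, the determinant of the correlation matrix and the Variance Inflation Factor (VIF) reported later, can be computed from the correlation matrix alone. The sketch below is illustrative: it uses synthetic predictors rather than the energy efficiency data, and the function name `collinearity_diagnostics` is an assumption. It relies on the identity that the VIF of predictor $j$ equals the $j$-th diagonal element of the inverse correlation matrix.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return (VIF per column, determinant of the correlation matrix)."""
    R = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(R))   # VIF_j = 1 / (1 - R_j^2)
    return vif, float(np.linalg.det(R))

# Synthetic predictors (assumption): x2 is almost the negative of x1,
# mimicking a correlation near -0.99; x3 is unrelated to both.
rng = np.random.default_rng(7)
n = 768
x1 = rng.normal(size=n)
x2 = -x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vif, det_R = collinearity_diagnostics(X)
print("VIF:", np.round(vif, 2))
print("det(R):", round(det_R, 4))
```

With exactly collinear columns the correlation matrix is singular, its determinant is zero, and the inverse does not exist, which is the numerical counterpart of a coefficient that OLS cannot determine.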

Figure 1

Figure 1. Pearson correlation coefficient matrix of variables.

3.2 Results of classical linear regression model using an application dataset

Here, a multiple linear regression model was fitted with the ordinary least squares (OLS) method for the outcome heating load $(Y_{1})$ regressed with five highly correlated features in the Energy Efficiency Dataset.

Based on the results of the standard OLS regression model in Table 1, the P value of each estimated regressor is less than $α = 5\%$: the four features relative compactness $(X_{1})$, surface area $(X_{2})$, wall area $(X_{3})$, and overall height $(X_{5})$ had a significant effect on heating load $(Y_{1})$. The effect of roof area $(X_{4})$ could not be determined, because this predictor was dropped from the model estimation for computational reasons owing to its high collinearity with overall height $(X_{5})$, as can be seen in Table 1. The 95% confidence intervals for all estimated feature variables in Table 1 did not include zero. Figure 2 shows the significance of all variables and the decision on the null hypothesis, $H_{0} : β_{X_{j}} = 0$ for $j = 1, 2, \dots, p = 5$, for each feature. There, the null hypothesis is rejected for relative compactness $(X_{1})$ and overall height $(X_{5})$, whereas $H_{0}$ is accepted for surface area $(X_{2})$ and wall area $(X_{3})$. The estimates of the regression coefficients showed a substantial negative effect of relative compactness $(X_{1})$ and surface area $(X_{2})$ on heating load $(Y_{1})$, and a positive effect of wall area $(X_{3})$ and overall height $(X_{5})$. In addition to the correlation matrix, the Variance Inflation Factor (VIF) in Table 1 showed the occurrence of high collinearity or Multicollinearity among the features in the dataset. Thus, the standard multiple linear regression with the ordinary least squares (OLS) method yields biased estimates due to high collinearity among the features. Because of the high Multicollinearity in the data, numerical instability occurred in the computation, and the coefficient of roof area $(X_{4})$ could not be determined in the OLS approach (Figure 1; Table 1). This finding is supported by the findings of Ročková and George (2014) and Soofi (1990).

Table 1

Table 1. Multiple linear regression models using the OLS method for the heating load $(Y_{1})$ .

Figure 2

Figure 2. Hypothesis testing of OLS regression estimates and significance.

3.3 Results of Bayesian linear regression models using application data

The Bayesian credible interval can be understood as the Bayesian counterpart of the conventional confidence interval: it gives the probability (e.g., 95%) that the population parameter lies between specific upper and lower bounds determined by the posterior distribution (Gelman et al., 2020). By using the same model with different types of prior (weakly informative and shrinkage priors), we test the sensitivity to the prior, identify the pattern of posterior probabilities, and select the best-performing model. According to Van Erp et al. (2018) and Depaoli et al. (2020), it is imperative to check the sensitivity to the prior and the likelihood before scrutinizing their influence on the posterior distribution and estimates.

3.4 Model comparison results in applications dataset

Comparing the marginal posteriors under various priors is advisable, since the marginal posterior of the regression parameters can be inspected directly when using the Bayesian technique. Four models were fitted with the BRMS package in Stan: Bayesian multiple linear regression with ridge prior (Model 1), Bayesian multiple linear regression with horseshoe prior (Model 2), Bayesian multiple linear regression with R-square-induced Dirichlet decomposition (R2-D2) prior (Model 3), and Bayesian multiple linear regression with weakly informative prior (Model 4). To compare model fit and identify the best model, we computed the Leave-One-Out Information Criteria (LOO-IC), the Watanabe-Akaike Information Criteria (WAIC), the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE), the coefficient of determination $(R^{2})$, and the K-fold cross-validation criterion.

Based on Table 2, the highest percentage $(R^{2} = 84.3\%)$ of the total variation in heating load $(Y_{1})$ was explained by the five characteristics in the Bayesian multiple linear regression with weakly informative prior (Bayesian Model 4). Furthermore, the smallest values of LOO-IC, WAIC, K-fold, RMSE, and MAE were observed for this model. Thus, the Bayesian multiple linear regression model with weakly informative prior (Bayesian Model 4) is the best-fit model compared to the standard OLS regression and the BRM with ridge prior (Bayesian Model 1), horseshoe prior (Bayesian Model 2), and R-Square-Induced Dirichlet Decomposition (R2-D2) prior (Bayesian Model 3). Sensitivity analysis and inference (estimation, hypothesis testing, and feature selection with prediction) of regression estimates were therefore based on the best-fitted BRM (Bayesian Model 4).

Table 2

Table 2. Model assessment and comparisons using energy efficiency data.

3.5 Sensitivity analysis in the regression model

Collinearity increases the sensitivity of estimates to model misspecification. The sensitivity to the priors can be evaluated through inference on the regression estimates, which measures the amount by which the posterior mean shrinks the OLS estimate of a regression coefficient toward zero (Lavine, 1991). Although it is often valuable, sensitivity analysis lacks a technique for validating parameter hypotheses or for calculating Standard Errors (SE) that account for model uncertainty (Taraldsen et al., 2022). The horseshoe prior has been shown to have good theoretical characteristics and to perform well in practice, producing outcomes that are quite comparable to those of the spike-and-slab prior (Piironen and Vehtari, 2017b). Therefore, in a sensitivity analysis, posterior inferences are compared under several plausible prior distribution choices (Hamra et al., 2013).

Table 3 illustrates the importance of sensitivity analysis and the role of prior distributions when applying Bayesian approaches, using a power-scaling sensitivity analysis (the powerscale_sensitivity function in the R package priorsense). The power-scaling sensitivity analysis reports prior and likelihood sensitivity for the regression coefficients of all input features. Low likelihood sensitivity was observed mostly for b_X2, b_X3, and b_X4, which indicates a weak likelihood. However, the diagnostics show both prior and likelihood sensitivity for two of the fixed effect parameters, b_X1 and b_X5, which indicates that there may be a prior-data conflict. Overall, the power-scaling sensitivity analysis of the selected Bayesian model fit shows prior sensitivity together with appropriate likelihood sensitivity (Table 3; Figure 3).

Table 3

Table 3. BRM Sensitivity Diagnosis with weakly informative prior for the heating load.

Figure 3

Figure 3. BRMS Sensitivity Analysis Plot of the Posterior Density.

In contrast to a frequentist method, which tests effects against “zero,” Bayesian inference is not predicated on statistical significance. The Bayesian framework provides a probabilistic perspective on the parameters, enabling the evaluation of the associated uncertainty. Therefore, rather than concluding that an effect is present merely because it departs from zero, we would argue that it is more appropriate to consider the probability of being outside a particular range that can be regarded as “practically no effect” (i.e., a negligible magnitude). This range is called the Region of Practical Equivalence (ROPE). If there are non-independent covariates or Multicollinearity among predictors that leads to strong correlations among parameters, the joint parameter distributions may shift within or outside the ROPE. Collinearity invalidates ROPE-based hypothesis testing on univariate marginals, since the probabilities rely on independence.

The most troubling parameters are those that only partially overlap the ROPE, that is, the “undecided” parameters, whose results could move either toward or away from “rejection.” For many parameters of the application data set in this manuscript, the decision on the null hypothesis was undecided. Thus, conclusions drawn solely from the ROPE may be incorrect in the presence of collinearity, since the (joint) distributions of these parameters may show an increased or reduced overlap with the ROPE (Kruschke, 2014). Another approach for ranking feature importance is projection predictive variable selection (Piironen and Vehtari, 2015). To check the convergence of the MCMC, we drew trace plots and autocorrelation plots with four chains for the best-fitted model. For the best-fitted Bayesian multiple linear regression model with weakly informative prior for heating load (Table 4), the credible intervals of the intercept and regression coefficients are reported in the same format as frequentist confidence intervals, but their interpretation is Bayesian. Possible Multicollinearity between b_X5 and b_X1 (r = 0.83) results in inconsistent estimation and biased decisions in the hypothesis tests between the frequentist and Bayesian approaches (Soofi, 1990).

Table 4

Table 4. Bayesian linear regression best-fitted model in weakly informative prior for heating load $(Y_{1})$ .

According to Table 4, given the observed data, there is a 95% probability that heating load $(Y_{1})$ increases by between 35.1% and 58.6% for each additional 10-cm increase in overall height $(X_{5})$. Thus, given the observed data, there is a 95% probability that the true (unknown) coefficient of overall height $(X_{5})$ lies within the interval [3.51, 5.86]. This implies that for every 1-cm increase in overall height $(X_{5})$, the predicted heating load $(Y_{1})$ increases by 4.64 units. The estimated effect of relative compactness $(X_{1})$ on heating load was b_X1 = −39.27 (95% BCrI [−80.68, 0.75]), such that for each one-unit increase in relative compactness, the predicted heating load $(Y_{1})$ decreases by 39.27 units. The estimated effect of surface area $(X_{2})$ was b_X2 = −0.04 (95% BCrI [−1.76, 1.66]), such that for every one-unit increase in surface area $(X_{2})$, the predicted heating load $(Y_{1})$ decreases by 0.04 units. The estimated effect of wall area $(X_{3})$ was b_X3 = 0.05 (95% BCrI [−1.65, 1.78]), such that for each one-unit increase in wall area $(X_{3})$, the predicted heating load $(Y_{1})$ increases by 0.05 units. The estimated effect of roof area $(X_{4})$ was b_X4 = −0.01 (95% BCrI [−3.41, 3.46]), such that for every one-unit increase in roof area $(X_{4})$, the predicted heating load $(Y_{1})$ decreases by 0.01 units. Finally, the intercept has an estimated value of 43.02.

In Bayesian regression modeling, the four chains should be checked for non-convergence before examining the model summary and drawing inferences from the posterior draws. The last three values in Table 4 (“ESS_bulk”, “ESS_tail”, and “Rhat”) provide information on how well the algorithm could estimate the posterior distribution of each parameter. Because the “Rhat” values are close to or equal to 1, the posterior draws show no convergence problem with the MCMC algorithm in Bayesian regression modeling using Stan. In addition, in Figures 4, 5, the four chains mix well for all of the parameters, and therefore there is no evidence of non-convergence. Generally speaking, the posterior mean (called “Estimate”), standard deviation (called “Est. Error”), and two-sided 95% credible intervals (called “l-95% CI” and “u-95% CI”), reported as HDIs, are used to summarize each parameter.
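
To make the “Rhat” diagnostic concrete, the sketch below computes a basic split-$\hat{R}$ for one parameter by comparing between-chain and within-chain variance. It is an illustration only: recent Stan-based software reports a rank-normalized refinement of this statistic, which is not reproduced here, and the function name `split_rhat` and the simulated chains are assumptions.

```python
import numpy as np

def split_rhat(draws):
    """Basic split-Rhat for one parameter; draws has shape (iterations, chains)."""
    n_iter, _ = draws.shape
    half = n_iter // 2
    # Split every chain in two, giving twice as many sequences of length `half`
    seqs = np.concatenate([draws[:half], draws[half:2 * half]], axis=1)
    n = half
    B = n * seqs.mean(axis=0).var(ddof=1)      # between-sequence variance
    W = seqs.var(axis=0, ddof=1).mean()        # within-sequence variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(3)
mixed = rng.normal(size=(1000, 4))                     # four chains sampling the same target
stuck = mixed + np.array([0.0, 0.0, 0.0, 3.0])         # one chain centered elsewhere

print("well-mixed chains:", round(split_rhat(mixed), 3))
print("one shifted chain:", round(split_rhat(stuck), 3))
```

Values near 1 indicate that the chains agree with each other, while the shifted chain inflates the between-chain variance and pushes the statistic well above 1.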

Figure 4

Figure 4. BRMS convergence for the heating load $(Y_{1})$ with weakly informative prior.

Figure 5

Figure 5. Convergence trace plots of best-fitted Bayesian regression model coefficients.

Interval estimation has a very natural interpretation in Bayesian inference in the form of the 95% credible interval. The key distinctions between a frequentist CI and a Bayesian HDI or BCrI are assessed here. The results of the classical regression model in Table 1 and Figure 2 showed the significant effect of all input features, the acceptance of four null hypotheses, and the rejection of one hypothesis. The intercept and regression coefficient estimates varied widely. The proportion of the HDI located within the Region of Practical Equivalence (ROPE) is used as a decision criterion for null hypothesis testing. The HDI plus ROPE decision rule (Test for Practical Equivalence) was suggested by Kruschke (2018) to determine whether parameter values should be accepted or rejected in light of an explicitly stated null hypothesis (Kruschke and Liddell, 2018). As shown in Tables 4, 5; Figures 6–8, the HDI for overall height $(X_{5})$ is completely outside the ROPE [−1.01, 1.01]. The percentage of the posterior enclosed by the ROPE [−1.01, 1.01] for overall height $(X_{5})$ is 0%. Therefore, the null hypothesis, $H_{0}$, for b_X5 is rejected. The ROPE did not completely cover the HDI for any of the parameters, so none of the null hypotheses is accepted. All null hypotheses about the parameters b_X1, b_X2, b_X3, and b_X4 were undecided (Figure 8). Either the 89% or the 95% BCrI can be used in place of the 95% confidence interval of the frequentist framework; the 89% level provides more stable results (Kruschke, 2014) and serves as a reminder of the arbitrariness of such conventions (McElreath, 2018).
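
The HDI plus ROPE decision rule can be sketched on posterior draws as follows. This is an illustrative implementation under stated assumptions: the draws are synthetic normal samples whose centers and spreads are only loosely based on the summaries quoted in the text, not the study's actual posterior, and the function names `hdi` and `rope_decision` are hypothetical. The ROPE limits [−1.01, 1.01] are taken from the text.

```python
import numpy as np

def hdi(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the samples."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.ceil(prob * n))
    widths = s[k - 1:] - s[: n - k + 1]
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]

def rope_decision(samples, rope, prob=0.95):
    """Apply the HDI + ROPE rule: reject, accept, or leave the null undecided."""
    lo, hi = hdi(samples, prob)
    inside_hdi = samples[(samples >= lo) & (samples <= hi)]
    pct_in_rope = float(np.mean((inside_hdi >= rope[0]) & (inside_hdi <= rope[1])))
    if hi < rope[0] or lo > rope[1]:
        decision = "rejected"        # HDI entirely outside the ROPE
    elif lo >= rope[0] and hi <= rope[1]:
        decision = "accepted"        # HDI entirely inside the ROPE
    else:
        decision = "undecided"       # partial overlap
    return (lo, hi), pct_in_rope, decision

rng = np.random.default_rng(1)
rope = (-1.01, 1.01)
draws_large = rng.normal(4.64, 0.6, size=4000)    # synthetic, clearly non-zero effect
draws_vague = rng.normal(-0.04, 0.87, size=4000)  # synthetic, wide and centered near zero
draws_null = rng.normal(0.0, 0.1, size=4000)      # synthetic, tightly centered at zero

for name, d in [("large", draws_large), ("vague", draws_vague), ("null", draws_null)]:
    (lo, hi), pct, decision = rope_decision(d, rope)
    print(f"{name:5s} HDI = [{lo:.2f}, {hi:.2f}]  in ROPE = {100 * pct:.1f}%  H0 {decision}")
```

The three synthetic cases cover the three possible outcomes of the rule. Note that the rule is applied to each marginal posterior separately, so it inherits the limitation under collinearity discussed earlier.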

Table 5

Table 5. Bayes factor and ROPE of the best-fitted model in weakly informative prior.

Figure 6

Figure 6. ROPE plot for the best-fitted model parameters.

Figure 7

Figure 7. HDI plot for the best-fitted model parameters.

Figure 8

Figure 8. Hypothesis testing in BRM Parameters.

Table 4 also showed that the PD and the percentage in ROPE for the linear association between overall height $(X_{5})$ and heating load are about 100% and 0%, respectively, indicating near certainty about the direction of the effect and a significant effect of overall height $(X_{5})$ on heating load.

Based on Appendix Table A1, the classical and Bayesian estimates of the regression coefficients differ only slightly; however, a large difference was observed for the intercept, $β_{0}$: the posterior mean is 43.02, whereas the classical OLS regression gives an intercept (overall mean) of 84.177. The highest posterior density interval is one type of Bayesian Credible Interval (BCrI) whose bounds are threshold values of the posterior distribution that, around the distribution center, enclose an interval with the probability of interest (e.g., 95% of the distribution mass). These values are interpreted under the assumption that all values within the interval have higher probabilities of representing the parameter than all values outside it (Hespanhol et al., 2019).

The closest Bayesian counterpart of the frequentist p-value is the Probability of Direction (PD), which measures the probability that the effect of an input feature is positive or negative. Among the independent variables taken as input characteristics, overall height $(X_{5})$ and wall area $(X_{3})$ had a positive effect, whereas the other characteristics had negative effects (Figure 9).
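
The Probability of Direction is simply the share of posterior draws that have the same sign as the majority. The sketch below illustrates this on synthetic draws, which are assumptions loosely based on the summaries quoted earlier and not the study's posterior; the function name is hypothetical.

```python
import numpy as np

def probability_of_direction(samples):
    """Proportion of posterior draws sharing the sign of the majority (between 0.5 and 1)."""
    p_positive = float(np.mean(samples > 0))
    return max(p_positive, 1.0 - p_positive)

rng = np.random.default_rng(5)
draws_clear = rng.normal(4.64, 0.6, size=4000)     # synthetic, clearly positive effect
draws_unsure = rng.normal(-0.04, 0.87, size=4000)  # synthetic, direction uncertain

print("PD (clear effect) :", probability_of_direction(draws_clear))
print("PD (unsure effect):", probability_of_direction(draws_unsure))
```

A PD near 100% signals that the direction of the effect is almost certain, whereas a PD near 50% means that positive and negative effects are about equally plausible.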

Figure 9

Figure 9. Probability of effect direction for the best-fitted model parameters.

In addition to the 89% (95%) HDI and the ROPE, the Bayes factor is used for decision-making in hypothesis testing. A Bayes factor greater than one is regarded as evidence against the null hypothesis, and a Bayes factor smaller than 0.33 is interpreted as considerable evidence in favor of the null hypothesis. However, according to Andraszewicz et al. (2015), a Bayes factor greater than 3 can be considered “substantial” evidence against the null hypothesis.

As shown in Figure 10, the posterior predictive check, which compares the observed data $(y)$ with the posterior predicted values $(y_{rep})$, showed a very similar pattern, and the kernel density estimates for the data and the posterior predictive values are comparable. Based on Figure 11, projection predictive variable selection with the cv_varsel function, which computes a LOO-CV estimate of predictive performance for the best model at each number of variables, indicated that all five input features are important for estimating the heating load. Furthermore, the ranking of the importance of variables for heating load is overall height $(X_{5})$, wall area $(X_{3})$, relative compactness $(X_{1})$, surface area $(X_{2})$, and roof area $(X_{4})$. Cross-validation (CV) ranking using the projection predictive variable selection technique in Stan was applied to determine the position of each independent variable across sub-model sizes from the full data, as shown in Figure 11. Roof area $(X_{4})$, whose coefficient could not be determined in the OLS approach, was the least important variable in the Bayesian approach (Piironen and Vehtari, 2015).

Figure 10

Figure 10. BRMS posterior predictive check for heating load $(Y_{1})$ .

Figure 11

Figure 11. Variable Selection for Predictive Performance of Heating Load $(Y_{1})$ .

4 Conclusion and recommendation

This manuscript demonstrates the effect of Multicollinearity on estimation and hypothesis tests in linear regression models, using OLS and Bayesian approaches with several prior distributions on collinear energy efficiency data. As a preliminary step, the features that were strongly correlated with the outcome variable were selected based on the correlation analysis. The correlation analysis and the VIF results showed the occurrence of high Multicollinearity among predictors in the data. The variable excluded from the OLS model due to collinearity (roof area $(X_{4})$) was estimated using the Bayesian approach. Four different Bayesian multiple regression models were fitted with the same five input variables and four different priors. Among the classical OLS regression model and the four fitted Bayesian models, namely Bayesian Multiple Regression (BMR) with ridge prior (Model 1), BMR with horseshoe prior (Model 2), BMR with R-square-induced Dirichlet Decomposition (R2-D2) prior (Model 3), and BMR with weakly informative prior (Model 4), the BMR model with weakly informative prior fitted best. The classical regression result showed that all five independent variables have a significant effect on the heating load. However, the hypothesis tests indicated acceptance of the first four null hypotheses. The posterior mean estimates and Standard Errors (SE) of every coefficient differ from the equivalent frequentist OLS estimates and SE. Multicollinearity among the input features affects regression parameter inference (estimation, hypothesis testing, and prediction). The Bayesian hypothesis testing using HDI plus ROPE showed a rejection of the null hypothesis for overall height and an undecided decision on the other parameters due to the non-independence of predictors. The importance ranking of the input features, checked by projection predictive variable selection, was overall height, wall area, relative compactness, surface area, and roof area. It is necessary to check the Multicollinearity effect in regression modeling with both the Bayesian and frequentist approaches for applied research in science, engineering, agriculture, health, and other disciplines. In addition, by carefully identifying the key drivers of energy efficiency in buildings, this study provides a valuable framework for researchers, policymakers, and industry stakeholders to implement cleaner and more sustainable energy estimation practices.

This study considered a full model with the overall sample or training subsets and assessed the difference in posterior estimates under the OLS approach and four distinct priors. However, it did not fit several sub-models on sub-samples (split subsets of the overall dataset), nor did it use another test of posterior difference such as the Kolmogorov–Smirnov test. Future work is advised to use K-fold cross-validation, ensemble, data augmentation, and data simplification techniques, in which one split subset acts as the testing set and the remaining folds train the model.

Further research could concentrate on Bayes factors that jointly assess the significance of correlated covariates, which are more appropriate in this setting, and on the possibility that certain priors are more negatively affected by collinearity. This is in addition to the routine examination of the correlation matrix and the posterior distribution in various prior settings.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

LA-E: Conceptualization, Investigation, Methodology, Project administration, Resources, Software, Visualization, Writing–original draft, Writing–review and editing. EE: Conceptualization, Data curation, Formal Analysis, Methodology, Resources, Software, Validation, Writing–original draft, Writing–review and editing. YM: Conceptualization, Data curation, Resources, Supervision, Validation, Visualization, Writing–original draft, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R443), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Acknowledgments

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R443), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

BCrI, Bayesian credible interval; BF, Bayes factor; BMR, Bayesian multiple regression; BRMS, Bayesian regression model using Stan; CI, Confidence interval; ELPD, Expected log-posterior predictive density; ESS, Effective sample size; HDI, Highest density interval; LOO-IC, Leave-one-out information criteria; MAE, Mean absolute error; MCMC, Markov chain Monte Carlo; OLS, Ordinary least squares; PD, Probability of direction; R2, Coefficient of determination; R2-D2, R-square-induced Dirichlet decomposition; Rhat, Scale reduction factor; RMSE, Root mean squared error; ROPE, Region of practical equivalence; VIF, Variance inflation factor; WAIC, Watanabe-Akaike information criteria.

References

Abdou, N., El Mghouchi, Y., Jraida, K., Hamdaoui, S., Hajou, A., and Mouqallid, M. (2022). Prediction and optimization of heating and cooling loads for low energy buildings in Morocco: an application of hybrid machine learning methods. J. Build. Eng. 61, 105332. doi:10.1016/J.JOBE.2022.105332