Interpretable modeling for short- and medium-term electricity load forecasting

We consider the problem of short- and medium-term electricity load forecasting by using past loads and daily weather forecast information. Conventionally, many researchers have directly applied regression analysis. However, interpreting the effect of weather on these loads is difficult with the existing methods. In this study, we build a statistical model that resolves this interpretation issue. A varying coefficient model with basis expansion is used to capture the nonlinear structure of the weather effect. This approach results in an interpretable model when the regression coefficients are nonnegative. To estimate the nonnegative regression coefficients, we employ nonnegative least squares. Real data analysis shows the practicality of our proposed statistical modeling. The interpretation would be helpful for making strategies for energy-saving intervention and demand response.


Introduction
Short-and medium-term load forecasting with high accuracy is essential for decision making during trade in electricity markets.In the past several decades, many forecasting methods have been proposed in literature.The forecasting techniques are mainly classified into two categories: machine learning approaches and statistical approaches.The machine learning techniques for example, support vector regression (Elattar et al., 2010;Chen et al., 2017;Jiang et al., 2018), neural networks (He, 2017;Kong et al., 2018;Guo et al., 2018b;Bedi and Toshniwal, 2019), and hybrid of multiple forecasting techniques (Cho et al., 2013;Miswan et al., 2016;Liu et al., 2017;de Oliveira and Oliveira, 2018), have attracted attention in recent years.These techniques capture complex nonlinear structures; therefore, high forecast accuracy is expected.
Meanwhile, statistical approaches have also been extensively studied.Traditional statistical approaches include linear regression (Amral et al., 2008;Dudek, 2016;Saber and Alam, 2017), smoothing spline (Engle et al., 1986;Harvey and Kopman, 1993), autoregressive integrated moving average (ARIMA, Lee and Ko, 2011), and Kalman filter (Amjady, 2001).Recently, several authors have applied functional data analysis, where the daily curves of electricity loads are expressed as functions (Cabrera and Schulz, 2017;Vilar et al., 2018).Most of the statistical approaches are based on probabilistic forecasts, which lead to construction of the forecast interval.The distribution of the forecast values is helpful for risk management (Cabrera and Schulz, 2017).Reviews of the probabilistic forecasts were carried out by Hong and Fan (2016); van der Meer et al. (2018).
To forecast the loads, we usually use past loads, weather, and other factors as exploratory variables (e.g., Lusis et al., 2017).In practice, the time intervals are different among exploratory variables.For example, assume we forecast loads for a day that is a few days after today at 30-minute intervals; this is a common scenario for market transactions in electricity exchanges (e.g., the day-ahead market in the European Power Exchange, EPEX).The electricity loads are collected at 30-minute intervals using a smart meter, whereas weather forecast information, such as maximum temperature and average humidity, is usually observed at intervals of one day.Thus, this study uses past loads at 30-minute intervals and daily weather forecast information as exploratory variables.The other variables, such as the interior environment of buildings (e.g., Yildiz et al., 2017), may improve the accuracy but are not used in this study in order to illustrate a basic idea.Our proposed method, which will be described later, is able to directly incorporate the other variables.
From a suppliers' point of view, it is crucial to produce an interpretable model to investigate the impact of weather on the loads.For example, estimating the amount of electricity fluctuations caused by weather would be helpful for making strategies for energy-saving intervention (e.g., Guo et al., 2018a;Wang et al., 2018) and demand response (e.g., Ruiz-Abellón et al., 2020).To produce an interpretable model, it seems one can directly add weather forecast information to the exploratory variables in the regression model (Hong et al., 2010) and investigate the estimator of regression coefficients.
More generally, techniques for interpreting any black-box model, including the deep neural networks, have been recently proposed, for example, Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al., 2016) and SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017).However, these methods are used for variable selection, i.e., a set of variables that plays an essential role in the forecast is selected.The variable selection cannot estimate the amount of electricity fluctuations caused by weather.
In contrast to variable selection, decomposition of the electricity load at time t, say y t , into two parts is useful for interpretation: where µ t and b t are the effects of past loads and weather forecast information, respectively.
Typically, we use loads at the same time interval of the previous days as exploratory variables (e.g., Lusis et al., 2017), and regression analysis is separately conducted on each time interval.We then construct estimators of µ t and b t , say μt and bt , respectively.
The interpretation is carried out by plotting a curve of bt .However, without elaborate construction and estimation of b t , we face two issues.
The first issue is that the curve of bt often becomes non-smooth at any time interval (i.e., every 30 minutes) in our experience.The non-smooth daily curve of b t is unrealistic because it implies the impact of daily weather forecast on loads changes non-smoothly every 30 minutes.This problem is caused by the fact that the regression analysis is separately conducted at each time interval.To address this issue, we should estimate parameters under the assumption that b t is smooth.
The second issue is related to the parameter estimation procedure.In many cases, the regression coefficients are estimated through the least squares method.In our experience, however, the estimate of regression coefficients related to b t can be negative.The negative regression coefficients lead to a negative value of bt .Since bt is the amount of electricity fluctuations caused by weather, the interpretation becomes unclear.To alleviate this problem, we need to restrict the regression coefficients associated with b t to nonnegative values.
In this study, we develop a statistical model that elaborately captures the nonlinear structure of daily weather information to address two challenges, as mentioned earlier.
We employ the varying coefficient model (Hastie and Tibshirani, 1993;Fan and Zhang, 1999) with basis expansion, where the regression coefficients associated with weather are assumed to be different depending on the time intervals.The regression coefficients are expressed by a nonlinear smooth function with basis expansion, which allows us to generate a smooth function of bt .Furthermore, the weather effect bt is also expressed as a nonlinear smooth function.To generate nonnegative regression coefficients, we employ the nonnegative least squares (NNLS, e.g., Lawson and Hanson, 1995) estimation.NNLS estimates parameters under the constraint that all regression coefficients are nonnegative.
With the NNLS estimation, the value of bt is always nonnegative; thus, the interpretation becomes clear.The proposed statistical modeling is applied to actual electricity load data on a research facility.The results show that our proposed model is able to produce an interpretable weather effect.Moreover, the forecast accuracy of our proposed model is comparable to (slightly better than) some existing methods.
The remainder of this paper is organized as follows: Section 2 describes our proposed model based on the varying coefficient model.In Section 3, we present the parameter estimation via nonnegative least squares.Section 4 presents the analysis of actual electricity load data.Concluding remarks are given in Section 5. Some technical proofs are deferred to the Appendices.

Proposed model
Short-and medium-term forecasting is often used for trading electricity in the market.
Among various electricity markets, the day-ahead (or spot) and the intraday markets are popular in electricity exchanges, including the European Power Exchange (EPEX) (https://www.epexspot.com/en/market-data/dayaheadauction)and Japan Electric Power Exchange (JEPX) (http://www.jepx.org/english/index.html).In the dayahead market, contracts for the delivery of electricity on the following day are made.In the intraday market, the power will be delivered several tens of minutes (e.g., 1 hour in JEPX) after the order is closed.In both markets, transactions are typically made in 30-minute intervals; thus, the suppliers must forecast the loads in 30-minute intervals.
In this study, we consider the problem of forecasting loads that can be applied to both day-ahead and intraday markets.
Typically, J = 48, because we usually forecast the loads in 30-minute intervals.We consider the following model: where µ ij is the effect of past electricity load, b ij is the effect of weather, such as temperature and humidity, and ε ij are error terms with Typically, the error variances in the daytime are larger than those at midnight because of the uncertainty of human behavior in the daytime.Therefore, it would be reasonable to assume that V [ε ij ] = σ 2 j , i.e., the error variances depend on the time interval.One may assume the correlation of errors for different time intervals, i.e., Cor(ε ij , ε ij ) = 0 for some j = j ; however, the number of parameters becomes large.For this reason, we consider only the case where the errors are uncorrelated.Note that the final implementation of our proposed procedure described later is independent of the assumption of the correlation structure in errors.
One can express b ij and µ ij as linear or nonlinear functions of predictors and conduct the linear regression analysis.With this procedure, however, we face two issues, as mentioned in the introduction; thus, we carefully construct appropriate functions of b ij and µ ij .

Expression of b ij
Weather forecast information is typically observed at intervals of one day and not 30 minutes (e.g., the maximum temperature of average humidity).For this reason, we assume that the weather forecast information does not depend on j.Let a vector of weather information be s i .We express b ij as a function of s i .Here, we assume two structures as follows: • It is well known that the relationship between weather variables and consumption is nonlinear.For example, the relationship between maximum temperature and consumption is approximated by a quadratic function (e.g., Hong et al., 2010) because air conditioners are used on both hot and cold days.For this reason, it is assumed that b ij is expressed as some nonlinear function of s i .
• Although s i does not depend on j, the effect of weather, b ij , may depend on j.
For example, consumption in the daytime is affected by the maximum temperature more than that at midnight.In this case, the regression coefficients associated with s i change according to the time interval j.However, if we assume different parameters at each time interval, the number of parameters can be large, resulting in poor forecast accuracy.To decrease the number of parameters, we use the varying coefficient model, in which the coefficients are expressed as a smooth function of the time interval.
Under the above considerations, we propose expressing b ij as follows: where g m (s i ) (m = 1, ..., M ) are basis functions given beforehand, β m (j) are functions of regression coefficients, and M is the number of basis functions.
We also use the basis expansion for β m (j): where h q (j) (q = 1, ..., Q) are basis functions given beforehand and γ qm are the elements of the coefficient matrix Γ = (γ qm ).Substituting (4) into (3) results in the following: Because h q (j) and g m (s i ) are known functions, the parameters concerning b ij are γ qm .
Since the effect of weather is assumed to be smooth according to both j and s i , we use basis functions h q (j) and g m (s i ), which produce a smooth function, such as B-spline and the radial basis function (RBF).

Expression of µ ij
Since µ ij is the effect of past consumption, one can assume that µ ij is expressed as a linear combination of past consumption y (i−t−Lα)j , i.e., In practice, however, it is assumed that past consumption also depends on past weather, such as daily temperature.For example, suppose that it was exceptionally hot yesterday and it is cooler today.In this case, it is not desirable to directly use past consumption as the predictor; it is better to remove the effect of past temperature from past consumption.In other words, we can use y instead of y (i−t−Lα)j , and y i(j−u−L β ) , respectively.As a result, µ ij is expressed as follows: Substituting ( 7) into (5) results in the following: The appropriate values of α jt and β ju are chosen by several approaches.A simple method is α jt = 1/T and β ju = 1/U , which implies µ ij is the sample mean of the * For example, transactions of the day-ahead market in the JEPX close at 5:00 pm every day.For the forecast of the 5:30-6:00 pm interval tomorrow, we cannot use the information of today's consumption at the 5:30-6:00 pm interval due to the trading hours of the market, which implies L α = 1.

Proposed model
By combining the expressions of b ij in (5) and µ ij in (8), the model ( 2) is expressed as follows: The model ( 9) is equivalent to the linear regression model where γ = vec(Γ) and ε = vec(E) with E = (ε ij ).Here, X and ỹ are considered as the design matrix and the response vector, respectively.The definitions of ỹ and X are given in Appendix A.
Remark 2.1.Although this study considers a statistical model that can be applied to the electricity markets (i.e., 30-minute intervals), our proposed model is directly applicable to any time resolution of data; for example, the load in 1-minute intervals and temperature in 1-hour intervals.

Nonnegative least squares
To estimate the regression coefficient vector γ, one can use the least squares estimation In our experience, however, the elements of least squares estimates γ often become negative.In such cases, the estimate of b ij is negative because the basis functions h q (j) and The optimization problem ( 11) is a special case of quadratic programming with nonnegativity constraints (e.g., Franc et al., 2005).As a result, the NNLS problem becomes a convex optimization problem.Several efficient algorithms to obtain the solution in (11) have been proposed in the literature (e.g., Lawson and Hanson, 1995;Bro and DeJong, 1997;Timotheou, 2016).
We add the ridge penalty (Hoerl and Kennard, 1970) to the loss function of the NNLS estimation: where λ > 0 is a regularization parameter.In our experience, the ridge penalization yields a stable estimator and improves the forecast accuracy.

Forecast
For the day-ahead forecast, we forecast the loads on the next day, ŷ(i+1)j , for the given NNLS estimate γ and weather information s i+1 .The forecast value ŷ(i+1)j is expressed as follows: Here, b(i+1)j = M m=1 Q q=1 γqm h q (j)g m (s i+1 ) and μ(i+1)j = T t=1 α jt (y (i+1−t−Lα)j − b(i+1−t−Lα)j ).On the intraday forecast, we may use information about the loads on that day so that μ(i+1)j is expressed as Construction of a forecast interval based on (11) or ( 12) is not easy due to the constraints of the parameter.To derive the forecast interval, we employ a two-stage procedure; first, we estimate the parameter via NNLS to extract variables that correspond to nonzero coefficients.Then, we employ the least squares estimation based on the variables selected in the first step.With this procedure, we should derive the forecast interval after model selection.To achieve this result, the post-selection inference (Lee and Taylor, 2014;Lee et al., 2016) is employed.The post-selection inference for the NNLS estimation is detailed in Appendix B.

Case study 4.1 Data
The performance of our proposed procedure is investigated through analysis of electricity load data collected at a research facility.The dataset consists of electricity loads from January 4th, 2016, to December 31st, 2018.The loads are shown in kWh at 30minute intervals (i.e., J = 48).At certain times, loads are not observed due to electricity meter failures or blackouts.When some data values are missing for a particular day, the data for that day are removed, resulting in 1061 days of complete data.
We use maximum temperature around the research facility as weather variables s i .
We consider the problem of the day-ahead forecast: β ju ≡ 0. To eliminate the effect of the day of the week, the statistical models are constructed by day of the week; thus, we produce seven statistical models.All national holidays are regarded as Sundays, and we forecast the loads from January 2017 to December 2018 (the data in 2016 are only used for training).The training data consist of all load data up to the previous day of the forecast day; for example, when we forecast the loads on February 4th, 2017, the training data are loads from January 4th, 2016, to February 3rd, 2017.
The research facility has experimental equipment that uses large amounts of electricity, and the usage schedule of the experimental equipment is irregular.For this reason, we exclude the loads of the experimental equipment from the forecast loads of the research facility.

Setting
We compare the performance of our proposed methods based on various settings: • Two types of α jt : α jt = 1/T (hereafter referred to as "α1") and AR(1) structure (hereafter referred to as "α2").Details of the AR(1) structure are presented at the end of Section 2.2.
Then, µ m and h 2 m are estimated as follows: where #C m is the number of observations in cluster C m .The basis function h q (j) is constructed the same way as shown above, but the coefficient function β m (j) is not continuous around the boundary (e.g., j = 1, 48) with the ordinary RBF.To obtain a continuous function, we slightly modify the RBF function as follows: For each setting, we prepare 180 patterns of tuning parameters: Q = 5, 10, M = 5, 10, T = 4, ν g = 1, 4, 9, ν h = 1, 4, 9, and λ = 0, 10 −8 , 10 −6 , 10 −4 , 10 −2 .These tuning parameters are chosen so that the root mean squared error (RMSE) is minimized.
Table 1 shows a set of tuning parameters selected on the basis of the RMSE.For all settings, the ridge parameter λ is positive, which suggests that the ridge penalty may be helpful for improvement of the forecast accuracy.The optimal values of the tuning parameters depend on the statistical model.For example, by comparing S α2,E1,µ1 and S α2,E2,µ1 , we find that the NNLS estimation results in a more complex model than the LSE (M = 5 and M = 10 for S α2,E1,µ1 and S α2,E2,µ1 , respectively); due to the constraints on the parameters of the NNLS estimation, the NNLS estimation may require more parameters than the LSE to obtain good forecast accuracy.

Evaluation
We evaluate the performance of our proposed method through the monthly mean absolute percentage error (MAPE) The results are shown in Table 2.For all methods, the forecast error is relatively high.
This result occurs because the research facility is small and the electricity usage is unstable.As seen later, standard machine learning techniques also result in high forecast error.We observe that setting µ1 generally performs better than setting µ2 during seasonal changes, such as those during July 2017 and March and September 2018.The effect of past loads shown by model µ1 may appropriately remove the past temperature effect.
The performance of the NNLS estimation (E2) is comparable with that of the LSE (E1), which suggests that restricting the regression coefficients to all positive values may not cause a significant decrease in forecast accuracy.For setting µ1, the performance of α1 is similar to that of α2.On the other hand, for setting µ2, sometimes α1 results in a much worse performance than α2, especially during seasonal changes.The use of old loads, such as loads from 28 (=7•4) days ago, may decrease the forecast accuracy because the load patterns may change for several tens of days during seasonal changes.
In March 2018, the forecast errors are large due to seasonal changes.To investigate the performance in detail, we show the forecast values and actual values of loads from March 1st to 12th, 2018, in Figure 1.For setting µ2, the forecast values are often larger than the actual values; thus, setting µ2 cannot appropriately capture the effect of temperature.
Indeed, the past loads in February are generally larger than those in March, as shown in Figure 2; in Japan, it is much colder in February than in March.For example, to forecast the loads on March 5th, we use the past loads of the past four days of the same weekday: January 29th and February 5th, 19th, and 26th (February 12th was a national holiday in Japan and regarded as Sunday).The loads on the past four days are much larger than that on March 5th.The maximum temperatures on these five days (past four  3 to investigate the relationship between load and temperature.Table 3 shows that the temperature on March 5th is much higher than that on the past four days.Therefore, without removing the effect of temperature from past loads, the forecast on March 5th may not work accurately.Although the forecast accuracy of the LSE is similar to that of the NNLS estimation, the results of the decomposition ŷij = bij + μij are completely different.The LSE often results in negative values of bij , and then μij becomes much larger than the actual load.

Actual loads
Thus, interpreting the effect of weather is difficult with the LSE.This issue occurs because there are no restrictions on the sign of bij .On the other hand, the effect of weather is appropriately captured by the NNLS estimation.For example, on March 5th, the effect of the weather is small due to the moderate temperature on that day; on the other hand, the LSE results in large negative values for bij .The constraint on nonnegativeness of γ greatly improves the interpretation of the estimated model.
To further investigate the effect of temperature, we depict bij when maximum temperatures are 5 • C (cold day), 20 • C (cool day), and 35 • C (hot day), which is shown in Figure 4.Because we estimate the parameter for each day of the week separately, the weather effects b ij differ by the day of the week.We depict bij on Tuesday and Sunday.
The parameter γ is estimated by using the loads from January 4th, 2016, to December 31st, 2018.The estimation procedure uses setting S α2,E2,µ1 and "separate regression",   where the regression is conducted for each time interval separately.The results of our proposed procedure show that bij is highly dependent on the temperature: the weather 5℃ 20℃ 35℃ •

Regression on each time interval separately
Proposed method          effect may be large for cold and hot days due to air conditioner use.Moreover, the weather effect on Tuesday is generally more significant than that on Sunday because Tuesday is a working day while Sunday is a weekend day at this research facility.When we conduct the regression analysis for each time interval separately, the daily curve of bij becomes non-smooth.As a result, our proposed method is more appropriate for interpreting the weather effect than the separate regression.Note that even if we increase the value of the ridge parameter, the weather effect of the separate regression remains non-smooth in our experience.Therefore, a smooth function for regression coefficients is essential for producing a smooth curve for bij .

Comparison with standard machine learning techniques
We compare the performance of our proposed method with the following popular machine learning techniques: random forest, support vector regression (SVR), and lasso.
The predictors are electricity loads of the past T = 4 days and maximum temperature.
We use R packages randomForest, ksvm, and glmnet to implement these machine learning techniques.In SVR, we use the Gaussian Kernel.The SVR includes several tuning parameters; σ is used in the Gaussian Kernel, C corresponds to the regularization parameter in the SVR problem, and is used in the regression.The data in -tube around the prediction value is not penalized.We prepare the candidates σ = 0.001, 0.01, 0.1, C = 1, 10, 100 and = 0.001, 0.01, 0.1.The tuning parameters of random forest include the number of trees n trees and the number of variables sampled at each split, say n var .The candidates of these parameters are n trees = 50, 100, 500 and n var = 1, 2, 3.In the lasso, the candidates of regularization parameter λ are set as follows: maximum and minimum values of λ are defined by λ max = 200 and λ min = 0.01, respectively, and 100 grids are constructed on a log scale.We choose tuning parameters that yield the smallest value of RMSE in (13).
Table 4 shows the MAPE given by ( 14) for our proposed methods and machine learning techniques.Among all methods, S α2,E1,µ1 yields the best performance in terms of MAPE.S α2,E2,µ1 performs slightly worse than but comparable with S α2,E1,µ1 .If an interpretation of the estimated model is needed, S α2,E2,µ1 is better than S α2,E1,µ1 .Both S α2,E1,µ1 and S α2,E2,µ1 perform slightly better than the machine learning techniques.The forecast accuracy of the machine learning techniques might be improved when we include more information about weather and past loads (e.g., Khoshrou and Pauwels, 2019).However, even if the forecast accuracy is improved, the standard machine learning techniques cannot provide a smooth curve for bij .

Concluding remarks
We have constructed a statistical model for forecasting future electricity loads.The key idea of our proposed model is decomposition of the load expressed as where b ij and µ ij are the effects of weather and past loads, respectively.To capture the nonlinear effect of weather information, we employed the varying coefficient model.
Numerical results showed that our method appropriately captured the weather effect and performed slightly better than the standard machine learning techniques.With the ordinary least squares estimation (LSE), the estimate of b ij , say bij , became negative because some of the elements of regression coefficients γ were negative.The negative weather effect caused interpretation of the estimated model to be difficult.To address this issue, we employed the NNLS estimation; this estimation is performed under the constraint that all of the elements of γ are nonnegative.The numerical result showed that the NNLS estimation produced a more interpretable model than the LSE.Estimating the amount of electricity fluctuations caused by weather in Figure 4 would be helpful for making strategies for energy-saving intervention and demand response.
The proposed method is carried out under the assumption that the errors are uncorrelated.In practice, however, the errors among near time intervals may be correlated.
As a future research topic, it would be interesting to assume a correlation among time intervals and estimate a regression model that includes the correlation parameter.
where T and U are positive integers, which denote how far we trace back through the data and α jt (t = 1, ..., T ) and β ju (u = 1, ..., U ) are positive values given beforehand.Here, L α and L β are nonnegative integers that describe the lags; these values change according to the closing time of transactions * .The regression coefficients α jt correspond to the effects of past consumptions for the same time interval on previous days, while β jt are the coefficients for different time intervals on the same day.For the day-ahead market, we assume that β ju ≡ 0.
g m (s i ) are generally positive values.When b ij < 0, the values of µ ij become extremely large to adjust for the negative b ij value, which makes interpreting the estimated model difficult.Indeed, y ij ≈ µ ij + b ij means the current consumption is decomposed by the effect of past consumption and that of weather.The interpretation may be realized only when b ij is a nonnegative value; in that case, the value of b ij implies how the weather affects the loads.A nonnegative value of the weather effect b ij is realized when all elements of γ are nonnegative.The nonnegative least squares (NNLS) estimation is useful for estimating nonnegative regression coefficients:

Figure 1 :Figure 2 :
Figure 1: Forecast values and actual values of loads from March 1st to 12th, 2018. loads

Figure 4 :
Figure 4: The weather effect b ij on Tuesday and Sunday.The parameter γ is estimated by using the dataset from January 4th, 2016, to December 31st, 2018.The estimation procedure uses setting S α2,E2,µ1 (upper side) and "separate regression", where the regression is conducted for each time interval separately (lower side).

Table 3 :
Temperatures on several dates in 2018.jt y (i−t)j may not appropriately capture the effect of temperature.The poor performance on March 5th by model µ2 is attributed to the above fact.On the other hand, for model µ1, bij may appropriately capture the effect of temperature.