Interpretable Modeling for Short- and Medium-Term Electricity Demand Forecasting

Hirose, Kei

doi:10.3389/fenrg.2021.724780

ORIGINAL RESEARCH article

Front. Energy Res., 14 December 2021

Sec. Smart Grids

Volume 9 - 2021 | https://doi.org/10.3389/fenrg.2021.724780

This article is part of the Research TopicData-Driven Energy Demand Side Management TechniquesView all 6 articles

Interpretable Modeling for Short- and Medium-Term Electricity Demand Forecasting

Kei Hirose^1,2*

¹Institute of Mathematics for Industry, Kyushu University, Fukuoka, Japan
²RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

We consider the problem of short- and medium-term electricity demand forecasting by using past demand and daily weather forecast information. Conventionally, many researchers have directly applied regression analysis. However, interpreting the effect of weather on the demand is difficult with the existing methods. In this study, we build a statistical model that resolves this interpretation issue. A varying coefficient model with basis expansion is used to capture the nonlinear structure of the weather effect. This approach results in an interpretable model when the regression coefficients are nonnegative. To estimate the nonnegative regression coefficients, we employ nonnegative least squares. Three real data analyses show the practicality of our proposed statistical modeling. Two of them demonstrate good forecast accuracy and interpretability of our proposed method. In the third example, we investigate the effect of COVID-19 on electricity demand. The interpretation would help make strategies for energy-saving interventions and demand response.

1 Introduction

Short- and medium-term demand forecasting with high accuracy is essential for decision making during the trade on electricity markets and operation of power systems. Conventionally, several researchers have used previous demand, weather, and other factors as exploratory variables (e.g., Lusis et al., 2017) and directly applied regression analyses to forecast demand. As methodologies for regression analysis, linear regression (Amral et al., 2008; Dudek, 2016; Saber and Alam, 2017) and smoothing spline (Engle et al., 1986; Harvey and Koopman, 1993) have been traditionally used. Recently, several studies have applied functional data analysis, where the daily curves of electricity demand are expressed as functions (Cabrera and Schulz, 2017; Vilar et al., 2018). It should be noted that most statistical approaches are based on probabilistic forecasts, and the distribution of forecast values is helpful for risk management (Cabrera and Schulz, 2017). On the other hand, machine learning techniques have attracted attention in recent years, such as support vector machine (SVM); (Chen et al., 2017; Jiang et al., 2018; Yang et al., 2019); neural networks (He, 2017; Guo et al., 2018b; Kong et al., 2018; Bedi and Toshniwal, 2019; Wang et al., 2019), gradient boosting (Zhang et al., 2019), and hybrids of multiple forecasting techniques (Miswan et al., 2016; Liu et al., 2017; de Oliveira and Cyrino Oliveira, 2018; Haq and Ni, 2019). These techniques capture complex nonlinear structures; therefore, high forecast accuracies are expected.

In practice, the time intervals are different among exploratory variables. For example, assume that the demand is forecasted for a single day that will occur several days in the future, at 30-min intervals; this is a common scenario for market transactions in electricity exchanges (e.g., the day-ahead market in the European Power Exchange, EPEX). In this example, the electricity demand would be collected at 30-min intervals using a smart meter, whereas weather forecast information, such as maximum temperature and average humidity, would typically be observed at intervals of one day. In this study, we use past demand at 30-min or 1-h intervals and daily information (e.g., maximum temperature) on the forecast day as exploratory variables. We note that our proposed model, which will be described in Section 2, is directly applicable to any time resolution of data, such as the demand in 1-min intervals and temperature forecast in 1-h intervals.

From a suppliers’ point of view, it is crucial to produce an interpretable model to investigate the impact of weather on the demand. For example, estimating the fluctuations of electric power caused by weather would help develop strategies for energy-saving interventions [e.g., (Guo et al., 2018a; Wang et al., 2018)] and demand response [e.g., (Ruiz-Abellón et al., 2020)]. To produce an interpretable model, one can directly add weather forecast information to the exploratory variables in the regression model (Hong et al., 2010) and investigate the estimator of regression coefficients. More generally, techniques for interpreting any type of black-box model, including the deep neural networks, have been recently proposed, such as the Local Interpretable Model-agnostic Explanations (LIME; (Ribeiro et al., 2016)) and SHapley Additive exPlanations (SHAP; (Lundberg and Lee, 2017)). However, these methods are used for variable selection, i.e., a set of variables that plays an essential role in the forecast is selected. Variable selection cannot estimate the fluctuations in electric power caused by weather.

In contrast to variable selection, decomposition of the electricity demand at time t, say y_t, into two parts is useful for interpretation:

y_{t} \approx μ_{t} + b_{t}, (1)

where μ_t and b_t are the effects of past demand and weather forecast information, respectively. Typically, we use demand at the same time interval of the previous days as exploratory variables [e.g., (Lusis et al., 2017)], and regression analysis is separately conducted on each time interval. We then construct estimators of μ_t and b_t, say ${\hat{μ}}_{t}$ and ${\hat{b}}_{t}$ , respectively. The interpretation is carried out by plotting a curve of ${\hat{b}}_{t}$ . However, without elaborate construction and estimation of b_t, we face two issues.

The first issue is that the curve of ${\hat{b}}_{t}$ often becomes non-smooth at any time interval (i.e., every 30 min) in our experience. The non-smooth daily curve of b_t is unrealistic because it implies the impact of daily weather forecast on demand changes non-smoothly every 30 min. This problem is caused by the fact that the regression analysis is separately conducted at each time interval. To address this issue, we should estimate parameters under the assumption that b_t is smooth.

The second issue is related to the parameter estimation procedure. In many cases, the regression coefficients are estimated through the least squares method. However, in our experience, the estimate of regression coefficients related to b_t can be negative, leading to a negative value of ${\hat{b}}_{t}$ . Since ${\hat{b}}_{t}$ is the fluctuations of electric power caused by weather, the interpretation becomes unclear. To alleviate this problem, we need to restrict the regression coefficients associated with b_t to nonnegative values.

In this study, we develop a statistical model that elaborately captures the nonlinear structure of daily weather information to address two challenges, as mentioned earlier. We employ the varying coefficient model (Hastie and Tibshirani, 1993; Fan and Zhang, 1999) with basis expansion, where the regression coefficients associated with weather are assumed to be different depending on the time intervals. The regression coefficients are expressed by a nonlinear smooth function with basis expansion, which allows us to generate a smooth function of ${\hat{b}}_{t}$ . Furthermore, the weather effect ${\hat{b}}_{t}$ is also expressed as a nonlinear smooth function. To generate nonnegative regression coefficients, we employ the nonnegative least squares (NNLS, e.g., Lawson and Hanson, 1995) estimation. NNLS estimates parameters under the constraint that all regression coefficients are nonnegative. With the NNLS estimation, the value of ${\hat{b}}_{t}$ is always nonnegative; thus, the interpretation becomes clear.

The usefulness of our proposed method is investigated through the application to three real datasets. The results for two of the datasets show that the NNLS can appropriately capture the fluctuations of electric power caused by weather. Furthermore, our proposed method yields better forecast accuracy than the existing machine learning techniques. In the third example, we investigate our proposed method’s practical usage when COVID-19 influences the electrical demand (e.g., facility closure or recommendation of telework). Our proposed method is directly applicable in such a situation; in addition to the daily weather forecast, we use the average number of infections in the past several days as exploratory variables. The result shows that our proposed method can adequately capture the effect of COVID-19 and also improve forecast accuracy.

The remainder of this paper is organized as follows: Section 2 describes our proposed model based on the varying coefficient model. In Section 3, we present the parameter estimation via nonnegative least squares. Section 4 presents the analysis of data from Tokyo Electric Power Company Holdings. In Section 5, we investigate the impact of COVID-19 on electrical demand through the analysis of data from one selected research facility in Japan. Concluding remarks are given in Section 6. Some technical proofs and additional information of the data analysis are deferred to the Supplementary Material.

2 Proposed Model

Short- and medium-term forecasting is often used for trading electricity in the market. Among various electricity markets, the day-ahead (or spot) and the intraday markets are popular in electricity exchanges, including the European Power Exchange (EPEX) (https://www.epexspot.com/en/market-data/dayaheadauction) and Japan Electric Power Exchange (JEPX) (http://www.jepx.org/english/index.html). In the day-ahead market, contracts for the delivery of electricity on the following day are made. In the intraday market, the power will be delivered several tens of minutes (e.g., 1 h in JEPX) after the order is closed. In both markets, transactions are typically made in 30-min intervals; thus, the suppliers must forecast the demand in 30-min intervals. In this study, we consider the problem of forecasting demand that can be applied to both day-ahead and intraday markets.

Let y_ij be the electricity demand at jth time interval on ith date (i = 1, … , n, j = 1, … , J). Typically, J = 48, because we usually forecast the demand in 30-min intervals. We consider the following model:

y_{i j} = μ_{i j} + b_{i j} + ε_{i j}, (2)

where μ_ij is the effect of past electricity demand, b_ij is the effect of weather, such as temperature and humidity, and ɛ_ij are error terms with E[ɛ_ij] = 0.

Typically, the error variances in the daytime are larger than those at midnight because of the uncertainty of human behavior in the daytime. Therefore, it would be reasonable to assume that $V [ε_{i j}] = σ_{j}^{2}$ , i.e., the error variances depend on the time interval. One may assume the correlation of errors for different time intervals, i.e., Cor(ɛ_ij, ɛ_ij′) ≠ 0 for some j ≠ j′; however, the number of parameters becomes large. For this reason, we consider only the case where the errors are uncorrelated. Note that the final implementation of our proposed procedure described later is independent of the assumption of the correlation structure in errors.

One can express b_ij and μ_ij as linear or nonlinear functions of predictors and conduct the linear regression analysis. With this procedure, however, we face two issues, as mentioned in the introduction; thus, we carefully construct appropriate functions of b_ij and μ_ij.

2.1 Expression of b_ij

Weather forecast information is typically observed at intervals of one day and not 30 min (e.g., the maximum temperature or average humidity). For this reason, we assume that the weather forecast information does not depend on j. Let a vector of weather information be s_i. We express b_ij as a function of s_i. Here, we assume two structures as follows:

• It is well known that the relationship between weather variables and consumption is nonlinear. For example, the relationship between maximum temperature and consumption is approximated by a quadratic function (e.g., 30) because air conditioners are used on both hot and cold days. For this reason, it is assumed that b_ij is expressed as some nonlinear function of s_i.

• Although s_i does not depend on j, the effect of weather, b_ij, may depend on j. For example, consumption in the daytime is affected by the maximum temperature more than that at midnight. In this case, the regression coefficients associated with s_i change according to the time interval j. However, if we assume different parameters at each time interval, the number of parameters can be large, resulting in poor forecast accuracy. To decrease the number of parameters, we use the varying coefficient model, in which the coefficients are expressed as a smooth function of the time interval.

Under the above considerations, we propose expressing b_ij as follows:

b_{i j} = \sum_{m = 1}^{M} β_{m} (j) g_{m} (s_{i}), (3)

where g_m(s_i) (m = 1, … , M) are basis functions given beforehand, β_m(j) are functions of regression coefficients, and M is the number of basis functions.

We also use the basis expansion for β_m(j):

β_{m} (j) = \sum_{q = 1}^{Q} γ_{q m} h_{q} (j), (4)

where h_q(j) (q = 1, … , Q) are basis functions given beforehand and γ_qm are the elements of the coefficient matrix Γ = (γ_qm). Substituting (4) into (3) results in the following:

b_{i j} = \sum_{m = 1}^{M} \sum_{q = 1}^{Q} γ_{q m} h_{q} (j) g_{m} (s_{i}) . (5)

Because h_q(j) and g_m(s_i) are known functions, the parameters concerning b_ij are γ_qm.

Since the effect of weather is assumed to be smooth according to both j and s_i, we use basis functions h_q(j) and g_m(s_i), which produce a smooth function, such as B-spline and the radial basis function (RBF). Our numerical study adopts the B-spline and cyclic B-spline functions as basis functions of g_m(s_i) and h_q(j), respectively. For detail, please refer to Section 4.1. The RBF could be used but it includes a tuning parameter that highly affects the forecast error, resulting in low forecast accuracy in our experience.

2.2 Expression of μ_ij

Since μ_ij is the effect of past consumption, one can assume that μ_ij is expressed as a linear combination of past consumption $y_{(i - t - L_{α}) j}$ , i.e.,

μ_{i j} = \sum_{t = 1}^{T} α_{j t} y_{(i - t - L_{α}) j} + \sum_{u = 1}^{U} β_{j u} y_{i (j - u - L_{β})}, (6)

where T and U are positive integers, which denote how far we trace back through the data and α_jt (t = 1, … , T) and β_ju (u = 1, … , U) are positive values given beforehand. The α_jt correspond to the effects of past consumptions for the same time interval on previous days, while β_jt are the coefficients for different time intervals on the same day. For the day-ahead market, we assume that β_ju ≡ 0. Here, L_α and L_β are nonnegative integers that describe the lags; these values change according to the closing time of transactions. For example, transactions of the day-ahead market in the JEPX close at 17:00 on the previous day of the delivery of electricity. Thus, for instance, when the trade of the electricity demand is in the 17:30–18:00 interval on Sep 20th, we cannot use the electricity demand in the same time interval on Sep 19th due to the trading hours of the market, resulting in L_α = 1.

The design of T and U depends on the aim of the analysis. For example, if the analyst should use only the latest demand, T and U are selected as (T, U) = (1, 1). The usage of only one latest demand value could often yield unstable forecast value; in this case, larger T and U would provide better results. However, too large T and U can result in overfitting. Thus, we do not generally know the best T and U to achieve the best forecast accuracy. In our numerical experiments in Sections 4 and 5, all tuning parameters, including T and U, are selected by cross-validation. In our experience, the cross-validation generally provides good forecast accuracy.

In practice, however, it is assumed that past consumption also depends on past weather, such as daily temperature. For example, suppose that it was exceptionally hot yesterday and it is cooler today. In this case, it is not desirable to directly use past consumption as the predictor; it is better to remove the effect of past temperature from past consumption. In other words, we can use $y_{(i - t - L_{α}) j} - b_{(i - t - L_{α}) j}$ and $y_{i (j - u - L_{β})} - b_{i (j - u - L_{β})}$ instead of $y_{(i - t - L_{α}) j}$ , and $y_{i (j - u - L_{β})}$ , respectively. As a result, μ_ij is expressed as follows:

μ_{i j} = \sum_{t = 1}^{T} α_{j t} (y_{(i - t - L_{α}) j} - b_{(i - t - L_{α}) j}) + \sum_{u = 1}^{U} β_{j u} (y_{i (j - u - L_{β})} - b_{i (j - u - L_{β})}) . (7)

Substituting (5) into (7) results in the following:

\begin{matrix} μ_{i j} & = & \sum_{t = 1}^{T} α_{j t} y_{(i - t - L_{α}) j} - \sum_{t = 1}^{T} \sum_{m = 1}^{M} \sum_{q = 1}^{Q} α_{j t} γ_{q m} h_{q} (j) g_{m} (s_{i - t - L_{α}}) \\ + \sum_{u = 1}^{U} β_{j u} y_{i (j - u - L_{β})} - \sum_{u = 1}^{U} \sum_{m = 1}^{M} \sum_{q = 1}^{Q} β_{j u} γ_{q m} h_{q} (j - u - L_{β}) g_{m} (s_{i}) . \end{matrix} (8)

The appropriate values of α_jt and β_ju are chosen by several approaches. A simple method is α_jt = 1/T and β_ju = 1/U, which implies μ_ij is the sample mean of the past consumption. Another method is based on the AR(1) structure, i.e., $α_{j t} = ρ_{α}^{t}$ and $β_{j u} = ρ_{β}^{u}$ , where ρ_α and ρ_β satisfy $\sum_{t = 1}^{T} ρ_{α}^{t} = 1$ and $\sum_{u = 1}^{U} ρ_{β}^{u} = 1$ , respectively. Note that $\sum_{t = 1}^{T} ρ^{t} = ρ (1 - ρ^{T}) / (1 - ρ)$ , so $\sum_{t = 1}^{T} ρ^{t} = 1$ is equivalent to ρ^T+1 − 2ρ + 1 = 0, whose numerical solution is easily obtained.

2.3 Proposed Model

By combining the expressions of b_ij in Eq. 5 and μ_ij in Eq. 8, the model Eq. 2 is expressed as follows:

\begin{matrix} y_{i j} & = & \frac{1}{T} \sum_{t = 1}^{T} y_{(i - t - L_{α}) j} - \frac{1}{T} \sum_{t = 1}^{T} \sum_{m = 1}^{M} \sum_{q = 1}^{Q} γ_{q m} h_{q} (j) g_{m} (s_{i - t - L_{α}}) \end{matrix}

\begin{matrix} + \frac{1}{U} \sum_{u = 1}^{U} y_{i (j - u - L_{β})} - \frac{1}{U} \sum_{u = 1}^{U} \sum_{m = 1}^{M} \sum_{q = 1}^{Q} γ_{q m} h_{q} (j - u - L_{β}) g_{m} (s_{i}) \end{matrix}

\begin{matrix} + \sum_{m = 1}^{M} \sum_{q = 1}^{Q} γ_{q m} h_{q} (j) g_{m} (s_{i}) + ε_{i j} . \end{matrix} (9)

The model Eq. 9 is equivalent to the linear regression model

\tilde{y} = X γ + ε, (10)

where γ = vec(Γ) and ɛ = vec(E) with E = (ɛ_ij). Here, X and $\tilde{y}$ are considered as the design matrix and the response vector, respectively. The definitions of $\tilde{y}$ and X are given in Section S1 in the Supplementary Material.

3 Estimation

3.1 Nonnegative Least Squares

To estimate the regression coefficient vector γ, one can use the least squares estimation (LSE)

\min_{γ} ‖ \tilde{y} - X γ ‖_{2}^{2} .

In our experience, however, the elements of least squares estimate $\hat{γ}$ often become negative. In such cases, the estimate of b_ij is negative because the basis functions h_q(j) and g_m(s_i) generally take positive values. When b_ij < 0, one can interpret the weather effect is negative. Nevertheless, the one-day curve of ${\hat{b}}_{i j}$ turns out to be counterintuitive; the effect of weather negatively increases as the electricity demand increases. In other words, b_ij is negatively large at working hours and small at midnight. As a result, μ_ij becomes extremely large in the working time. We observe this phenomenon on the analysis of three datasets presented in this study; one of them is the well-known Global Energy Forecasting Competition 2014 (GEFCom 2014) data. Therefore, this phenomenon could occur in other datasets.

A clear interpretation is realized when the weather effect b_ij is nonnegative. To achieve this, we employ the nonnegative least squares (NNLS) estimation, in which we minimize the loss function under a constraint the regression coefficients are nonnegative:

\min_{γ} ‖ \tilde{y} - X γ ‖_{2}^{2} s u b j e c t t o γ \geq 0 . (11)

The optimization problem Eq. 11 is a special case of quadratic programming with nonnegativity constraints [e.g., (Franc et al., 2005)]. As a result, the NNLS problem becomes a convex optimization problem. Several efficient algorithms to obtain the solution in Eq. 11 have been proposed in the literature [e.g., (Lawson and Hanson, 1995; Bro and De Jong, 1997; Timotheou, 2016)].

We add the ridge penalty (Hoerl and Kennard, 1970) to the loss function of the NNLS estimation:

\min_{γ} ‖ \tilde{y} - X γ ‖_{2}^{2} + λ ‖ γ ‖_{2}^{2} s u b j e c t t o γ \geq 0, (12)

where λ > 0 is a regularization parameter. In our experience, the ridge penalization improves the forecast accuracy, and also produces a smoother function of ${\hat{b}}_{i j}$ than the unpenalized NNLS, which makes the interpretation of the weather effect easier.

3.2 Forecast

For the day-ahead forecast, we forecast the demand on the next day, ${\hat{y}}_{(i + 1) j}$ , for the given NNLS estimate $\hat{γ}$ and weather information s_i+1. The forecast value ${\hat{y}}_{(i + 1) j}$ is expressed as follows:

{\hat{y}}_{(i + 1) j} = {\hat{μ}}_{(i + 1) j} + {\hat{b}}_{(i + 1) j} .

Here, ${\hat{b}}_{(i + 1) j} = \sum_{m = 1}^{M} \sum_{q = 1}^{Q} {\hat{γ}}_{q m} h_{q} (j) g_{m} (s_{i + 1})$ and ${\hat{μ}}_{(i + 1) j} = \sum_{t = 1}^{T} α_{j t} (y_{(i + 1 - t - L_{α}) j} - {\hat{b}}_{(i + 1 - t - L_{α}) j})$ . On the intraday forecast, we may use information about the demand on that day so that ${\hat{μ}}_{(i + 1) j}$ is expressed as

{\hat{μ}}_{(i + 1) j} = \sum_{t = 1}^{T} α_{j t} (y_{(i + 1 - t - L_{α}) j} - {\hat{b}}_{(i + 1 - t - L_{α}) j}) + \sum_{u = 1}^{U} β_{j u} (y_{(i + 1) (j - u - L_{β})} - {\hat{b}}_{(i + 1) (j - u - L_{β})}) .

Construction of a forecast interval based on Eq. 11 or Eq. 12 is not easy due to the constraints of the parameter. To derive the forecast interval, we employ a two-stage procedure; first, we estimate the parameter via NNLS to extract variables that correspond to nonzero coefficients. Then, we employ the least squares estimation based on the variables selected in the first step. With this procedure, we should derive the forecast interval after model selection. To achieve this result, the post-selection inference (Lee and Taylor, 2014; Lee et al., 2016) is employed. The post-selection inference for the NNLS estimation is detailed in Section S2 in the Supplementary Material.

4 Application to Demand Data From Tokyo Electric Power Company Holdings

The performance of our proposed method is investigated through the analysis of electricity demand data collected from Tokyo Electric Power Company Holdings, available at https://www.tepco.co.jp/en/forecast/html/download-e.html. The dataset consists of electricity demand from April 1st, 2016, to March 30th, 2020. The demand is shown in MW at 1-h intervals (i.e., J = 24).

We forecast the demand from April 1st, 2019 to March 30th, 2020 (data in 2016–2018 are only used for training). The training data consist of all demand data up to the previous day of the forecast day; for example, when we forecast the demand on February 4th, 2020, the training data are demand from April 1st, 2016, to February 3rd, 2020. For the sake of simplicity, we consider the problem of the day-ahead forecast, that is, β_ju ≡ 0. Also, we set L_α = 0.

4.1 Basis Functions

We use maximum temperature in Tokyo as the weather variable s_i, available at Japan Meteorological Agency (https://www.jma.go.jp/jma/indexe.html).

Figure 1 shows the relationship between the maximum temperature and demand. We depict this relationship in different time intervals for Sunday, Monday, and Wednesday. The curves are depicted by fitting the ordinary least squares estimation with the cubic B-spline function (e.g., Hastie et al., 2008) with equally-spaced knots. The number of basis functions is M = 5. The B-spline function of k-degree is constructed by linear combination of M basis functions with M + 2k knots, say t₁, … , t_M+2k. For given n data points in increasing order, say x₁, … , x_n, we set the knots as t₁ ≤ t₂ ≤ ⋯ ≤ t_k = x₁ ≤ ⋯ ≤ x_n = t_M+k+1 ≤ ⋯ ≤ t_M+2k. The B-spline function is the density function of a convolution of uniform random variables (Hastie et al., 2008), and it is constructed by the following recurrence relation (De Boor, 1972):

\begin{array}{l} B_{j} (x; 0) & = 1_{[t_{j}, t_{j + 1})} (x), \\ B_{j} (x; m) & = \frac{x - t_{j}}{t_{j + m} - t_{j}} B_{j, m - 1} (x) + \frac{t_{j + m + 1} - x}{t_{j + m + 1} - t_{j + 1}} B_{j + 1, m - 1} (x), \end{array}

where $1_{A} (x)$ is the indicator function, that is, $1_{A} (x) = 1$ when x ∈ A and $1_{A} (x) = 0$ otherwise. We adopt the cubic B-spline (k = 3). The left panel of Figure 2 depicts the cubic B-spline function with M = 10. For all settings, the curve-fitting works well, which implies the usage of the B-spline function as a basis function g(s_i) would be reasonable.

FIGURE 1

FIGURE 1. Relationship between maximum temperature and demand for different time intervals on Sunday, Monday, and Wednesday. The curves are depicted by fitting the ordinary least squares estimation with the cubic B-spline function.

FIGURE 2

FIGURE 2. B-spline functions of 3 degree (left panel) and cyclic B-spline functions of 3 degree (right panel) with M = 10.

The curve shape for 9:00–10:00 is different from that for 15:00–16:00, which suggests that the regression coefficients must be different among time intervals. Meanwhile, the curve shapes for 15:00–16:00 and 18:00–19:00 are similar. In this case, it is reasonable to assume that the regression coefficients for 15:00–16:00 are similar to those for 18:00–19:00. The regression coefficients are assumed to change depending on the time interval yet to be smooth according to the time interval. To achieve a smooth function of β_m(j), we also use the cubic B-spline as a basis function of h_q(j). As the ordinary B-spline function cannot produce a smooth curve around the boundary (i.e., 23:00–0:00 and 0:00–1:00), we employ the cyclic B-spline function, where the basis functions wrap at the first and last knot locations. The cyclic B-spline function is implemented in the cSplineDes function in the mgcv package in R. The right panel of Figure 2 shows the cyclic B-spline function of degree 3 with M = 10.

We also observe that the curve shapes are different among each day of the week. Therefore, we construct the statistical models by day of the week separately; seven statistical models are constructed. To forecast the demand, we select a model that matches the day of the week. All national holidays are regarded as Sunday; therefore, the number of observations on Sunday is larger than on other weekdays.

4.2 Candidates of Tuning Parameters

We employ our proposed method based on two estimation procedures: ridge estimation

\min_{γ} ‖ \tilde{y} - X γ ‖_{2}^{2} + λ ‖ γ ‖_{2}^{2},

and NNLS estimation with the ridge penalty in Eq. 12. We label these estimation procedures as “LSE” and “NNLS,” respectively. For both LSE and NNLS, we prepare a wide variety of statistical models by changing the tuning parameters. Here, we review the role of each tuning parameter and present their candidates as follows:

• Q: the number of basis functions in the varying coefficient model. As Q increases, the electricity fluctuations caused by weather becomes large in time interval j (j = 1, … , J). As we will show in Section 4.3, the value of Q may not highly affect the forecast accuracy. Therefore, it would be sufficient to set the candidates of Q as Q = 5, 10.

• M: the number of basis functions for the impact of weather on demand. As M increases, the electricity fluctuations caused by weather becomes large in temperature s_i. As shown in Figure 1, M = 5 would be sufficient to express the relationship between weather and demand. Therefore, the candidates of M are M = 5, 10.

• T: the number of past demands for forecasting. In other words, we use demand in the past T days to forecast demand. In this numerical analysis, we construct the statistical models by day of the week separately. Therefore, the forecast is done by using the past T weeks on the same day of the week. Consequently, it would be sufficient to set the candidates of T as T = 2, 4.

• λ: regularization parameter for ridge regression. As λ increases, the regression coefficients become stable. Small λ prevents the overfitting, but too large λ leads to large bias. The candidates of λ are 20 sequences from 10^–5 to 1.0 on a log-scale. We note that the log-scale sequence is often used to set the candidates of the regularization parameters (e.g., Friedman et al., 2010). We also set λ = 0 to investigate the impact of the ridge parameter on the forecast accuracy. As we will see in Section 4.3, it would be sufficient to set the above candidates of λ to achieve good accuracy.

In addition to the above candidates of models, we consider two types of α_jt: sample mean of the past demand (i.e., α_jt = 1/T) and AR(1) structure. Details of the AR(1) structure are presented at the end of Section 2.2. As a result, the total number of candidates of the models is 336 (= 2 × 2 × 2 × (20 + 1) × 2). These candidates are summarized in Table 1.

TABLE 1

TABLE 1. Candidates of tuning parameters for our proposed method.

4.3 Impact of Tuning Parameters on Forecast Accuracy

The forecast accuracy of our proposed method depends on the tuning parameters presented above. We investigate the impact of tuning parameters on the mean absolute percentage error (MAPE) defined as

M A P E = \frac{1}{J n} \sum_{i = 1}^{n} \sum_{j = 1}^{J} \frac{| y_{i j} - {\hat{y}}_{i j} |}{y_{i j}} . (13)

Figure 3 shows the relationship between regularization parameter λ and MAPE. The relationships are investigated for all combinations of (Q, M, T).

FIGURE 3

FIGURE 3. Impact of regularization parameter λ on the MAPE, investigated for all combinations of (Q, M, T). The dashed line indicates α_jt = 1/T and the solid line corresponds to the AR(1) structure.

The result shows that the AR(1) structure on α_jt produces smaller MAPE than the mean structure (α_jt = 1/T) for all candidates of ridge parameter λ. For AR(1) model, the performance for T = 4 is better than that for T = 2. We observe that a large value of λ may result in poor forecast accuracy; meanwhile, a small amount of λ generally results in good accuracy. The result for λ = 0 is not displayed here because the MAPE can be extremely large due to the non-convergence of parameters; the ridge penalty helps avoid such a non-convergence. In summary, the AR(1) structure on α_jt, T = 4, and a small value of λ will result in a small MAPE for this dataset.

4.4 Forecast Accuracy

We compare the performance of our proposed method with the following popular machine learning techniques: random forest (RF), support vector machine (SVM), Least absolute shrinkage and selection operator (Lasso), LightGBM (LGBM; (Ke et al., 2017)), and Long Short Term Memory [LSTM; (Hochreiter and Schmidhuber, 1997; Van Houdt et al., 2020)]. The RF, SVM, LGBM, and LSTM involve nonlinear functions, allowing us to capture complex structures. We use R packages randomForest, ksvm, glmnet, and lgbm to implement RF, SVM, Lasso, and LGBM, respectively. The LSTM is implemented by tensorflow.keras library in python. The LGBM is based on the gradient boosting decision tree and achieves good forecast accuracy in various fields of research, including energy (e.g., (Ju et al., 2019)). The LSTM is the state-of-the-art technique for analyzing time series data using artificial neural networks (ANN). For these machine learning techniques, the forecast is made by the electricity demand of past T days and maximum temperature, that is, ${\hat{y}}_{i j} = f (s_{i}, y_{(i - 1) j}, \dots, y_{(i - T + 1) j})$ .

The characteristics of our proposed method and the machine learning techniques are compared in Table 2.

TABLE 2

TABLE 2. Characteristics of our proposed method (NNLS) and the state-of-the-art machine learning techniques in terms of forecast accuracy, interpretation, and computational speed.

A detailed description of Table 2 is as follows:

• With small sample sizes, the machine learning methods that capture nonlinear structure provide good accuracy except for the LSTM; in general, the ANN can often yield poor accuracy with small sample sizes (e.g., Tange et al., 2017). The accuracy of the lasso is not sufficiently but relatively high because the lasso cannot capture the nonlinear structure. When the number of observations is large, the LGBM and LSTM will perform perfectly well because these methods capture complex nonlinear structures. However, the sample sizes of the dataset used in this Section (and also other Sections) may not be large.

• Our proposed method is interpretable thanks to the nonlinear regression modeling with basis expansions. The lasso may also be interpretable, but it only captures the linear structure. The other machine learning techniques might not be suitable for interpretation purposes; please refer to Section 4.5 for the numerical results.

• The computational speed of our proposed method turns out to be fast thanks to the efficient NNLS algorithm of Lawson and Hanson (Lawson and Hanson, 1995). The lasso algorithm (Friedman et al., 2010) is also extremely fast. The RF is a little bit slow with large sample sizes due to the resampling procedure. The LGBM is very fast regardless of sample sizes in our experience. The SVM becomes extremely slow with large n (e.g., n = 100,000) because we must compute the Kernel function for each pair of observations, which requires O(n²) operations.

In practice, we need to select a set of tuning parameters among candidates. The candidates of the tuning parameters for the proposed method are presented in Section 4.3. For machine learning methods, the candidates are detailed in Section S3 in the Supplementary Material. For all methods, we must select an appropriate set of tuning parameters. These tuning parameters are generally selected with cross-validation. In this study, we choose these tuning parameters to minimize the MAPE in the past year. The values of tuning parameters are changed every month.

Table 3 shows the monthly MAPE for our proposed method and existing methods from April 2019 to March 2020. The result shows that the proposed method performs better than existing machine learning techniques in total. The SVM achieves the best performance in specific months but sometimes provides much larger MSE than other methods, resulting in worse performance than our proposed method in total. The RF and LGBM yield similar performance, and these methods produce larger MAPE than our proposed method. The lasso cannot capture the nonlinear relationship between temperature and demand as in Figure 1, resulting in poor performance in total. The LSTM provides the worst performance, probably due to the small number of observations: it is from n = 132 to n = 302. In general, the ANN does not perform well with such small sample sizes; (Tange et al., 2017) showed that the ANN performed worse than SVM for small sample sizes, and several similar numerical results are found in the literature (Bećirović and Ćosović, 2016; Dehalwar et al., 2016; Maldonado et al., 2019). We observe that the NNLS and LSE result in similar values of MAPE. These similar results are obtained probably due to the identifiability of the parameter. Our proposed procedure decomposes y_t as y_t ≈ μ_t + b_t, and this decomposition is not unique. Therefore, the NNLS and LSE produce different values of ${\hat{μ}}_{t}$ and ${\hat{b}}_{t}$ but similar values of ${\hat{μ}}_{t} + {\hat{b}}_{t}$ .

TABLE 3

TABLE 3. Monthly MAPE for our proposed methods (NNLS and LSE) and existing machine learning techniques (SVM, RF, Lasso, LGBM, and LSTM) on the dataset from Tokyo Electric Power Company Holdings from April 2019 to March 2020. The smallest MAPE is written in bold.

We also compute the MAPE for well-known Global Energy Forecasting Competition 2014 (GEFCom 2014) data (Hong and Fan, 2016), and obtain a similar result as that shown in Table 3. For detail, please refer to Section S4 in the Supplementary Material.

4.5 Interpretation

With our proposed method, the estimated model can be interpreted by decomposing the forecast value by the effects of temperature and past demand: ${\hat{y}}_{i j} = {\hat{b}}_{i j} + {\hat{μ}}_{i j}$ . The values, ${\hat{b}}_{i j}$ and ${\hat{μ}}_{i j}$ , for NNLS and LSE from October 1st to 14th, 2019, are depicted in Figure 4.

FIGURE 4

FIGURE 4. The values, $\hat{b}$ _ij and $\hat{μ}$ _ij , for NNLS (upper panel) and LSE (lower panel) from October 1st to 14th, 2019.

Although the forecast accuracy of the LSE estimation is similar to that of the NNLS estimation, as shown in Table 3, the results of the decomposition ${\hat{y}}_{i j} = {\hat{b}}_{i j} + {\hat{μ}}_{i j}$ for these two methods are unalike in terms of values. The LSE often results in negative values of ${\hat{b}}_{i j}$ , and then ${\hat{μ}}_{i j}$ becomes substantially larger than the actual demand. In particular, the value of ${\hat{b}}_{i j}$ at working hours is negatively larger than that at midnight; on the other hand, the electricity is used primarily during working hours. As a result, the behavior of ${\hat{b}}_{i j}$ is counterintuitive; ${\hat{b}}_{i j}$ becomes negatively larger as ${\hat{y}}_{i j}$ increases. Thus, interpreting the effect of weather turns out to be difficult with LSE. This issue occurs because there are no restrictions on the sign of ${\hat{b}}_{i j}$ . We observe a similar behavior of ${\hat{b}}_{i j}$ on other datasets, including GEFCom 2014.

In contrast, the effect of weather is appropriately captured by the NNLS estimation. Indeed, in many cases, the value of ${\hat{b}}_{i j}$ at working hours is positively larger than that at midnight. The constraint on nonnegativeness of γ significantly improves the interpretation of the weather effect while still maintaining excellent forecast accuracy.

To further investigate the effect of temperature, we depict ${\hat{b}}_{i j}$ on Tuesday when maximum temperatures are 5°C (cold day), 20°C (cool day), and 35°C (hot day), which is shown in Figure 5. For existing methods, it is difficult to depict ${\hat{b}}_{i j}$ because these methods do not assume the existence of ${\hat{b}}_{i j}$ . For existing methods, instead of ${\hat{b}}_{i j}$ , the weather effect is calculated depending on the input: we forecast the demand with specific temperature (5°C, 20°C, and 35°C) and average annual temperature (21.7°C in this case), and subtract the forecast value with average annual temperature from that with a specific temperature. We depict the weather effect on October 8th, 2019, and October 15th, 2019, for existing methods; both of these days are Tuesday.

FIGURE 5

FIGURE 5. Estimate of the weather effect bij on Tuesday for the proposed method and existing methods. For existing methods, the weather effect depends on the input. We depict the weather effect on October 8th, 2019, and October 15th, 2019, for existing methods; both of these days of week are Tuesday

The results of our proposed procedure show that ${\hat{b}}_{i j}$ is highly dependent on the temperature: the weather effect may be substantial for cold and hot days due to air conditioner use. For NNLS, the weather effect is always positive, which allows clear interpretation compared with LSE. Both NNLS and LSE result in smooth curves due to the smooth basis function of the regression coefficients. We observe that the weather effect is stable unless M is large and λ is excessively small.

The existing machine learning methods, RF, SVM, LGBM, and LSTM, are unstable and highly depend on the input. With the Lasso, the impact of temperature is almost zero, which implies that the weather effect cannot be captured. Our proposed method results in smoother curves than existing methods thanks to the B-spline basis expansion of regression coefficients in Eq. 4. As a result, our proposed method is more suitable for interpreting the weather effect than existing methods.

5 Application to Data From One Selected Research Facility in Japan

In the second real data example, we apply our proposed method to the demand data from one selected research facility in Japan. The raw data cannot be published due to confidentiality. This dataset consists of demand from January 1st, 2017 to May 30th, 2020. At certain times, demand is either not observed or include outliers due to electricity meter failures or blackouts. The daily data that contain such missing values and outliers are removed, resulting in 1,164 days of complete data. The demand is shown in kW at 1-h intervals (J = 24).

We observe that COVID-19 greatly influences the usage of this research facility’s electricity pattern due to the recommendation of telework. In present times, it is essential to forecast the demand for this extraordinary situation. To this end, our proposed method is directly applicable to the input of daily data related to COVID-19, and we focus our attention on the forecast accuracy in March, April, and May 2020.

5.1 Input of COVID-19 Information

In Japan, the most frequently-used daily information about COVID-19 is the daily number of infections, available at https://www3.nhk.or.jp/news/special/coronavirus/data-all/. However, this variable turns out to be relatively unstable. In order to use more stable information about COVID-19, we may use the moving average of the number of infections; that is, the average number of infections in the past several days.

The research facility used in this study promotes the staff to work from home to prevent the spread of COVID-19. We expect that the electricity usage of the research facility decreases as the number of infected people of the COVID-19 increases; therefore, the COVID-19 information may affect the forecast results. Figure 6 shows the relationship between the number of daily infections and demand, and between the average number of infections in the past 14 days and demand. We depict these plots on working days. The curve fitting is done by polynomial regression with cubic function. The B-spline may not be suitable in this case due to the limited number of observations.

FIGURE 6

FIGURE 6. Relationship between the number of daily infections and demand (upper panels), and between the average number of infections in the past 14 days and demand (lower panels). We depict these plots on working days. The curve fitting is done by polynomial regression with cubic function.

The number of daily infections includes outliers; thus resulting in unstable curve fitting. For example, when the number of infections is relatively large, the demand increases as the number of infections increases. This phenomenon is counterintuitive because this facility encourages working at home as much as possible to prevent infections. On the other hand, the average number of infections in the past 14 days is negatively correlated with the demand, and the curve fitting of the cubic function works well. For this reason, we use the average number of infections in the past 14 days as daily information about COVID-19. As a result, the daily variable s_i becomes a two-dimensional vector that consists of the maximum temperature and the average number of infections in the past 14 days. We use the cubic B-spline function as a basis function of maximum temperature, and the cubic function as a basis function of the average number of infections in the past 14 days.

5.2 Forecast Accuracy

In the previous real data example, we construct the statistical models by day of the week separately. However, in this case, the number of observations affected by COVID-19 is excessively small. Thus, only two statistical models are constructed based on working days (from Monday to Friday) and holidays (Saturday, Sunday, and national holidays). We consider the problem of the day-ahead forecast from March 1st, 2020 to May 30th, 2020 (data in 2017–2019 are only used for training). The training data consist of all demand data up to the previous day of the forecast day.

We investigate how the input of COVID-19 information improves the forecast accuracy. The NNLS does not produce negative regression coefficients but we observe that the electricity usage decreases as the average number of infections in the past 14 days increases, as shown in Figure 6. Thus, the electricity fluctuation caused by COVID-19, say $b_{i j}^{covid 19}$ , should be negative. As the basis function, we use the cubic function of negative number of infections in the past 14 days; that is, $b_{i j}^{covid 19} = β_{1} (j) {(- s_{i}^{covid 19})}^{3} + β_{2} (j) {(- s_{i}^{covid 19})}^{2} + β_{3} (j) (- s_{i}^{covid 19})$ , where $s_{i}^{covid 19}$ is the number of infections in the past 14 days.

Figure 7 shows the forecast values with our proposed method using NNLS and the actual demand. We depict two forecast values: one uses the information about COVID-19 and the other does not. The result shows that the usage of the COVID-19 data slightly improves the accuracy; for example, on the 10th and 13th.

FIGURE 7

FIGURE 7. Actual demand and forecast values from April 1st, 2020 to April 20th, 2020.

Figure 8 shows ${\hat{b}}_{i j}$ of both temperature and COVID-19. The results show that after April 10th, the negative effect of COVID-19 is observed. The government declared a state of emergency on April 7th, and following this, the usage pattern of the electricity changed. The change in electricity pattern on the 8th and 9th may not be captured due to excessively small sample sizes related to the number of infections. However, after the 9th, the effect of COVID-19 is captured. (insert Figure 8 here).

FIGURE 8

FIGURE 8. $\hat{b}$ ij for temperature (dotted line) and COVID-19 (dashed line).

Table 4 shows MAPE for our proposed method and existing methods on March, April, and May 2020. We also compare the performance of two estimation procedures: one uses the information about COVID-19 and the other does not. In March, the result shows that the proposed method performs the best and Lasso yields the worst performance. This is because the effect of temperature can be appropriately captured as shown in the previous example. The effect of COVID-19 is not crucial in March because the performance becomes worse when the COVID-19 information is included.

TABLE 4

TABLE 4. Monthly MAPE for our proposed methods (NNLS and LSE) and existing machine learning techniques (SVM, RF, Lasso, LGBM, and LSTM) on the dataset from one selected research facility in March, April, and May 2020. The smallest MAPE is written in bold.

In April and May, the information about COVID-19 significantly improves the performance for almost all methods. In particular, our proposed method and Lasso both produce small MAPE values, and interestingly, Lasso performs slightly better than our proposed method. There are two reasons to explain the superiority of Lasso. First, the relationship between the number of infections and the demand is approximated by a linear function, as shown in Figure 6. Second, the temperature effect is not crucial because the temperature does not change much in April and May. The nonlinear machine learning techniques, that is, SVM, RF, LGBM, and LSTM yield large MAPE in April, probably due to the small number of observations related to COVID-19; these techniques result in overfitting. In particular, LSTM produces much worse MAPE than other methods. Such poor results may be due to the instability of the electricity demand; the neural networks can be highly sensitive to noise [e.g., (Goodfellow et al., 2014)]. Indeed, we observe that the electricity demand of this dataset turns out to be unstable compared with the datasets from Tokyo electric power company holdings and Global Energy Forecasting Competition 2014. To improve the accuracy of the LSTM, we need to change the input/output by incorporating much more information of the data (e.g., not daily but hourly weather and hourly COVID-19 data).

6 Concluding Remarks

We have constructed a statistical model for forecasting future electricity demand. To capture the nonlinear effect of weather information, we employed the varying coefficient model. With the ordinary least squares estimation (LSE), the estimate of b_ij, say ${\hat{b}}_{i j}$ , became negative because some of the elements of regression coefficients $\hat{γ}$ were negative. The negative weather effect led to the difficulty in the interpretation of the weather effect. To address this issue, we employed the NNLS estimation; this estimation is performed under the constraint that all of the elements of $\hat{γ}$ are nonnegative. The practicality is illustrated through three real data examples. Two of these examples showed that our method performed better than the existing machine learning techniques. In the third example, our proposed method adequately captured the impact of COVID-19. Estimating the fluctuations of electric power caused by weather and COVID-19 would help make strategies for energy-saving interventions and demand response.

The proposed method is carried out under the assumption that the errors are uncorrelated. In practice, however, the errors among near time intervals may be correlated. As a future research topic, it would be interesting to assume a correlation among time intervals and estimate a regression model that includes the correlation parameter. Another important point is to incorporate the weather effects other than temperature and humidity; we may use cloud cover (Apadula et al., 2012), rainfall, and wind speed (Chapagain and Kittipiyakul, 2018). However, incorporating too many variables can often result in low forecast accuracy due to overfitting (Xie et al., 2016). The selection of an appropriate combination of variables is essential but beyond the scope of this study; we would like to take this as a future research topic.

Recently, demand response has been becoming one of the most important topics in the research field of energy. In particular, the machine learning techniques have been recently used for residential demand response (Zhou et al., 2016; Pallonetto et al., 2019; Afzalan and Jazizadeh, 2020). These techniques are based on the deterministic approaches. On the other hand, our proposed statistical model will lead to the demand response by the probabilistic approach (Schachter and Mancarella, 2015; Alipour et al., 2017). The probabilistic approach will allow the decision based on the probabilistic risk evaluation. Such demand response would be essential in practice, and we would like to work on this topic in the future.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.tepco.co.jp/en/forecast/html/download-e.html https://www.jma.go.jp/jma/indexe.html https://www3.nhk.or.jp/news/special/coronavirus/data-all/ http://blog.drhongtao.com/2017/03/gefcom2014-load-forecasting-data.html.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Funding

This work was partially supported by the Japan Society for the Promotion of Science KAKENHI Grant Number JP19K11862 and the Center of Innovation Program (COI) from JST Grant Number JPMJCE1318, Japan.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The author would like to thank reviewers for the constructive and helpful comments that improved the quality of the paper considerably. The author also thanks Professor Hiroki Masuda and Maiya Hori for helpful comments and discussions.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fenrg.2021.724780/full#supplementary-material

References

Afzalan, M., and Jazizadeh, F. (2020). A Machine Learning Framework to Infer Time-Of-Use of Flexible Loads: Resident Behavior Learning for Demand Response. IEEE Access 8, 111718–111730. doi:10.1109/access.2020.3002155

CrossRef Full Text | Google Scholar

Alipour, M., Zare, K., and Abapour, M. (2017). Minlp Probabilistic Scheduling Model for Demand Response Programs Integrated Energy Hubs. IEEE Trans. Ind. Inform. 14, 79–88.

Google Scholar

Amral, N., Ozveren, C. S., and King, D. (2008). “Short Term Load Forecasting Using Multiple Linear Regression,” in 2007 42nd International Universities Power Engineering Conference (IEEE), 1192–1198.

Google Scholar

Apadula, F., Bassini, A., Elli, A., and Scapin, S. (2012). Relationships between Meteorological Variables and Monthly Electricity Demand. Appl. Energ. 98, 346–356. doi:10.1016/j.apenergy.2012.03.053

CrossRef Full Text | Google Scholar

Bećirović, E., and Ćosović, M. (2016). “Machine Learning Techniques for Short-Term Load Forecasting,” in 2016 4th International Symposium on Environmental Friendly Energies and Applications (EFEA) (IEEE), 1–4. doi:10.1109/efea.2016.7748789

CrossRef Full Text | Google Scholar

Bedi, J., and Toshniwal, D. (2019). Deep Learning Framework to Forecast Electricity Demand. Appl. Energ. 238, 1312–1326. doi:10.1016/j.apenergy.2019.01.113

CrossRef Full Text | Google Scholar

Bro, R., and De Jong, S. (1997). A Fast Non-negativity-constrained Least Squares Algorithm. J. Chemometrics 11, 393–401. doi:10.1002/(sici)1099-128x(199709/10)11:5<393:aid-cem483>3.0.co;2-l

CrossRef Full Text | Google Scholar

Cabrera, B. L., and Schulz, F. (2017). Forecasting Generalized Quantiles of Electricity Demand: A Functional Data Approach. J. Am. Stat. Assoc. 112, 127–136. doi:10.1080/01621459.2016.1219259

CrossRef Full Text | Google Scholar

Chapagain, K., and Kittipiyakul, S. (2018). Performance Analysis of Short-Term Electricity Demand with Atmospheric Variables. Energies 11, 818. doi:10.3390/en11040818

CrossRef Full Text | Google Scholar

Chen, Y., Xu, P., Chu, Y., Li, W., Wu, Y., Ni, L., et al. (2017). Short-term Electrical Load Forecasting Using the Support Vector Regression (SVR) Model to Calculate the Demand Response Baseline for Office Buildings. Appl. Energ. 195, 659–670. doi:10.1016/j.apenergy.2017.03.034

CrossRef Full Text | Google Scholar

De Boor, C. (1972). On Calculating with B-Splines. J. Approximation Theor. 6, 50–62. doi:10.1016/0021-9045(72)90080-9

CrossRef Full Text | Google Scholar

de Oliveira, E. M., and Cyrino Oliveira, F. L. (2018). Forecasting Mid-long Term Electric Energy Consumption through Bagging ARIMA and Exponential Smoothing Methods. Energy 144, 776–788. doi:10.1016/j.energy.2017.12.049

CrossRef Full Text | Google Scholar

Dehalwar, V., Kalam, A., Kolhe, M. L., and Zayegh, A. (2016). “Electricity Load Forecasting for Urban Area Using Weather Forecast Information,” in 2016 IEEE International Conference on Power and Renewable Energy (ICPRE) (IEEE), 355–359. doi:10.1109/icpre.2016.7871231

CrossRef Full Text | Google Scholar

Dudek, G. (2016). Pattern-based Local Linear Regression Models for Short-Term Load Forecasting. Electric Power Syst. Res. 130, 139–147. doi:10.1016/j.epsr.2015.09.001

CrossRef Full Text | Google Scholar

Engle, R. F., Granger, C. W. J., Rice, J., and Weiss, A. (1986). Semiparametric Estimates of the Relation between Weather and Electricity Sales. J. Am. Stat. Assoc. 81, 310–320. doi:10.1080/01621459.1986.10478274

CrossRef Full Text | Google Scholar

Fan, J. Q., and Zhang, W. Y. (1999). Statistical Estimation in Varying Coefficient Models. Ann. Stat. 27, 1491–1518. doi:10.1214/aos/1017939139

CrossRef Full Text | Google Scholar

Franc, V., Hlaváč, V., and Navara, M. (2005). Sequential Coordinate-wise Algorithm for the Non-negative Least Squares Problem. Comp. Anal. Images Patterns, Proc. 3691, 407–414. doi:10.1007/11556121_50

CrossRef Full Text | Google Scholar

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 33, 1–22. doi:10.18637/jss.v033.i01

PubMed Abstract | CrossRef Full Text | Google Scholar

Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572.

Google Scholar

Guo, Z., Zhou, K., Zhang, C., Lu, X., Chen, W., and Yang, S. (2018). Residential Electricity Consumption Behavior: Influencing Factors, Related Theories and Intervention Strategies. Renew. Sustain. Energ. Rev. 81, 399–412. doi:10.1016/j.rser.2017.07.046

CrossRef Full Text | Google Scholar

Guo, Z., Zhou, K., Zhang, X., and Yang, S. (2018). A Deep Learning Model for Short-Term Power Load and Probability Density Forecasting. Energy 160, 1186–1200. doi:10.1016/j.energy.2018.07.090

CrossRef Full Text | Google Scholar

Haq, M. R., and Ni, Z. (2019). A New Hybrid Model for Short-Term Electricity Load Forecasting. IEEE Access 7, 125413–125423. doi:10.1109/access.2019.2937222

CrossRef Full Text | Google Scholar

Harvey, A., and Koopman, S. J. (1993). Forecasting Hourly Electricity Demand Using Time-Varying Splines. J. Am. Stat. Assoc. 88, 1228–1236. doi:10.1080/01621459.1993.10476402

CrossRef Full Text | Google Scholar

Hastie, T., Tibshirani, R., and Friedman, J. (2008). The Elements of Statistical Learning. 2nd edn. New York: Springer.

Google Scholar

Hastie, T., and Tibshirani, R. (1993). Varying-Coefficient Models. J. R. Stat. Soc. Ser. B (Methodological) 55, 757–779. doi:10.1111/j.2517-6161.1993.tb01939.x

CrossRef Full Text | Google Scholar

He, W. (2017). Load Forecasting via Deep Neural Networks. Proced. Comp. Sci. 122, 308–314. doi:10.1016/j.procs.2017.11.374

CrossRef Full Text | Google Scholar

Hochreiter, S., and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Comput. 9, 1735–1780. doi:10.1162/neco.1997.9.8.1735

PubMed Abstract | CrossRef Full Text | Google Scholar

Hoerl, A. E., and Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 12, 55–67. doi:10.1080/00401706.1970.10488634

CrossRef Full Text | Google Scholar

Hong, T., and Fan, S. (2016). Probabilistic Electric Load Forecasting: A Tutorial Review. Int. J. Forecast. 32, 914–938. doi:10.1016/j.ijforecast.2015.11.011

CrossRef Full Text | Google Scholar

Hong, T., Gui, M., Baran, M. E., and Willis, H. L. (2010). Modeling and Forecasting Hourly Electric Load by Multiple Linear Regression with Interactions. IEEE PES Gen. Meet., 1–8. doi:10.1109/pes.2010.5589959

CrossRef Full Text | Google Scholar

Jiang, H., Zhang, Y., Muljadi, E., Zhang, J. J., and Gao, D. W. (2018). A Short-Term and High-Resolution Distribution System Load Forecasting Approach Using Support Vector Regression with Hybrid Parameters Optimization. IEEE Trans. Smart Grid 9, 3341–3350. doi:10.1109/tsg.2016.2628061

CrossRef Full Text | Google Scholar

Ju, Y., Sun, G., Chen, Q., Zhang, M., Zhu, H., and Rehman, M. U. (2019). A Model Combining Convolutional Neural Network and Lightgbm Algorithm for Ultra-short-term Wind Power Forecasting. IEEE Access 7, 28309–28318. doi:10.1109/access.2019.2901920

CrossRef Full Text | Google Scholar

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). “Lightgbm: A Highly Efficient Gradient Boosting Decision Tree,” in Advances in Neural Information Processing Systems, 3146–3154.

Google Scholar

Kong, W., Dong, Z. Y., Jia, Y., Hill, D. J., Xu, Y., and Zhang, Y. (2018). Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network. IEEE Trans. Smart Grid 10, 841–851.

Google Scholar

Lawson, C. L., and Hanson, R. J. (1995). Solving Least Squares Problems. Philadelphia, PA: Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611971217 Solving Least Squares Problems.

CrossRef Full Text | Google Scholar

Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016). Exact post-selection Inference, with Application to the Lasso. Ann. Stat. 44, 907–927. doi:10.1214/15-aos1371

CrossRef Full Text | Google Scholar

Lee, J. D., and Taylor, J. E. (2014). Exact Post Model Selection Inference for Marginal Screening. arXiv.1402, 5596.

Google Scholar

Liu, B., Nowotarski, J., Hong, T., and Weron, R. (2017). Probabilistic Load Forecasting via Quantile Regression Averaging on Sister Forecasts. IEEE Trans. Smart Grid 8, 730–737.

Google Scholar

Lundberg, S. M., and Lee, S.-I. (2017). “A Unified Approach to Interpreting Model Predictions,” in Advances in Neural Information Processing Systems, 4765–4774.

Google Scholar

Lusis, P., Khalilpour, K. R., Andrew, L., and Liebman, A. (2017). Short-term Residential Load Forecasting: Impact of Calendar Effects and Forecast Granularity. Appl. Energ. 205, 654–669. doi:10.1016/j.apenergy.2017.07.114

CrossRef Full Text | Google Scholar

Maldonado, S., González, A., and Crone, S. (2019). Automatic Time Series Analysis for Electric Load Forecasting via Support Vector Regression. Appl. Soft Comput. 83, 105616. doi:10.1016/j.asoc.2019.105616

CrossRef Full Text | Google Scholar

Miswan, N. H., Said, R. M., and Anuar, S. H. H. (2016). ARIMA with Regression Model in Modelling Electricity Load Demand. J. Telecommunication, Electron. Comp. Eng. 8, 113–116.

Google Scholar

Pallonetto, F., De Rosa, M., Milano, F., and Finn, D. P. (2019). Demand Response Algorithms for Smart-Grid Ready Residential Buildings Using Machine Learning Models. Appl. Energ. 239, 1265–1282. doi:10.1016/j.apenergy.2019.02.020

CrossRef Full Text | Google Scholar

Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). ““Why Should I Trust You?” Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 1135–1144.

Google Scholar

Ruiz-Abellón, M. C., Fernández-Jiménez, L. A., Guillamón, A., Falces, A., García-Garre, A., and Gabaldón, A. (2020). Integration of Demand Response and Short-Term Forecasting for the Management of Prosumers’ Demand and Generation. Energies 13, 11.

Google Scholar

Saber, A. Y., and Alam, A. K. M. R. (2017). “Short Term Load Forecasting Using Multiple Linear Regression for Big Data,” in 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 1–6. doi:10.1109/SSCI.2017.8285261

CrossRef Full Text | Google Scholar

Schachter, J. A., and Mancarella, P. (2015). Demand Response Contracts as Real Options: a Probabilistic Evaluation Framework under Short-Term and Long-Term Uncertainties. IEEE Trans. Smart Grid 7, 868–878. doi:10.1109/tsg.2015.2405673

CrossRef Full Text | Google Scholar

Tange, R. I., Rasmussen, M. A., Taira, E., and Bro, R. (2017). Benchmarking Support Vector Regression against Partial Least Squares Regression and Artificial Neural Network: Effect of Sample Size on Model Performance. J. Near Infrared Spectrosc. 25, 381–390. doi:10.1177/0967033517734945

CrossRef Full Text | Google Scholar

Timotheou, S. (2016). Fast Non-negative Least-Squares Learning in the Random Neural Network. Prob. Eng. Inf. Sci. 30, 379–402. doi:10.1017/s0269964816000061

CrossRef Full Text | Google Scholar

Van Houdt, G., Mosquera, C., and Nápoles, G. (2020). A Review on the Long Short-Term Memory Model. Artif. Intell. Rev. 53, 5929–5955. doi:10.1007/s10462-020-09838-1

CrossRef Full Text | Google Scholar

Vilar, J., Aneiros, G., and Raña, P. (2018). Prediction Intervals for Electricity Demand and price Using Functional Data. Int. J. Electr. Power Energ. Syst. 96, 457–472. doi:10.1016/j.ijepes.2017.10.010

CrossRef Full Text | Google Scholar

Wang, F., Liu, L., Yu, Y., Li, G., Li, J., Shafie-Khah, M., et al. (2018). Impact Analysis of Customized Feedback Interventions on Residential Electricity Load Consumption Behavior for Demand Response. Energies 11, 770. doi:10.3390/en11040770

CrossRef Full Text | Google Scholar

Wang, Y., Gan, D., Sun, M., Zhang, N., Lu, Z., and Kang, C. (2019). Probabilistic Individual Load Forecasting Using Pinball Loss Guided Lstm. Appl. Energ. 235, 10–20. doi:10.1016/j.apenergy.2018.10.078

CrossRef Full Text | Google Scholar

Xie, J., Chen, Y., Hong, T., and Laing, T. D. (2016). Relative Humidity for Load Forecasting Models. IEEE Trans. Smart Grid 9, 191–198.

Google Scholar

Yang, A., Li, W., and Yang, X. (2019). Short-term Electricity Load Forecasting Based on Feature Selection and Least Squares Support Vector Machines. Knowledge-Based Syst. 163, 159–173. doi:10.1016/j.knosys.2018.08.027

CrossRef Full Text | Google Scholar

Zhang, N., Li, Z., Zou, X., and Quiring, S. M. (2019). Comparison of Three Short-Term Load Forecast Models in Southern california. Energy 189, 116358. doi:10.1016/j.energy.2019.116358

CrossRef Full Text | Google Scholar

Zhou, D., Balandat, M., and Tomlin, C. (2016). “Residential Demand Response Targeting Using Machine Learning with Observational Data,” in 2016 IEEE 55th conference on decision and control (CDC) (IEEE), 6663–6668. doi:10.1109/cdc.2016.7799295

CrossRef Full Text | Google Scholar

Keywords: basis expansion, nonnegative least squares (NNLS), short-term demand forecasting, varying coefficient model (VCM), COVID-19

Citation: Hirose K (2021) Interpretable Modeling for Short- and Medium-Term Electricity Demand Forecasting. Front. Energy Res. 9:724780. doi: 10.3389/fenrg.2021.724780

Received: 14 June 2021; Accepted: 01 November 2021;
Published: 14 December 2021.

Edited by:

Fei Teng, Imperial College London, United Kingdom

Reviewed by:

Ekanki Sharma, University of Klagenfurt, Austria
Md. Abdur Razzak, Independent University, Bangladesh, Bangladesh
Chaoyu Dong, Imperial College London, United Kingdom

Copyright © 2021 Hirose. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kei Hirose, aGlyb3NlQGltaS5reXVzaHUtdS5hYy5qcA==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Interpretable Modeling for Short- and Medium-Term Electricity Demand Forecasting

1 Introduction

2 Proposed Model

2.1 Expression of bij

2.2 Expression of μij

2.3 Proposed Model

3 Estimation

3.1 Nonnegative Least Squares

3.2 Forecast

4 Application to Demand Data From Tokyo Electric Power Company Holdings

4.1 Basis Functions

4.2 Candidates of Tuning Parameters

4.3 Impact of Tuning Parameters on Forecast Accuracy

4.4 Forecast Accuracy

4.5 Interpretation

5 Application to Data From One Selected Research Facility in Japan

5.1 Input of COVID-19 Information

5.2 Forecast Accuracy

6 Concluding Remarks

Data Availability Statement

Author Contributions

Funding

Conflict of Interest

Publisher’s Note

Acknowledgments

Supplementary Material

References

2.1 Expression of b_ij

2.2 Expression of μ_ij