A Novel Decomposition and Combination Technique for Forecasting Monthly Electricity Consumption

With the share of electricity in total final energy consumption increasing quickly, the world is becoming increasingly dependent on electricity. This makes accurate forecasting of electricity consumption ever more important for the normal operation of economic activities. In this paper, a novel decomposition and combination technique to forecast monthly electricity consumption is proposed. First, we use STL decomposition to obtain the trend, season, and residual components of the time series. Second, we use each of SARIMA, SVR, ANN, and LSTM to forecast the trend, season, and residual components. Third, we use the time correlation principle to improve the forecasting accuracy of the season component. Fourth, we integrate the residual components predicted by SARIMA, SVR, ANN, and LSTM into a new sequence to improve the forecasting accuracy of the residual component. To verify the performance of the proposed forecast model, monthly electricity consumption data of China are used for an empirical analysis. The results show that after STL decomposition, time correlation modification, and residual modification, the forecasting accuracy of each model improves step by step. We believe that the proposed forecast model can also be applied to other mid- and long-term forecasting problems with obvious seasonal characteristics.


INTRODUCTION
Background
Resource depletion and global climate change are serious problems that human society faces now and will face for a long time. To escape from this dilemma, the global energy mix needs two transformations: clean energy substitution on the energy supply side and electric energy substitution on the energy consumption side. This paper focuses on electricity consumption. According to statistics, the electrification of global final consumption continues to increase, and the share of electricity in total final energy consumption was close to 20% in 2020.
As the world becomes more and more dependent on electricity, planning electricity production is crucial. In addition, electricity is difficult to store, so it is usually consumed immediately after it is generated, which further increases the need for power companies to plan their supply proactively. A reliable forecast of future electricity consumption is the primary basis for such planning; in particular, accurate medium- and long-term forecasts are key to power system scheduling and planning. Inaccurate forecasts, by contrast, can backfire. Overestimation wastes scarce energy resources, large capital investment, and long construction time, while underestimation leads to more serious consequences, such as power shortages. If an effective early warning can be given in advance on the basis of accurate forecasts, measures can be taken to avoid these consequences. However, electricity consumption is uncertain, complex, and nonlinear: it depends on political conditions, the economy (Lin and Liu, 2016), human activities, population behavior (Hussain et al., 2016), climate (Hernández, 2013), and other external factors, all of which affect forecasting accuracy.

Literature Review and Motivation
At present, the techniques used to forecast electricity consumption can be roughly divided into three categories: nonlinear intelligent models, statistical analysis models, and gray forecasting models. Nonlinear models mainly include the artificial neural network (Kandananond, 2011; Kaytez et al., 2015; Liu et al., 2017; Ghadimi et al., 2018; Bedi and Toshniwal, 2019; Hamzaçebi et al., 2019), the support vector machine (Pai and Hong, 2005; Kavaklioglu, 2011; Cao and Wu, 2016), and the Markov chain (Zhao et al., 2014). Statistical analysis models, such as regression analysis (Mohamed and Bodger, 2005; Wang et al., 2018) and the autoregressive integrated moving average (Yuan et al., 2016), have also been widely used in electricity consumption forecasting. The gray forecasting model proposed by Deng is popular in many forecasting applications because it can describe the characteristics of uncertain systems even when only a small amount of data is available. Accordingly, several studies forecast electricity consumption with gray models (Akay and Atak, 2007; Bahrami et al., 2014; Zhao and Guo, 2016; Xu et al., 2017; Ding et al., 2018; Wu et al., 2018).
These methods generally provide good forecasts. However, statistical analysis models are limited by their linear (or near-linear) assumption, gray forecasting models are usually suitable only for time series that approximate exponential growth, and nonlinear intelligent models often suffer from overfitting or difficult parameter selection. To remedy these shortcomings, several decomposition and combination techniques with better performance have been proposed in recent years: the SARIMA model with residual modification (Wang et al., 2012), wavelet transform combined with machine learning and time series models (Nguyen and Nabney, 2010), a weighted hybrid model in which the trend and seasonal components are predicted by a combined method and by SARIMA, respectively (Zhu, 2011), bagging of ARIMA and exponential smoothing methods (de Oliveira and Cyrino Oliveira, 2018), convolutional neural networks with fuzzy time series (Sadaei et al., 2019), and structural combination of seasonal exponential smoothing forecasts (Rendon-Sanchez and de Menezes, 2019).
Existing research still leaves some issues to be studied. First, statistical analysis models assume linearity and forecast periodic, regular sequences accurately, whereas nonlinear intelligent models forecast nonlinear, irregular time series better but are prone to overfitting. How can the advantages of the two be combined to improve forecasting accuracy? Second, apart from fluctuations caused by extreme weather and sudden major economic and health events, monthly electricity consumption shows strong periodicity and regularity, so exploiting both characteristics together should increase forecasting accuracy.

Contributions
To bridge the gap discussed in the Literature review and motivation section, this paper develops a novel decomposition and combination forecasting technique. The research comprises three parts. The first is forecasting monthly electricity consumption based on STL decomposition. The second is a time correlation modification, based on annual periodicity and adjacent similarity, that improves the forecasting accuracy of the season component. The third addresses the residual component: because it is nonlinear and irregular, an individual model may capture only certain of its features, so we integrate the residual components predicted by four models into a new sequence to improve its forecasting accuracy. The main contributions of this paper are as follows: 1) a novel decomposition and combination forecasting model utilizing STL decomposition, the time correlation principle (embodied as annual periodicity and adjacent similarity), and the hybrid forecasting principle is proposed; 2) the monthly electricity consumption data of China are used to evaluate the performance of the proposed model.
The remainder of the paper is organized as follows. The Electricity consumption month-ahead forecasting model section introduces the proposed forecasting model. The Case study section presents the simulation results and discussion, in which the performance of the proposed forecasting model is evaluated. Finally, conclusions are drawn in the Conclusion section.

ELECTRICITY CONSUMPTION MONTH-AHEAD FORECASTING MODEL
This section first briefly introduces the individual models, namely the STL algorithm and the SARIMA, SVR, ANN, and LSTM models. It then describes how the proposed decomposition and combination method operates.

Seasonal-Trend Decomposition Using Loess
For seasonal time series, STL decomposition, proposed by Cleveland et al. (1990), is commonly used to obtain the trend, season, and residual components. STL is an additive decomposition: it uses loess (locally weighted regression) smoothing to split the time series into trend, seasonal, and residual components, and adding these components back together recovers the original series. Specifically, the inner loop of STL consists of the following steps: 1) detrending; 2) cycle-subseries smoothing: build a subseries for each position in the seasonal cycle and smooth each one separately; 3) low-pass filtering of the smoothed cycle-subseries: recombine the subseries and filter the result; 4) detrending the smoothed cycle-subseries to obtain the seasonal component; 5) deseasonalizing the original series using the seasonal component calculated in the previous step; and 6) smoothing the deseasonalized series to obtain the trend component.
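STL runs the loess-based inner loop above inside an outer robustness loop, which is too long for a short example. As a simpler stand-in, the Python sketch below implements classical additive decomposition (a centered moving average for the trend plus per-position seasonal means), which shares STL's additive form y_t = T_t + S_t + R_t. It is an illustration only, not the authors' implementation; the function names are hypothetical, and the period of 11 is assumed to match the merged-month series described in the Data Collection section.

```python
def centered_moving_average(y, period):
    """Centered moving average; period is assumed odd (e.g. 11), so no 2xMA is needed."""
    half = period // 2
    trend = [None] * len(y)  # endpoints stay undefined, as in classical decomposition
    for t in range(half, len(y) - half):
        trend[t] = sum(y[t - half:t + half + 1]) / period
    return trend


def additive_decompose(y, period=11):
    """Classical additive decomposition y_t = T_t + S_t + R_t (a simple stand-in for STL)."""
    trend = centered_moving_average(y, period)
    # Average the detrended values separately for each position in the cycle.
    sums, counts = [0.0] * period, [0] * period
    for t, (value, level) in enumerate(zip(y, trend)):
        if level is not None:
            sums[t % period] += value - level
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    # Center the seasonal pattern so that it sums to zero over one cycle.
    offset = sum(means) / period
    pattern = [m - offset for m in means]
    season = [pattern[t % period] for t in range(len(y))]
    resid = [None if level is None else value - level - s
             for value, level, s in zip(y, trend, season)]
    return trend, season, resid
```

Adding trend, season, and residual back together reproduces the original series wherever the trend is defined, which is the additive property STL also has. In practice, a library implementation of STL (for example, the `STL` class in statsmodels) would replace this sketch.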

Seasonal Autoregressive Integrated Moving Average
SARIMA is one of the most widely used linear models for time series prediction. The general equation of this model is given by Eq. 1.
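$$\phi_p(B)\,\Phi_P(B^S)\,(1-B)^d\,(1-B^S)^D\,y_t = \theta_q(B)\,\Theta_Q(B^S)\,a_t \qquad (1)$$
Here $y_t$ is the time series, $a_t$ is white noise, and $B$ is the lag operator ($B y_t = y_{t-1}$). $D$ is the seasonal differencing order, and $d$ is the regular differencing order.
$$\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \qquad (2)$$
$$\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \qquad (3)$$
Eqs. 2 and 3 are the autoregressive and moving average polynomials, respectively. They express the dependence of future values of the time series on its past values and past errors.
$$\Phi_P(B^S) = 1 - \Phi_1 B^S - \Phi_2 B^{2S} - \cdots - \Phi_P B^{PS} \qquad (4)$$
$$\Theta_Q(B^S) = 1 - \Theta_1 B^S - \Theta_2 B^{2S} - \cdots - \Theta_Q B^{QS} \qquad (5)$$
Similarly, Eqs. 4 and 5 are the seasonal autoregressive and seasonal moving average polynomials, respectively. Adding these polynomials to the ARIMA equation captures the seasonal variation in the time series. Differencing is necessary to convert a nonstationary time series into a stationary one. $S$ is the length of the seasonal period.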
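To make the differencing operator $(1-B)^d(1-B^S)^D$ on the left-hand side of Eq. 1 concrete, the hypothetical helpers below apply it to a list of numbers. This is only the differencing stage of SARIMA, not a full model fit, and S = 11 is an assumed default chosen to match the period-11 series used in this paper.

```python
def difference(y, lag=1):
    """Apply (1 - B^lag): return y_t - y_{t-lag} for every t >= lag."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def sarima_difference(y, d=1, D=1, S=11):
    """Apply (1 - B)^d (1 - B^S)^D to y, as on the left-hand side of Eq. 1.

    The two operators commute, so the order of the loops does not matter.
    Each seasonal difference shortens the series by S, each regular one by 1.
    """
    out = list(y)
    for _ in range(D):
        out = difference(out, S)  # seasonal differencing
    for _ in range(d):
        out = difference(out, 1)  # regular differencing
    return out
```

Applied to a series made of a linear trend plus a fixed period-11 pattern, one seasonal difference leaves a constant and one further regular difference leaves zeros, which is the stationarizing effect the text describes.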

Support Vector Machine
SVM was first proposed by Vapnik (1963) on the basis of statistical learning theory and the principle of structural risk minimization, and it performs well even on small samples. The basic idea of support vector regression (SVR) is to map the original data into a high-dimensional feature space and perform linear regression in that space:
$$f(x) = w^{T}\varphi(x) + b$$
where $\varphi(x)$ is a nonlinear mapping function, $f(x)$ is the estimated value, $w$ is the weight vector, and $b$ is the bias. Estimating $w$ and $b$ can be cast as the optimization problem
$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{t=1}^{n}\left(\xi_t + \xi_t^{*}\right)$$
$$\text{s.t.}\quad y_t - w^{T}\varphi(x_t) - b \le \varepsilon + \xi_t,\qquad w^{T}\varphi(x_t) + b - y_t \le \varepsilon + \xi_t^{*},\qquad \xi_t,\ \xi_t^{*} \ge 0$$
where $C$ is the penalty parameter, $\varepsilon$ is the width of the insensitive tube, and $\xi_t$ and $\xi_t^{*}$ are the nonnegative slack variables. The parameters of SVR strongly influence the accuracy of the regression estimate, so the grid search method is used in this paper to choose them automatically.
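The sketch below evaluates the primal ε-SVR objective for the special case of a linear feature map φ(x) = x, showing how the slack variables ξ_t and ξ*_t penalize points that fall above or below the ε-tube, and pairs it with a generic exhaustive grid search over a parameter dictionary. The linear map, the function names, and the scoring callback are assumptions for illustration; the paper does not state its kernel or search grid.

```python
from itertools import product


def svr_primal_objective(w, b, X, y, C, eps):
    """Primal epsilon-SVR objective with a linear feature map phi(x) = x.

    xi[t] is how far y_t lies above the eps-tube around f(x_t);
    xi_star[t] is how far it lies below. Points inside the tube cost nothing.
    """
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    xi = [max(0.0, yt - f(xt) - eps) for xt, yt in zip(X, y)]
    xi_star = [max(0.0, f(xt) - yt - eps) for xt, yt in zip(X, y)]
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + C * (sum(xi) + sum(xi_star)), xi, xi_star


def grid_search(score, grid):
    """Exhaustive grid search: return the parameter dict with the lowest score.

    `score` is a callback, e.g. validation error of an SVR trained with the params.
    """
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if best is None or s < best[0]:
            best = (s, params)
    return best[1]
```

In an actual experiment, the `score` callback would train an SVR on the optimization training set with the given parameters and return its error on the validation set.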

Artificial Neural Network
ANN is an information processing method inspired by biological neural networks. Through nonlinear units (neurons), neural networks can in theory approximate any complex nonlinear relationship, and they have been widely used in forecasting. An artificial neural network consists of an input layer, a hidden layer, and an output layer. The most widely used ANN is the BP neural network, which is trained with the backpropagation (BP) algorithm. A neural network is defined by the weights between its layers, so it must be trained to set all the weights before it is used for prediction. The initial weights are set randomly. In the forward pass, the output is computed from the inputs layer by layer; in the backward pass, the weights are adjusted according to the difference between the output and the expected value. The forward and backward passes are repeated until this difference is small enough.
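The forward and backward passes described above can be sketched for a network with one input, one tanh hidden layer, and one linear output, trained by per-sample gradient descent on squared error. The layer size, learning rate, epoch count, activation, and function name are illustrative assumptions, not the configuration used in the paper.

```python
import math
import random


def train_mlp(xs, ys, hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a 1-input, 1-hidden-layer (tanh), 1-output network by backpropagation."""
    rng = random.Random(seed)  # seeded random initial weights for reproducibility
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                   # hidden biases
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                              # output bias

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    initial_loss = mse()
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h, out = forward(x)  # forward pass
            err = out - y        # gradient of 0.5 * (out - y)^2 with respect to out
            for j in range(hidden):
                # Backward pass: chain rule through the output weight and tanh.
                grad_pre = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_pre * x
                b1[j] -= lr * grad_pre
            b2 -= lr * err
    return initial_loss, mse(), lambda x: forward(x)[1]
```

The function returns the loss before and after training plus a prediction function, so the effect of repeating the forward and backward passes can be checked directly.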

Long Short-Term Memory
Traditional artificial neural networks (ANN) attempt to establish a direct mapping from input historical data to output forecast data. However, because they do not model the temporal correlation in the data series, such networks cannot capture the relationship between data and time, which limits their use in time series prediction. The recurrent neural network (RNN) was proposed to overcome this shortcoming. By adding cyclic connections to its neurons, an RNN can establish sequence-to-sequence mappings between input and output data, so the output at each time step is affected by the inputs at previous time steps. In this way, the RNN realizes a memory feature (Sutskever et al., 2014; LeCun et al., 2015).
Frontiers in Energy Research | www.frontiersin.org December 2021 | Volume 9 | Article 792358
The structure of the RNN is shown in Figure 1. Each node represents the neuron at a single time step. The connection weight of the input neuron is W1, the self-connection weight of each neuron is W2, and the connection weight of the output neuron is W3. The input sequence enters the network one time step at a time, and the same weight coefficients are reused at every step.
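A minimal sketch of the unrolled network in Figure 1 shows how W1, W2, and W3 are shared across time steps and how the self-connection W2 carries information forward. It assumes a single scalar neuron, a tanh activation, and a zero initial state, none of which the text specifies, and it illustrates a plain RNN rather than the LSTM cell.

```python
import math


def rnn_forward(xs, W1, W2, W3, h0=0.0):
    """Forward pass of a single-unit recurrent network unrolled over time.

    h_t = tanh(W1 * x_t + W2 * h_{t-1}),   o_t = W3 * h_t
    The same W1, W2, W3 are reused at every time step.
    """
    h, outputs = h0, []
    for x in xs:
        h = math.tanh(W1 * x + W2 * h)  # new state depends on the previous state
        outputs.append(W3 * h)
    return outputs
```

Setting W2 = 0 removes the self-connection, and the output at each step then depends only on the current input, which is the limitation of a feedforward ANN described above.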

The Proposed Forecast Framework
The proposed forecast framework utilizing STL decomposition, time correlation modification (embodied as annual periodicity and adjacent similarity) and residual modification is illustrated in Figure 2.
As shown in Figure 2, the proposed forecast framework consists of four steps. In the first step, we use Seasonal-Trend decomposition using Loess (STL decomposition) to obtain the trend, season, and residual components of the time series.
In the second step, we use each of SARIMA, SVR, ANN, and LSTM to forecast the trend, season, and residual components.
In the third step, we use the time correlation principle to improve the forecasting accuracy of the season component. The season component exhibits time correlation, which manifests as annual periodicity and adjacent similarity. Annual periodicity means that the values for the same month in consecutive years are similar; adjacent similarity means that the values in adjacent months are close to each other. The models in the second step exploit only adjacent similarity. To also exploit annual periodicity, we divide the season component into 11 subsequences, each representing one month, and forecast each subsequence with the exponential smoothing method. These forecasts are then combined in a weighted average with the season component predicted by each model (SARIMA, SVR, ANN, LSTM) to improve the forecasting accuracy of the season component. The weights are calculated from the model's last forecasting error.
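The third step can be sketched as follows, under stated assumptions: the season component is a plain list that starts at the beginning of a yearly cycle, each monthly subsequence is forecast with simple exponential smoothing (α = 0.5 assumed), and "weighted based on the last forecasting error" is interpreted as inverse-error weighting of the model forecast against the smoothing forecast. The paper does not give the exact weighting formula, and all names are hypothetical.

```python
def monthly_subsequences(season, period=11):
    """Split the season component into `period` subsequences, one per month."""
    return [season[m::period] for m in range(period)]


def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing; the last smoothed level is the next forecast."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


def tcm_forecast(season, model_forecast, model_error, es_error, period=11, alpha=0.5):
    """Blend a model's season forecast with per-month exponential smoothing.

    Assumed weighting: each source is weighted inversely to its last absolute
    forecast error, so w_model = es_error / (model_error + es_error).
    """
    subsequences = monthly_subsequences(season, period)
    w_model = es_error / (model_error + es_error)
    start = len(season) % period  # month position of the first forecast step
    blended = []
    for h, f_model in enumerate(model_forecast):
        f_es = ses_forecast(subsequences[(start + h) % period], alpha)  # annual periodicity
        blended.append(w_model * f_model + (1.0 - w_model) * f_es)
    return blended
```

The per-month smoothing forecast carries the annual periodicity, while the model forecast carries the adjacent similarity; the weight shifts toward whichever was more accurate last time.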
In the fourth step, we address the residual component. Because it is nonlinear and irregular, an individual model may capture only certain of its features, so the forecasting accuracy is low. In practice, a single forecasting model is rarely the best in all cases; each model has its own strengths and weaknesses. When multiple forecasting models are available, a combined approach is a good way to take full advantage of the strengths of each. Therefore, we integrate the residual components predicted by SARIMA, SVR, ANN, and LSTM into a new sequence and use it to replace the residual component predicted by each of the four methods, improving the forecasting accuracy of the residual component.
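One simple way to "integrate" the four residual forecasts is an equal-weight average at each forecast horizon. The paper does not state its combination rule, so the averaging below is an assumption, and the function names are hypothetical. Each model then keeps its own trend and season forecasts and is recomposed additively with the shared residual sequence.

```python
def combine_residual_forecasts(forecasts):
    """Average the residual forecasts of several models step by step (equal weights assumed).

    `forecasts` maps a model name to its list of residual forecasts.
    """
    sequences = list(forecasts.values())
    return [sum(step) / len(step) for step in zip(*sequences)]


def recompose(trend, season, residual):
    """Additive recomposition of one model's forecast: trend + season + residual."""
    return [t + s + r for t, s, r in zip(trend, season, residual)]
```

For example, the SARIMA forecast after residual modification would be `recompose(sarima_trend, sarima_season, combined)`, where `combined` is the averaged residual sequence shared by all four models.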

Data Collection
We evaluate the performance of the proposed forecasting method using the monthly electricity consumption data of China. However, these data cannot be used directly, because the Chinese New Year holiday, which lasts several days, falls in January or February, and during it almost all companies and factories stop operating. As a result, electricity consumption in January and February is sometimes abnormal. To avoid this problem, we replace the January and February values of each year with their average, treated as a single combined month denoted "1 and 2"; each year therefore has 11 monthly values, and the period length is 11. To stay relevant to the current state of electricity development, this study collects data from 1 and 2 2006 to August 2021, 172 observations in total. The original data are shown in Figure 3.
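The month-merging rule above can be sketched as follows, assuming the raw data arrive as a dictionary keyed by (year, month). The function name and input format are hypothetical.

```python
def merge_jan_feb(monthly):
    """Build a period-11 series from {(year, month): value}.

    January and February of each year are replaced by their average, stored
    as a single combined month "1 and 2"; March to December are kept as-is.
    """
    series, labels = [], []
    for year in sorted({y for y, _ in monthly}):
        if (year, 1) in monthly and (year, 2) in monthly:
            series.append((monthly[(year, 1)] + monthly[(year, 2)]) / 2.0)
            labels.append((year, "1 and 2"))
        for month in range(3, 13):
            if (year, month) in monthly:
                series.append(monthly[(year, month)])
                labels.append((year, month))
    return series, labels
```

With one value for every month from January 2006 to August 2021, this yields 15 × 11 + 7 = 172 observations, consistent with the 143 training and 29 test points used in the experiments.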

Experimental Design
We use the data from 1 and 2 2006 to December 2018 as the training data set (i.e., the first 143 data points) and the remaining data as the test data set (i.e., the last 29 data points). The training data set is further divided into an optimization training set and a validation set. In the error metrics, y is the actual value, ŷ is the forecasted value, and i is the index of the data point.
Figure 4 shows the STL decomposition of the monthly electricity consumption. The trend component of China's electricity consumption increases year by year, and its growth has accelerated since 2016. This is mainly because in 2016, eight departments in China jointly issued The Guidelines on Promoting the Substitution of Electric Energy, aiming to raise the share of electric energy in final energy consumption to 27%. Electric energy substitution, that is, replacing coal, oil, gas, and wood with electricity in energy consumption, is an important way to achieve carbon peaking and carbon neutrality. The amplitude of the season component grows over time. Because of the financial crisis, extreme weather events, and the epidemic, the residual component contains several relatively large negative and positive shocks. If the original sequence is used directly, these large shocks seriously degrade the forecasting accuracy of the model.
Table 1 shows the performance of the four models without and with STL decomposition. The comparison demonstrates that STL decomposition effectively improves the forecasting accuracy of monthly electricity consumption: compared with any single model applied without decomposition (Table 1), the divide-and-conquer strategy improves forecasting accuracy. Next, we analyze the sources of error, that is, the shares of the total error contributed by the trend, season, and residual component forecasts. As shown in Figure 5, for every model most of the error comes from the residual component forecast, and SARIMA performs worst on this component.
This is because the residual component is nonlinear and irregular, and SARIMA is not good at forecasting such sequences. In addition, a single model may capture only certain features of the sequence, so its forecasting accuracy is low. For the trend component, SARIMA has the highest forecasting accuracy, whereas the machine learning algorithms (SVR, ANN, and LSTM) are less accurate. Hence the traditional statistical method is better suited to simple sequences such as the trend component. The forecasting errors of the season component also account for a large share of the total.
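The chronological split and the error-source breakdown of Figure 5 can be sketched as below. How the paper computes each component's share is not stated, so the sketch assumes shares of total absolute forecast error; the function names are hypothetical.

```python
def chronological_split(series, n_train=143):
    """Split without shuffling: the first n_train points train, the rest test."""
    return series[:n_train], series[n_train:]


def error_shares(errors_by_component):
    """Share of the total absolute error attributable to each component (assumed definition).

    `errors_by_component` maps a component name to its list of forecast errors.
    """
    totals = {name: sum(abs(e) for e in errs) for name, errs in errors_by_component.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}
```

Keeping the split chronological matters for time series: the test points must lie strictly after the training points, otherwise the evaluation would leak future information into training.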

Time Correlation Modification
As shown in Figure 5, the season component accounts for 18%-28% of the errors. In this section, we use the periodicity of the seasonal series to improve the forecasting accuracy of the season component. The season component exhibits time correlation, which manifests as annual periodicity and adjacent similarity. We divide the season component into 11 subsequences, each representing one month, and forecast each subsequence with the exponential smoothing method. These forecasts are combined in a weighted average with the season component predicted by each model (SARIMA, SVR, ANN, and LSTM) to improve the forecasting accuracy of the season component, with the weights calculated from the model's last forecasting error. Rows 3-6 of Table 2 show that the forecasting accuracy improves after time correlation modification (TCM).
Figure 5 also shows that most of the errors come from the residual component. Because this component is nonlinear and irregular, a single model may capture only certain features of the sequence, so its forecasting accuracy is low, and the forecasting accuracy of the residual component therefore needs to be improved. We integrate the residual components predicted by SARIMA, SVR, ANN, and LSTM into a new sequence and use it to replace the residual component predicted by each of the four methods. Rows 7-10 of Table 2 show that the forecasting accuracy improves after residual modification (RM).

Comparison Between Different Models
Figures 6-8 show that after STL decomposition, time correlation modification, and residual modification, the forecasting accuracy of each model improves step by step. The largest improvement comes from STL decomposition, mainly because the original sequence contains many random disturbances, which affect the model if the sequence is not decomposed.

DISCUSSION
According to Figures 6-8, STL-SARIMA-TCM-RM is the most accurate forecasting model. On the one hand, compared with the machine learning models, SARIMA is better at forecasting the trend, the season, and other sequences with clear patterns, which explains its accuracy. On the other hand, SARIMA is not good at forecasting an irregular random term, so residual modification improves the forecasting accuracy of SARIMA most significantly. Figures 9-11 show the forecasting performance of STL-SARIMA-TCM-RM on the data set, together with a scatter plot of the forecasts against the actual values.
Because the test set covers the COVID-19 period, we divide it into 2019, 2020, and 2021. Table 3 shows that the forecast results for 2019 are significantly better than those for 2020 and 2021.
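A per-year breakdown such as Table 3 can be sketched by slicing the 29 test points into consecutive 11-point years (2019, 2020, and the 7 available periods of 2021). The metric used here is MAPE, chosen only for illustration because the paper's metric definitions are not reproduced in this section; the function names are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mape_by_year(actual, forecast, first_year=2019, period=11):
    """Score each calendar year of the test set separately.

    Assumes the test set starts at month "1 and 2" of first_year and that each
    year has `period` points; the last year may be shorter.
    """
    scores = {}
    for start in range(0, len(actual), period):
        year = first_year + start // period
        scores[year] = mape(actual[start:start + period], forecast[start:start + period])
    return scores
```

Scoring each year separately keeps the pandemic-affected years from masking the accuracy obtained in 2019.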

CONCLUSION
This paper proposes a novel decomposition and combination method to forecast electricity consumption. The approach first uses STL to decompose the sequence into trend, season, and residual components. The three components are then forecasted; the season component forecasts are modified according to annual periodicity, and the residual component forecasts of the individual models are integrated. The results show that STL-SARIMA-TCM-RM is the most accurate forecasting model.
In addition to electricity forecasting, we believe that the proposed method can also be applied to other mid- and long-term forecasting problems with obvious seasonal characteristics, such as tourist flow forecasting, energy consumption forecasting, and traffic flow forecasting. Furthermore, this paper focuses only on univariate time series analysis and does not consider other factors affecting electricity consumption; introducing these factors into the proposed method may further improve its predictive performance.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.