Adaptive individual residential load forecasting based on deep learning and dynamic mirror descent

With the growing penetration of renewable energy generation in modern power networks, it has become highly challenging for network operators to balance electricity supply and demand. Residential load forecasting plays an increasingly important role in this respect and facilitates various interactions between power networks and electricity users. While numerous studies have targeted aggregate residential load forecasting, only a few efforts have been made towards individual residential load forecasting. The issue of volatility of individual residential load has never been addressed in forecasting. To fill this gap, this paper presents a deep learning method empowered with dynamic mirror descent for adaptive individual residential load forecasting. The proposed method is evaluated on a real-life Irish residential load dataset, and the experimental results show that it improves the prediction accuracy by 9.1% in RMSE and 11.6% in MAE in comparison with a benchmark method.


Introduction
As advanced metering infrastructure (AMI), especially smart meters, is being widely deployed in modern power systems, a growing volume of granular residential electricity consumption data has become easily available on a large scale (Sajjad et al., 2016; Xie et al., 2018). This huge amount of data enables power network operators to motivate residential customers to actively participate in demand side management (DSM) through a wide range of demand response programs (DRPs), for example, time-of-use pricing (Zhou et al., 2016; Ponocko and Milanovic, 2018). As part of DSM, residential load forecasting is an important but challenging task for power network operators, due to the great irregularity and uncertainty of residential load (Welikala et al., 2019). As a result, addressing the challenges of residential load forecasting plays a crucial role in interactions between network operators and residential customers, efficient and cost-effective grid operations, and household energy consumption optimization.
At present, residential load forecasting is generally categorized into two classes: aggregate and individual. More specifically, data analytics for aggregate residential load forecasting mainly include support vector regression (SVR) (Humeau et al., 2013; Wijaya et al., 2015), random forest (Goehry et al., 2020), artificial neural networks (ANNs) (Marinescu et al., 2013; Marinescu et al., 2014; Quilumba et al., 2015; Campos and Silva, 2016; Stephen et al., 2017; Wang et al., 2018; Oprea and Bara, 2019), and deep neural networks (DNNs) (Zheng et al., 2018; Zou et al., 2019). Besides, these methods tend to be combined with clustering techniques, for instance, k-means clustering, in order to improve the forecasting performance. In general, a number of models based on these methods have obtained a desirable level of prediction accuracy on aggregate residential load forecasting. This is because the varied behaviours of residential customers smooth out their overall load profile at the aggregate level, therefore generating an easily identifiable energy consumption pattern.
However, compared to aggregate residential load forecasting, only a few researchers have attempted to explore individual residential load forecasting so far. Some traditional machine learning methods, for instance, ANNs, are still applied to forecast individual residential load (Paterakis et al., 2016;Xu et al., 2016;Vossen et al., 2018;Dinesh et al., 2019;Wu et al., 2020). During recent years, DNNs, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks (Gan et al., 2017;Kong et al., 2017;Hossen et al., 2018;Shi et al., 2018;Kong et al., 2019;Alhussein et al., 2020;Wang et al., 2020;Lin et al., 2021), have been largely adopted due to their superior capability in extracting complex patterns. Although existing DNN models have mostly achieved a comparatively higher prediction accuracy than many traditional machine learning models, they are essentially trained on a limited amount of residential load data offline and then applied to perform forecasting online. As a result, these well-trained offline models are likely to encounter many sudden changes of residential load when forecasting online, which are not included in training. This is because individual residential load can be extremely volatile and uncertain, which would have a significantly negative effect on the forecasting performance of the DNN models.
To address the issue of volatility of individual residential load in forecasting, this paper presents a method for short-term individual residential load forecasting, which is able to adjust the forecasting error dynamically. Specifically, the key contributions of the paper are summarized as follows:
1) Firstly, it presents an LSTM based deep learning method empowered with dynamic mirror descent (DMD) for adaptive individual residential load forecasting.
2) Secondly, it modifies the original DMD to make it feasible for adjusting residential load forecasting.
3) Thirdly, it devises a comprehensive feature expression strategy to describe load characteristics at each time step in order to form the input of the forecasting model.
4) Finally, the proposed method is validated and compared with a published benchmark method on a real-life Irish residential load dataset, and the influence of the modified DMD on the forecasting performance of the proposed method is investigated in detail.
The remainder of this paper is organized as follows. Section 2 reports a comprehensive literature review on residential load forecasting. Section 3 briefly introduces recurrent neural networks and LSTM networks, and then details the modifications made on the original DMD. Section 4 integrates DMD into an LSTM based deep learning method for adaptive individual residential load forecasting. In Section 5, the proposed residential load forecasting method is evaluated on a real-life Irish residential load dataset. Finally, Section 6 concludes the paper and points out some future work.
2 Review on residential load forecasting
A number of research works have been presented in the area of aggregate residential load forecasting. Wijaya et al. (2015) designed a short-term cluster-based aggregate residential load forecasting strategy. It firstly clusters residential customers, then forecasts the energy consumption of each cluster separately through SVR, and finally aggregates the energy consumption forecasts of all clusters. Similar to Wijaya et al. (2015), Humeau et al. (2013) developed a residential load forecasting method for the district level, which combines k-means clustering with SVR. Different from Wijaya et al. (2015) and Humeau et al. (2013), Goehry et al. (2020) employed hierarchical clustering and random clustering respectively to divide residential customers into subsets and applied random forests to build the forecasting model for each subset.
ANNs have also been commonly applied to forecast residential load at the aggregate level. For example, a dynamic forecasting mechanism was proposed to monitor small-scale residential electricity demand and detect anomalous pattern changes in Marinescu et al. (2014). A self-organizing map is employed for anomalous day detection, and an ANN prediction model changes its input neurons according to a previously detected and recorded match in a database of anomalous days in order to conduct demand prediction for anomalous days. Wang et al. (2018) proposed an ensemble method for short-term aggregate residential load forecasting, which produces the forecasts for all load subprofiles based on hierarchical clustering and ANNs, and then combines all the forecasts with different weights to obtain the final forecasting result of the aggregate residential load. Quilumba et al. (2015) presented a three-step aggregate residential load forecasting approach, based on k-means clustering and ANNs. In Marinescu et al. (2013) and Campos and Silva (2016), a comprehensive performance comparison was made between ANNs and some other prediction models, such as auto-regression and auto-regressive integrated moving average. Moreover, a gated recurrent unit (GRU) neural network based approach was developed to perform short-term load forecasting for residential community (Zheng et al., 2018). Also, it uses least absolute shrinkage and selection operator (LASSO) and partial correlation analysis to explore the influences of temperature, humidity, rainfall, and wind speed on residential load in order to determine the input variables for the forecasting model.
In summary, aggregate residential load is comparatively easier to forecast than individual residential load, because individual behaviors have the features of volatility and uncertainty in energy consumption. As a result, existing forecasting models on aggregate residential load have obtained a satisfactory level of prediction accuracy.
By contrast, only a few efforts have been made towards individual residential load forecasting. Xu et al. (2016) proposed a k-nearest vector auto-regressive framework with exogenous input to spatial-temporally model household electricity demand. Dinesh et al. (2019) presented a forecasting approach to the power consumption of a single household, which is based on non-intrusive load monitoring (NILM) and graph spectral clustering. Different from Xu et al. (2016) and Dinesh et al. (2019), Vossen et al. (2018) developed a probabilistic forecasting model to describe the uncertainty of individual residential load using two different types of density-estimation ANNs respectively. In Wu et al. (2020), a boosting-based framework for multiple kernel learning regression was presented to forecast individual residential load. It not only adopts boosting to learn an ensemble of multiple kernel regressors, but also applies transfer learning to forecast the load of a residential customer that has a very limited amount of energy consumption data. In addition, Gan et al. (2017) employed a quantile LSTM network to perform probabilistic residential load forecasting at the individual level. In Alhussein et al. (2020), a hybrid model combining a convolutional neural network and an LSTM network was proposed to forecast the individual household load. Wang et al. (2020) designed a framework for short-term individual residential load forecasting. It firstly partitions historical load data by clustering to train multiple LSTM models, and then uses a fully connected cascade neural network to fuse the multiple LSTM models. Shi et al. (2018) proposed a novel pooling-based deep RNN to avoid overfitting during residential load forecasting. It batches the load profiles of a group of residential customers into an input pool in order to increase data diversity and volume. Also, Lin et al. (2021) presented a graph neural network based method for individual residential load forecasting, which aims to capture both temporal information of historical load and spatial information of neighbouring households in order to improve the forecasting accuracy. In Hossen et al. (2018), different types of DNNs, such as RNNs and LSTM networks, were applied to short-term individual residential load forecasting and a performance comparison was conducted among them. Furthermore, automatic hyperparameter tuning was utilised to select an optimal hyperparameter combination for an LSTM network in order to improve the accuracy of individual residential load forecasting (Kong et al., 2017).
Although a variety of forecasting models for individual residential load have been developed, their training data is unable to include all the cases on residential energy consumption, as individual residential load tends to change dramatically over time, which leads to a poor prediction accuracy when they are applied online.

Long short term memory model
As a sequence based model, RNNs are capable of establishing excellent temporal correlation between previous and current information (Chen et al., 2016;Tolosana et al., 2018). This characteristic makes RNNs an ideal candidate for short-term residential load forecasting, because the residential load consumption pattern has a strong and complex relationship between adjacent time steps (Kong et al., 2019). However, in terms of the specific implementation, a special RNN, called the LSTM network, is employed in this paper, as it significantly improves the performance of the general RNN. In this section, the RNN architecture is firstly introduced, and then the LSTM unit is explained.

Recurrent neural networks
In the working process, the RNN aims to map the input sequence of $x$ values into the corresponding sequence of outputs $y$. Specifically, the learning process is conducted at every single time step from $t = 1$ to $t = \tau$. For time step $t$, the network neuron parameters at the $l$-th layer update their shared states following the equations in Shi et al. (2018), where $y^{(t)}_{\mathrm{target}}$ is the true output at time step $t$, $h^{(t)}_l$ is the shared states of the $l$-th layer at time step $t$, $N$ is the total number of layers in the network, and $a^{(t)}_l$ is the input of the $l$-th layer at time step $t$, which consists of three components: 1) the input $x^{(t)}$ at time step $t$ or the shared states $h^{(t)}_{l-1}$ of the $(l-1)$-th layer at time step $t$; 2) the bias $b_l$ of the $l$-th layer; 3) the shared states $h^{(t-1)}_l$ of the $l$-th layer at time step $t-1$.
Due to their shared states, RNNs are able to learn dependency contained in the previous time steps.
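The three components of the layer input described above can be illustrated with a minimal NumPy sketch of one time step of a stacked RNN. The weight names `U` and `W`, the `tanh` activation, and the function name `rnn_step` are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def rnn_step(x_t, h_prev, params):
    """Advance a stacked RNN by one time step.

    x_t:    input vector at time step t
    h_prev: list of per-layer shared states from time step t-1
    params: list of (U, W, b) per layer; U and W are hypothetical weight names
    """
    h_new = []
    layer_input = x_t
    for (U, W, b), h_l_prev in zip(params, h_prev):
        # Layer input: (1) x_t or the state of the layer below,
        # (2) the bias, (3) this layer's state from the previous step.
        a = U @ layer_input + b + W @ h_l_prev
        h = np.tanh(a)        # tanh is an assumed activation
        h_new.append(h)
        layer_input = h       # the next layer reads this layer's state
    return h_new
```

Calling `rnn_step` repeatedly over a sequence, feeding each returned list back in as `h_prev`, is what lets the network carry information from earlier time steps.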

LSTM units
RNNs are trained by backpropagation through time, but learning long-term dependency with RNNs is difficult because of gradient vanishing or exploding (Kong et al., 2019). In order to overcome these two issues, an LSTM unit is introduced, and LSTM has gradually become the most popular structure of RNNs in solving many time series problems.
Let $\{x_1, x_2, \ldots, x_T\}$ denote a typical input sequence for an LSTM unit, where $x_t \in \mathbb{R}^k$ represents a $k$-dimensional vector of real values at time step $t$. In order to establish temporal relations, the LSTM unit defines and maintains an internal memory cell state throughout its life cycle, which is the most important element of the LSTM unit. The memory cell state $s_{t-1}$ interacts with the intermediate output $h_{t-1}$ and the subsequent input $x_t$ to determine which elements of the internal state vector should be updated, maintained, or erased according to the outputs of the previous time step and the inputs of the present time step. Apart from the internal state, the LSTM unit also defines the input node $g_t$, the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$. The formulations of all nodes in the LSTM unit, numbered (6) to (11), follow Kong et al. (2019), where $W_{fx}$, $W_{fh}$, $W_{ix}$, $W_{ih}$, $W_{gx}$, $W_{gh}$, $W_{ox}$, and $W_{oh}$ denote the weight matrices of the corresponding inputs of the network activation functions, ⊙ denotes the element-wise multiplication, σ denotes the sigmoid activation function, and Φ denotes the tanh activation function.
In each time step, the memory cell state undergoes three operations: 1) discard useless information from the memory cell state $s_t$; 2) add the new information $i_t$ extracted from the input $x_t$ and the intermediate output $h_{t-1}$ into the memory cell state $s_t$; 3) determine the new intermediate output $h_t$ from the memory cell state $s_t$. Thus, the memory cell state is capable of keeping useful information for a long time, which enhances RNN performance.
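As a reading aid, the three operations can be written as one step of a standard LSTM cell in NumPy. This is a sketch of the widely used formulation that matches the gate and weight names above; the bias terms `b` and the dictionary layout of the weights are assumptions and are not described in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, W, b):
    """One step of a standard LSTM cell.

    W: dict of weight matrices keyed 'fx', 'fh', 'ix', 'ih', 'gx', 'gh', 'ox', 'oh'
    b: dict of bias vectors keyed 'f', 'i', 'g', 'o' (assumed, not in the text)
    """
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])  # forget gate
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])  # input gate
    g_t = np.tanh(W["gx"] @ x_t + W["gh"] @ h_prev + b["g"])  # input node
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])  # output gate
    # Operations 1 and 2: scale the old state by f_t, then add gated new information.
    s_t = g_t * i_t + s_prev * f_t
    # Operation 3: derive the new intermediate output from the updated state.
    h_t = np.tanh(s_t) * o_t
    return h_t, s_t
```

With all weights and biases at zero, every gate evaluates to 0.5 and the input node to 0, so the cell simply halves its previous state, which is a convenient sanity check.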

Dynamic mirror descent
As an online learning method, dynamic mirror descent (DMD) is capable of incorporating a dynamic model in the learning process, effectively minimizing the loss and estimating time-varying system states (Hall and Willett, 2015; Ledva et al., 2015). DMD is executed in two main steps: 1) an observation-based update incorporates the new measurement into the parameter prediction; 2) a model-based update advances the parameter prediction to the next time step. The frequently used notations in DMD are given in Table 1, and the detailed steps of DMD are presented in Algorithm 1.

Algorithm 1: Dynamic Mirror Descent
In order to apply DMD to adjust the forecasted value of residential load dynamically, following the work presented in Ledva et al. (2018), a few modifications are made to the original DMD. The modified method retains the concept of the original DMD but is not a direct implementation of it. In other words, the modified DMD considers the forecasting model as a black box and simply adjusts its output with the measured and forecasted values. Hence, the modified DMD is formulated by (12) to (14), where $\nabla l_t$ is an arbitrary sub-gradient of $l_t(\cdot)$; $k_t$ is the adjustment variable accumulating the deviation between the forecasted and measured values; η is the constant step size; Φ is the residential load forecasting model; $\theta_t$ is the input data of Φ. The model-based update (13) only computes an intermediate prediction $\tilde{\theta}_{t+1}$ without the real measurement influencing $\tilde{\theta}_{t+1}$. The measurement-based update and the model-based prediction are combined in (14) to obtain a final prediction $\hat{\theta}_{t+1}$.

Frontiers in Energy Research
In this paper, for simplicity, a specific loss function $l_t(\theta_t)$ is selected, so that the convex function (12) can be simplified to (15), where $y_t$ and $\hat{\theta}_t$ are the real measurement and the final forecast of residential load respectively. As a result, the modified DMD is formed using (13) to (15).
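A hedged sketch of one update rule consistent with the description above is given here as a reading aid. It assumes a squared-error loss, which is an assumption and is not confirmed by the text, and it writes $\tilde{\theta}_t$ for the intermediate forecast, $\hat{\theta}_t$ for the final forecast, $k_t$ for the adjustment variable, and $S_t$ for the input sample of the forecasting model Φ. The authors' exact formulation may differ.

```latex
\begin{align}
l_t(\theta) &= \tfrac{1}{2}\,(y_t - \theta)^2
  && \text{(assumed loss)} \\
k_{t+1} &= k_t - \eta\,\nabla l_t(\hat{\theta}_t)
         = k_t + \eta\,(y_t - \hat{\theta}_t)
  && \text{(measurement-based update)} \\
\tilde{\theta}_{t+1} &= \Phi(S_{t+1})
  && \text{(model-based update)} \\
\hat{\theta}_{t+1} &= \tilde{\theta}_{t+1} + k_{t+1}
  && \text{(final prediction)}
\end{align}
```

Under this assumption the gradient of the loss at the final forecast is $\hat{\theta}_t - y_t$, so each step adds η times the latest deviation between measurement and final forecast to the adjustment variable, which matches the description of $k_t$ as an accumulator of deviations.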
4 Adaptive individual residential load forecasting
4.1 Implementation process
Due to the high volatility and uncertainty of individual residential load, a comprehensive feature expression strategy is devised in order to describe the details of the energy consumption at each time step. The input features of a data sample $S_t$ at a particular time step $t$ are detailed as follows:
1) the sequence of the residential load for the past $T$ time steps, $E_t \in \mathbb{R}^T$, where $e_t$ is the energy consumption (kWh) at time step $t$;
2) the sequence of the half-hourly indexes for the past $T$ time steps, $D_t \in \mathbb{R}^T$, where $d_t \in [1, 48]$ is the half-hourly index for time step $t$, because the sampling frequency is once every half an hour;
3) the sequence of the day indexes for the past $T$ time steps, $W_t \in \mathbb{R}^T$, where $w_t \in [1, 7]$ is the day index for time step $t$, as there are 7 days in a week;
4) the sequence of the holiday signs for the past $T$ time steps, $H_t \in \mathbb{R}^T$, where $h_t$ is the holiday sign for time step $t$, which is either 1 or 2: 1 denotes non-holiday and 2 denotes holiday (in this paper, it is assumed that weekdays are non-holiday and weekends are holiday).
Thus, a data sample $S_t$ is a matrix formed by concatenating the four sequences $E_t$, $D_t$, $W_t$, and $H_t$. In order to speed up the convergence of the forecasting model and improve its generalization capacity, the input features are normalized to [0, 1] according to their nature. To be specific, the min-max normalization method is adopted for $E_t$, while $D_t$, $W_t$, and $H_t$ are encoded by a one-hot encoder. The one-hot encoder maps an original element of the feature sequence with $M$ categories into a new sequence with $M$ elements, where only the new element corresponding to the original element is one while the rest are all zeros. Hence, a normalized data sample is formed by concatenating the min-max normalized load sequence with the one-hot encodings of $D_t$, $W_t$, and $H_t$.
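The feature expression strategy can be sketched in NumPy as follows. The function names, the convention that day indexes 6 and 7 are the weekend, and the use of a precomputed load range `e_min`/`e_max` for scaling are assumptions made for illustration. With one load column and 48 + 7 + 2 one-hot columns, each time step becomes a row of 58 values.

```python
import numpy as np

def one_hot(indices, num_categories):
    """Map 1-based category indexes to one-hot rows."""
    indices = np.asarray(indices)
    out = np.zeros((len(indices), num_categories))
    out[np.arange(len(indices)), indices - 1] = 1.0
    return out

def build_sample(load, half_hour, day, e_min, e_max):
    """Build one normalized sample from the past T time steps.

    load:      energy consumption (kWh), length T
    half_hour: half-hourly indexes in 1..48, length T
    day:       day indexes in 1..7, length T (6 and 7 = weekend is an assumption)
    e_min, e_max: load range for min-max scaling (assumed to come from training data)
    """
    load = np.asarray(load, dtype=float)
    day = np.asarray(day)
    holiday = np.where(day >= 6, 2, 1)          # weekend -> 2 (holiday), weekday -> 1
    e_norm = (load - e_min) / (e_max - e_min)   # min-max normalization
    return np.hstack([
        e_norm[:, None],        # 1 column
        one_hot(half_hour, 48), # 48 columns
        one_hot(day, 7),        # 7 columns
        one_hot(holiday, 2),    # 2 columns
    ])
```

The result has shape `(T, 58)`, and every row contains exactly three ones from the one-hot blocks plus the scaled load value.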
Each row of the normalized data sample S t is the detailed features for the corresponding time step. In order to perform adaptive residential load forecasting, an LSTM network is firstly well trained for each resident, and then it is applied with the modified DMD adjusting the forecasting error dynamically. In general, the proposed method goes through the following four steps sequentially to forecast residential load: 1) the input sample is formed; 2) the input sample is normalized and fed to the well trained LSTM network to obtain the intermediate forecast; 3) the adjustment variable of the modified DMD is updated; 4) the final forecast is computed by summing the adjustment variable and the intermediate forecast. The framework of integration of deep learning and dynamic mirror descent for adaptive individual residential load forecasting is shown in Figure 1, and the steps of LSTM and DMD integration are detailed in Algorithm 2.
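The four steps above can be sketched as an online loop around a black-box forecaster. The update rule for the adjustment variable is an assumption (a gradient step under a squared-error loss), and the names `adaptive_forecast`, `model`, and `k0` are hypothetical; the sketch is not the authors' Algorithm 2.

```python
from typing import Callable, List, Sequence

def adaptive_forecast(
    model: Callable[[object], float],
    samples: Sequence[object],
    measurements: Sequence[float],
    eta: float,
    k0: float = 0.0,
) -> List[float]:
    """Adjust the forecasts of a black-box model online.

    model:        trained forecaster (e.g. an LSTM) mapping a sample to a forecast
    samples:      normalized input samples, one per time step
    measurements: measured loads, each revealed after its forecast is issued
    eta:          constant step size of the modified DMD
    k0:           initial value of the adjustment variable
    """
    k = k0
    finals = []
    for sample, y in zip(samples, measurements):
        intermediate = model(sample)   # intermediate forecast from the trained model
        final = intermediate + k       # final forecast = intermediate + adjustment
        finals.append(final)
        # Once the measurement arrives, accumulate the deviation so that the
        # next forecast is adjusted (assumed update rule).
        k = k + eta * (y - final)
    return finals
```

For example, with a model that always returns 1.0, measurements fixed at 2.0, and `eta=0.5`, the final forecasts move towards the measurements as 1.0, 1.5, 1.75. Setting `eta=0` recovers the unadjusted model, which corresponds to a benchmark without online learning.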

Dataset description
The dataset used in this paper is from the Smart Metering Electricity Customer Behaviour Trials initiated by the Commission for Energy Regulation in Ireland (Commission for Energy Regulation, 2012). The trials lasted from July 2009 to December 2010 with over 5000 Irish residential customers and small and medium enterprises (SMEs) participating. In the trials, there are 929 1-E-E customers, which means that they are all residential customers (1) with the controlled stimulus (E) and the controlled tariff (E). These customers are billed at the flat rate without any stimulus, and therefore are most representative because the majority of residential customers outside the trials are of this type. Among these 929 customers, 782 customers have a complete record of energy consumption throughout the trials. In this paper, 750 1-E-E customers with a complete record are randomly selected as the experiment dataset.

Experiment setup
The full data of a single residential customer is divided into a training dataset and a test dataset with a ratio of 9:1. So, for each resident, 90% of the data samples are used for training, while the remaining 10% are used for testing. In addition, as this paper is not focused on improving the prediction accuracy via the optimal network structure, hyperparameter fine-tuning is not conducted on the LSTM network. All the experiment parameters are presented in Table 2.

Results and discussion
In this section, a performance comparison is first made between the proposed residential load forecasting method and a published benchmark method presented in Kong et al. (2019). It is noted that the benchmark method uses the same LSTM network as the proposed method but does not apply any online learning method. RMSE and MAE are employed as the performance indexes for residential load forecasting, formulated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{y}_t - y_t\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|\hat{y}_t - y_t\right|$$

where $\hat{y}_t$ is the forecasted value, $y_t$ is the real value, and $N$ is the size of the test dataset. Furthermore, the effect of the parameter η of the modified DMD on the proposed residential load forecasting method is investigated. In this paper, the adjustment variable is initialised as 0.
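The two performance indexes can be computed with a few lines of NumPy. The function names are illustrative, and the example values are made up solely to show how the two metrics respond to a single large error.

```python
import numpy as np

def rmse(forecast, actual):
    """Root mean squared error between forecasted and real values."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((forecast - actual) ** 2)))

def mae(forecast, actual):
    """Mean absolute error between forecasted and real values."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(forecast - actual)))

# Illustrative values only: three exact forecasts and one miss of 4 kWh.
example_forecast = [1.0, 2.0, 3.0, 4.0]
example_actual = [1.0, 2.0, 3.0, 8.0]
```

For these values MAE is 1.0 while RMSE is 2.0, because squaring gives the single large error more weight than averaging absolute errors does.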

Performance analysis of adaptive individual residential load forecasting
A performance comparison was conducted between the proposed and benchmark methods in terms of prediction accuracy. In this case, the parameter η of the modified DMD is set as $1.0 \times 10^{-5}$, $1.0 \times 10^{-4}$, $1.0 \times 10^{-3}$, $1.0 \times 10^{-2}$, $1.0 \times 10^{-1}$, and $1.0 \times 10^{0}$ respectively, and the optimal forecasting result obtained is regarded as the result of the proposed method. The results of both methods are presented in Table 3.
It is noted that Table 3 describes the average RMSE and MAE of all the residents. In Table 3, the proposed method performs much better than the benchmark method, in terms of both RMSE and MAE. Besides, the improvement percentage of MAE is higher than that of RMSE, because MAE and RMSE indicate the forecasting performance from two different perspectives. To be specific, MAE, which reflects the mean of errors, regards every error equally and averages all the errors, while RMSE, which reflects the fluctuation of errors, strengthens the large error and weakens the small error. The reason for the significant performance improvement of the proposed method can be explained as follows. As the adjustment variable $k_t$ of the modified DMD is capable of updating itself based on the errors between the forecasts and the real measurements of the previous time steps, the proposed method can effectively adjust the intermediate forecast $\tilde{\theta}_t$ of the current time step to obtain the final forecast $\hat{\theta}_t$.
In addition, Figure 2 presents the RMSE and MAE reduction of the proposed method across all the residents compared to the benchmark method, while Figure 3 presents the statistics of improvement percentage of the proposed method compared to the benchmark method.
In Figure 2, it can be clearly seen that the proposed method achieves different levels of improvement for a large number of residents. More specifically, some residents receive significant RMSE and MAE reductions (e.g., from 0.4 to 1.0), but others only receive slight RMSE and MAE reductions (e.g., from 0.01 to 0.05). It is also noted that some residents obtain an RMSE decrease but an MAE increase, while others obtain the opposite result. Besides, there is no performance difference between the proposed and benchmark methods for a few residents. This demonstrates that, for these residents, the modified DMD fails to effectively adjust the forecasting error over time, mainly because of the great complexity of their load profiles.

FIGURE 6
Optimal values of parameter η for the proposed method to achieve the minimum RMSE and MAE values.
In Figure 3, a total of 555 + 116 + 62 + 5 = 738 residents obtain an RMSE reduction, which accounts for 738/750 = 98.4% of all residents. Among them, most residents obtain an RMSE reduction of less than 20%, which accounts for (555 + 116)/750 = 89.47%. Besides, only five residents obtain an RMSE reduction of more than 60%. Similarly, a total of 443 + 87 + 85 + 17 = 632 residents receive an MAE reduction, which accounts for 632/750 = 84.27%. Among them, most residents obtain an MAE reduction of less than 20%, which accounts for (443 + 87)/750 = 70.67%. Only 17 residents obtain an MAE reduction of more than 60%. It is also noted that 118 residents fail to obtain an MAE reduction, while only 12 residents fail to obtain an RMSE reduction. This indicates that the proposed method reduces RMSE for more residents than it reduces MAE.
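The arithmetic in this paragraph can be checked with a short script. The counts are the ones quoted above; the variable names and the `share` helper are illustrative.

```python
total_residents = 750
rmse_counts = [555, 116, 62, 5]   # residents per RMSE-reduction group, as quoted
mae_counts = [443, 87, 85, 17]    # residents per MAE-reduction group, as quoted

def share(count, total=total_residents):
    """Percentage of all residents, rounded to two decimals."""
    return round(100.0 * count / total, 2)

rmse_improved = sum(rmse_counts)                 # residents with an RMSE reduction
mae_improved = sum(mae_counts)                   # residents with an MAE reduction
rmse_not_improved = total_residents - rmse_improved
mae_not_improved = total_residents - mae_improved
```

The script reproduces the quoted figures: 738 and 632 improved residents, shares of 98.4% and 84.27%, and 12 and 118 residents without a reduction.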
Furthermore, Figure 4 shows the load profiles of a random resident forecasted by the proposed and benchmark methods during a random week (Monday 29/11/2010 to Sunday 5/12/2010). It is obvious in Figure 4 that the forecasted load profile of the proposed method is much closer to the real load profile than that of the benchmark method. To be specific, when a dramatic increase or decrease of the residential load occurs, the proposed method can capture the change rapidly. Also, it can track the residential load stably, when the residential load only fluctuates slightly. By contrast, the benchmark method is unable to forecast accurately, when the residential load changes significantly over time.

Effect of parameter η on performance of adaptive individual residential load forecasting
As the parameter η plays an important role in the modified DMD, its effect on the forecasting performance of the proposed method was further investigated. Figure 5 depicts the MAE and RMSE reduction of the proposed method across all residents compared to the benchmark method, when the modified DMD is applied with different values of the parameter η.
It can be clearly seen in Figure 5A that the proposed method has the worst performance when η is $1.0 \times 10^{0}$, because a large number of residents fail to receive an MAE reduction. However, when η is $1.0 \times 10^{-1}$, $1.0 \times 10^{-2}$, and $1.0 \times 10^{-3}$, the proposed method performs much better, because a majority of residents receive different MAE reductions. Only a small number of residents receive an MAE reduction when η is $1.0 \times 10^{-4}$ and $1.0 \times 10^{-5}$. It is also noted that there are a variety of trends of MAE reductions among all the residents as η changes from $1.0 \times 10^{0}$ to $1.0 \times 10^{-5}$. Likewise, in Figure 5B, in terms of RMSE, the proposed method has the worst performance when η is $1.0 \times 10^{0}$, but performs much better when η is $1.0 \times 10^{-1}$, $1.0 \times 10^{-2}$, and $1.0 \times 10^{-3}$. Only a small number of residents obtain an RMSE reduction when η is $1.0 \times 10^{-4}$ and $1.0 \times 10^{-5}$. Besides, different trends of RMSE reductions can be seen among all the residents, as η changes from $1.0 \times 10^{0}$ to $1.0 \times 10^{-5}$.
The reason for the poor forecasting performance of the proposed method, when η is too small or too large, can be explained as follows. In (15), if η is too small, the deviation between the real measurement and the final forecast at the current time step cannot be accumulated effectively in the adjustment variable at the next time step. Thus, the modified DMD is unable to adjust the intermediate forecast properly over time. But, if η is too large, the deviation between the real measurement and the final forecast at the current time step accounts for a large proportion in the adjustment variable at the next time step, significantly weakening the accumulation of the deviations of the previous time steps. Therefore, the modified DMD fails to track the forecasting error accurately over time.
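The explanation above can be illustrated with a toy experiment. Everything in it is an assumption made for illustration: a black-box forecaster that always predicts 1.0, a synthetic load that alternates between 2.5 and 1.5, and an additive update of the adjustment variable. It is not drawn from the paper's dataset, and the η values that work best on real load data will differ.

```python
def mae_with_eta(eta, steps=200):
    """MAE of adjusted forecasts on a toy load fluctuating around 2.0."""
    k = 0.0        # adjustment variable, initialised as 0
    total = 0.0
    for t in range(steps):
        y = 2.0 + (0.5 if t % 2 == 0 else -0.5)  # synthetic measurement
        final = 1.0 + k                           # biased forecast of 1.0 plus adjustment
        total += abs(y - final)
        k += eta * (y - final)                    # assumed accumulation rule
    return total / steps

errors = {eta: mae_with_eta(eta) for eta in (1e-5, 1e-1, 1.0)}
```

In this toy setting the smallest step size barely corrects the constant bias, the largest one copies the most recent deviation and therefore chases the fluctuation, and the intermediate value yields the lowest error of the three, which is consistent with the qualitative argument in the text.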
Furthermore, Figure 6 shows the optimal values of the parameter η when the proposed method achieves the best performance in terms of RMSE and MAE respectively. It is obvious in Figure 6 that the proposed method rarely achieves the minimum MAE value when η is $1.0 \times 10^{0}$ and $1.0 \times 10^{-5}$. However, it is able to achieve the minimum MAE value for a large number of residents when η is $1.0 \times 10^{-3}$ and $1.0 \times 10^{-4}$. Similarly, the proposed method achieves the minimum RMSE value for only a few residents when η is $1.0 \times 10^{0}$ and $1.0 \times 10^{-5}$, but it achieves the minimum RMSE value for a majority of residents when η is $1.0 \times 10^{-1}$ and $1.0 \times 10^{-2}$. As MAE and RMSE measure errors from two perspectives, the optimal values of η differ when the proposed method achieves the minimum MAE and RMSE values on a single resident.
In addition, Figure 7 describes the load profiles of a random resident forecasted by the proposed method on a random day (Wednesday 22/12/2010) when the modified DMD is applied with different values of the parameter η, while Figure 8 describes the forecasting errors of the proposed method on this resident as the parameter η changes.
In Figure 7, the proposed method performs forecasting accurately at most time steps throughout the day, when η is $1.0 \times 10^{-2}$ and $1.0 \times 10^{-3}$. By contrast, when η is $1.0 \times 10^{-4}$ and $1.0 \times 10^{-5}$, the forecasted load profiles of the proposed method are quite close to that of the benchmark method. When η is $1.0 \times 10^{0}$ and $1.0 \times 10^{-1}$, there is a significant deviation between the real load profile and the forecasted load profiles of the proposed method at many time steps. This is mainly because a value of η that is too small or too large has a negative influence on the modified DMD. In Figure 8, MAE of the proposed method firstly decreases from η = $1.0 \times 10^{-5}$ to η = $1.0 \times 10^{-3}$, and then increases from η = $1.0 \times 10^{-3}$ to η = $1.0 \times 10^{0}$. The proposed method achieves the lowest MAE value of 0.3379 when η is $1.0 \times 10^{-3}$. Similarly, RMSE of the proposed method firstly decreases from η = $1.0 \times 10^{-5}$ to η = $1.0 \times 10^{-2}$, and then increases from η = $1.0 \times 10^{-2}$ to η = $1.0 \times 10^{0}$. The proposed method achieves the lowest RMSE value of 0.5355 when η is $1.0 \times 10^{-2}$.

Conclusion
This paper has presented an adaptive individual residential load forecasting method, which integrates deep learning and dynamic mirror descent to address the issue of great volatility of individual residential load. The original DMD is modified to become feasible for dynamic residential load forecasting. Besides, a detailed feature expression strategy is devised to provide the proposed method with sufficient information of energy consumption at each time step. The experimental results have shown that the proposed method has improved the prediction accuracy substantially by 9.1% in RMSE and 11.6% in MAE, in comparison with the published benchmark method. In addition, the effect of the parameter η of the modified DMD on the proposed method is further explored, and the comparison results have indicated that an optimal value of η can be identified to achieve the maximum performance improvement. Future work will focus on fine-tuning techniques to combine with deep learning and explore their effects on residential load forecasting. Optimization techniques will also be applied to search for the optimal value of the parameter η of the modified DMD in a continuous space in our future work.

Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://www.ucd.ie/issda/data/commissionforenergyregulationcer.

Author contributions
FH: responsible for investigation, conceptualization, methodology, formal analysis, visualization, original draft preparation, and review and editing. XW: responsible for review and editing, revision, project administration, and funding acquisition.