DRA-net: A new deep learning framework for non-intrusive load disaggregation

Non-intrusive load decomposition helps users understand their electricity consumption and reduce energy use. Traditional deep learning methods have difficulty identifying low-usage appliances and are prone to model degradation, which leads to insufficient classification capacity. To solve these problems, this paper proposes a dilated residual aggregation network for non-intrusive load decomposition. First, the original power data are differenced to enhance their expressive ability. Second, the residual structure and dilated convolution are combined to realize cross-layer transmission of load characteristic information and to capture more long-sequence content. Third, a feature enhancement module is proposed to recalibrate the local feature mapping, which strengthens the network's ability to learn subtle features. Compared with traditional network models, the dilated residual aggregation convolutional network learns fine load features more effectively and generalizes well, which improves the accuracy of load decomposition. Experimental results on several datasets show that the model generalizes well and improves the recognition accuracy of low-usage appliances.


Introduction
With the development of the smart grid, the traditional intrusive load monitoring method faces many problems, such as high construction cost and difficult deployment, which makes non-intrusive load monitoring an effective way to solve these problems. Non-intrusive load decomposition helps power companies obtain users' power consumption more easily and understand the consumption of individual appliances. By providing the consumption of each appliance, it allows the distribution of residential power consumption and the total residential load to be predicted more accurately, which reduces planning investment, saves budget, and avoids unnecessary waste of power resources. It also helps power companies formulate dynamic demand response policies, adjust electricity prices, evaluate related projects, and allocate power resources more reasonably, and it fosters a friendlier interaction between users and power companies, achieving peak shaving and valley filling to the benefit of both sides. On the other hand, a family that knows more details about its electricity consumption will consciously reduce energy waste. For example, most American and British families install smart meters at home so that users can learn about off-peak electricity prices in time, which encourages them to use electricity at night or at low prices, alleviates the pressure of peak consumption, avoids power loss, and indirectly improves the economic benefits of power sources (Zhao et al., 2019). In 1992, Hart proposed non-intrusive load monitoring (NILM). Its essence is non-intrusive load decomposition (NILD) (Hart, 1992): the total energy consumption is decomposed into that of individual devices in order to analyze the electricity consumption behavior of residential users.
This provides effective feedback on residential electricity consumption, helping users save energy and reduce electricity charges (Paterakis et al., 2017). Hart's method mainly extracts steady-state features for power decomposition. Based on Hart's algorithm, a simple non-intrusive load monitoring system can be designed. However, the algorithm handles only a small number of appliances and extracts few types of features. When there are many types and numbers of appliances, its decomposition accuracy decreases significantly (Dash and Sahoo, 2022).
To address these problems, Inagaki et al. (2011) used integer programming to monitor the load of household power equipment, but the method applies only to equipment with discrete operating modes. Kolter et al. (2010) studied a sparse coding algorithm to improve decomposition performance, but this method applies only to data sets containing low-resolution data. Chang et al. (2013) used a particle swarm optimization algorithm to carry out non-intrusive load decomposition experiments on a small number of appliances. This algorithm can decompose the total power data into each appliance simultaneously, but the error of the decomposition results is still large. Optimization methods are based on load characteristic analysis. First, the static and dynamic characteristics of each load are modeled. The total load curve is the superposition of multiple loads, so the objective of the optimization is to find the optimal load coefficients (i.e., the contribution of each load) that minimize the residual between the superimposed total load and the actual load. Piga et al. (2016) proposed a sparse optimization algorithm for non-intrusive load decomposition, which reduced the decomposition error to a certain extent. Ahmadi and Marti (2015) proposed non-intrusive load decomposition based on feature matching (also called load information matching), which effectively solved the problem of high similarity between load features. Johnson and Willsky (2013), Luan et al. (2022) and Xia et al. (2021) used the hidden Markov model to perform non-intrusive load decomposition. Similar to combinatorial optimization algorithms, these algorithms first obtain the state power of appliances through clustering.
Their encoding and decoding process optimizes over the power values obtained by clustering, and the decomposition results are combinations of these clustered power values, so more accurate appliance power consumption values cannot be obtained (Himeur et al., 2020a; Fan et al., 2021). Compared with the hidden Markov algorithm, the method proposed by Tsai and Lin (2012) achieves more accurate non-intrusive load decomposition through a K-nearest neighbor regression algorithm. However, when the power consumption difference between appliances is large, this algorithm cannot achieve accurate decomposition. Other approaches, such as those based on the Adaboost algorithm (Hassan et al., 2014), the 2D phase encoding algorithm (Himeur et al., 2020b; Himeur et al., 2021a), the fuzzy algorithm (Lin Y. H and Tsai, 2014), the bagging tree algorithm (Himeur et al., 2020c), the histogramming descriptor algorithm (Himeur et al., 2021b) and neural network algorithms, have made certain achievements in non-intrusive load decomposition tasks. For a machine learning model, non-intrusive load decomposition takes the total power time series as the input and either the power data of each appliance (fitting) or the appliance category (classification) as the output. The two tasks are consistent in nature, although predicting the power data of each appliance is more difficult. Machine learning methods need a large number of samples for training. Non-intrusive load decomposition based on machine learning fits the model on a large number of samples to obtain the power distribution of the various loads under different total power conditions, which is essentially no different from conventional machine learning fitting and classification. Data sampled at 1/60 Hz meet the needs of non-intrusive load decomposition very well.
In Lin's work, the fuzzy C-means algorithm based on particle swarm optimization is combined with a fuzzy neural algorithm for non-intrusive load decomposition experiments. This algorithm can identify the state of an appliance at a given time and also solves the problem of high similarity between the power consumption characteristics of appliances. Park et al. (2019) proposed an equipment status recognition algorithm based on a neural network, which is simple and fast in decomposition. Welikala et al. (2019) proposed a NILD method that incorporates the appliance usage patterns (AUPs) of equipment to improve the state recognition performance of high-frequency appliances. Himeur et al. (2021c) proposed a histogram post-processing of 2D local binary patterns for smart grid applications. Guo et al. (2021) proposed a multi-model combination for non-intrusive load disaggregation, which integrates the advantages of various methods and improves decomposition accuracy. Traditional methods mainly rely on hand-crafted features and optimization methods to identify appliances, but effective hand-crafted features are difficult to extract. These algorithms are also highly sensitive to noise and have low decomposition accuracy.
Traditional load feature extraction requires manual design, so effective features are difficult to extract and time-series features are difficult to analyze. Recently, deep learning has been widely used in various fields (Qu et al., 2021; Chen et al., 2022; Gao et al., 2022; Song et al., 2023), and its application to non-intrusive load decomposition has gradually attracted researchers' attention. Model training requires recordings of the total power of a household and of each load, and a large amount of such load record data is now available. Public data sets such as UK-DALE and WikiEnergy provide a data basis for the application of deep learning. Unlike traditional pattern recognition, deep learning extracts features automatically without manual design. Kelly and Knottenbelt (2015) showed experimentally that the AutoEncoder performs best among sequence-to-sequence methods. Singh and Majumdar (2018) and Singh and Majumdar (2019) proposed deep sparse coding for non-intrusive load monitoring, which improved decomposition efficiency. Xia et al. (2019) constructed a deep dilated residual network for load feature extraction, which improves feature utilization. Jia et al. (2021) used a bidirectional dilated residual network to realize sequence-to-point non-intrusive load decomposition. At present, there are two main load decomposition approaches: sequence-to-sequence and sequence-to-point. Sequence-to-sequence directly decomposes the input sequence into the load sequences of different appliances. Instead of training a network to predict a whole window, the sequence-to-point method predicts only the midpoint element of the input window: the input of the network is a window of mains power, and the output is the power of an electrical device at a single point. Zhang et al. (2016) realized sequence-to-sequence and sequence-to-point non-intrusive load decomposition using a convolutional neural network. In Zhang's experiments, the sequence-to-point method achieved good results for most appliances. Compared with other deep learning methods, convolutional neural networks have proved more effective for non-intrusive load decomposition. However, Zhang's experiments use a relatively shallow convolutional neural network, which is prone to vanishing gradients, cannot extract deep load characteristics, and has difficulty capturing relationships across long time series. Xia et al. (2020) constructed a deep LSTM model to decompose a sequence into multiple sequences and improved decomposition accuracy through deep feature extraction. However, current deep learning models are prone to model degradation and other problems, resulting in insufficient fitting ability. In addition, because the features of low-usage appliances are difficult to extract, these appliances carry too little weight in training. Therefore, existing deep learning models decompose low-usage appliances poorly.
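The difference between the two formulations comes down to how training targets are cut from the aligned mains and appliance series. The following is a minimal sketch in plain Python, not the implementation used by the cited works; the function name `make_windows`, the toy readings and the window width of 3 are assumptions chosen for illustration.

```python
from typing import List, Tuple


def make_windows(
    mains: List[float], appliance: List[float], width: int
) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Slide a window over aligned mains/appliance series.

    Returns (inputs, seq2seq_targets, seq2point_targets).
    """
    if len(mains) != len(appliance):
        raise ValueError("series must be aligned and equally long")
    if width < 1 or width > len(mains):
        raise ValueError("width must be between 1 and the series length")
    inputs, seq_targets, point_targets = [], [], []
    mid = width // 2  # index of the window midpoint
    for start in range(len(mains) - width + 1):
        inputs.append(mains[start:start + width])
        # sequence-to-sequence: the whole appliance window is the target
        seq_targets.append(appliance[start:start + width])
        # sequence-to-point: only the midpoint of the window is the target
        point_targets.append(appliance[start + mid])
    return inputs, seq_targets, point_targets


# Toy readings in watts (hypothetical values).
mains = [5.0, 7.0, 120.0, 125.0, 9.0, 6.0]
fridge = [0.0, 0.0, 110.0, 115.0, 0.0, 0.0]
x, y_seq, y_point = make_windows(mains, fridge, width=3)
```

Each input window is paired with either a full target window or a single target value, so a sequence-to-point network produces one prediction per window position.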
To solve the above problems, a dilated-residual aggregation network (DRA-Net) is constructed and applied to non-intrusive load disaggregation. The network enlarges the receptive field of the convolution kernel through dilated convolution to capture more features. In addition, a feature enhancement module is proposed to improve the model's ability to learn fine load features and to further improve the generalization performance of non-intrusive load decomposition. In summary, our contributions are as follows: 1) The differential processing of raw power data enhances the ability of data expression. 2) A structure combining the residual structure and dilated convolution is proposed to realize cross-layer transmission of load characteristic information and capture more long-sequence content. 3) A feature enhancement module is proposed to recalibrate the local feature mapping and enhance the network's ability to learn fine features.

Dilated-residual aggregation convolutional neural network
Because different load devices in residential houses have different electrical characteristics and are affected by other factors such as interference noise, this work optimizes the common convolutional network structure and proposes a new network model, the Dilated-residual Aggregation Convolutional Neural Network (DRAnet), for non-intrusive power load disaggregation. Its overall structure is shown in Figure 1.
As shown in Figure 1, the network model includes an ordinary convolutional layer, the Dilated Resblock, the Feature Enhancement Module (FEM), the Information Fusion Module (IFM), and a Fully Connected Layer (FC). There are three dilated residual modules, namely Dilated Resblock1, Dilated Resblock2, and Dilated Resblock3, and three feature enhancement modules, namely FEM1, FEM2, and FEM3. The total load power is differenced and then recombined with the original data to form the input, which enriches the edge information of the residential power data. The convolutional layer of the network strengthens the extraction of load features of different residential appliances by combining multiple convolutional kernels, retaining the basic load characteristic information. The initial convolutional layer for load feature mapping is followed by three dilated residual modules. Through these three modules, the non-intrusive power load disaggregation task enhances the model's ability to fit the load features without increasing the number of parameters. At the same time, the modules ensure that the signal passed layer by layer through the network is not lost during back propagation. The dilated residual module is based on the idea of "cross-layer connection" and uses the residual connection to further extract features from the low-order load features (Hu et al., 2022). The higher-order residential load feature mapping contains more abstract load characteristics and timing information.

FIGURE 1
Structure of the dilated-residual aggregation convolutional neural network.

Frontiers in Energy Research, frontiersin.org
To enhance the disaggregation performance of the network, a feature enhancement module is proposed to process the output mappings of different stages of the network and obtain an attention weight matrix. This matrix is multiplied and summed with the corresponding vectors of the output of the dilated residual module. The module integrates and strengthens the base load characteristics extracted by the three dilated residual modules and makes full use of the load characteristics and timing information of each network stage. This enhances the fitting ability of the whole network and greatly improves disaggregation, especially for load devices with low usage frequency such as washing machines and dishwashers. The dilated residual convolutional feature mapping and the feature-enhanced load feature mapping both contain a large amount of different higher-order load information. These two parts are integrated by dimensional splicing and conventional convolution in the information fusion module, and the power prediction for the appliance is finally output by the fully connected operation. The network constructed in this section is mainly designed to remedy insufficient utilization of residential load characteristics, poor disaggregation of low-use appliances, and vanishing gradients, reflecting its stronger feature extraction and learning ability.

Dilated resblock
The dilated convolution uses the Dilated Factor (DF) parameter to adjust the size of the dilated convolution (Miao et al., 2022). Since loads such as washing machines and dishwashers are used less frequently and have sparse temporal features, dilated convolution allows resampling of the underlying load feature mapping. Pooling and down-sampling operations lose temporal information of the load, whereas dilated convolution can both replace pooling and enlarge the receptive field exponentially (the receptive field refers to the range of the load series covered by the convolution kernel). Each convolution output therefore captures a larger range of feature information, which gives good feature extraction for appliances such as washing machines without adding redundant parameters.
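The growth of the receptive field can be checked with a few lines of arithmetic. This is an illustrative sketch under stated assumptions (stride 1, no pooling, a hypothetical kernel size of 3); the helper names `effective_kernel` and `stacked_receptive_field` are not from the paper.

```python
from typing import Iterable


def effective_kernel(kernel: int, dilation: int) -> int:
    """Span, in input samples, covered by one dilated kernel."""
    return dilation * (kernel - 1) + 1


def stacked_receptive_field(kernel: int, dilations: Iterable[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        # each layer widens the field by the span of its kernel minus one
        field += d * (kernel - 1)
    return field


plain = stacked_receptive_field(3, [1, 1, 1])    # three ordinary layers
dilated = stacked_receptive_field(3, [1, 2, 3])  # three dilated layers
```

A size-3 kernel with dilation 2 spans 5 input samples while still holding only 3 weights, which is why the wider view comes at no cost in parameters.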
The dilated convolution kernel in the Dilated-residual Aggregation Convolutional Neural Network is shown in Figure 2B. Figure 2A shows the ordinary convolution, where $x_l$ and $x_{l+1}$ are the input and output of layer $l + 1$, respectively. Assume that the convolution kernel size is $a$ and the stride is 1. When the dilation rate $d$ is 1, the number of inserted "0" weights is 0, and from Eq. 1 the mapping range lengths of layer $l + 1$ ($L_{l+1}$) and layer $l$ ($L_l$) are the same.

$$L_{l+1} = \left\lfloor \frac{L_l + 2\,\mathrm{padding} - d\,(a - 1) - 1}{\mathrm{stride}} \right\rfloor + 1 \tag{1}$$
where padding is the number of padding zeros. In Figure 2B the dilation rate d is 2, and the receptive field is expanded to 5 × 1. The advantage of dilated convolution thus lies in its ability to enlarge the local receptive field during the convolution operation and capture more information about the load characteristics without introducing additional parameters. For load feature extraction, dilated convolution can control the receptive field without changing the size of the feature map, so it extracts multi-scale information and effectively improves the accuracy of load decomposition. The combination of three dilated convolution layers and the residual connection constitutes a dilated residual module, as shown in Figure 3. The feature mapping is processed sequentially by dilated convolutions with dilation rates of 1, 2, and 3, together with Leaky-ReLU (a commonly used activation function in convolutional neural networks), Batch Normalization (which, like common data standardization, unifies scattered data and is a common way to optimize neural networks), and other operations. ReLU is not used as the non-linear activation function because, when the input of the convolution layer is negative, ReLU learns slowly (Wang et al., 2022) and may even deactivate neurons and prevent them from updating their weights, causing the network gradient to vanish. The Leaky-ReLU activation function can correct the distribution of load data and retain negative values in the gradient calculation, which indirectly improves the retention of power load timing information in the network.
The residual connection also solves the network degradation caused by the gradual loss of information during back propagation through the layers, so the dilated residual module enhances the network's ability to fit the load samples and improves the disaggregation accuracy of the model.
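The structure of one dilated residual module can be sketched on a single-channel series in plain Python. This is a simplified illustration, not the authors' implementation: Batch Normalization is omitted, the kernels are hypothetical, and zero padding of d(a − 1)/2 on each side is assumed so that the length is preserved. `conv_output_length` applies the standard output-length relation for a dilated convolution.

```python
from typing import List, Sequence


def conv_output_length(length: int, kernel: int, dilation: int,
                       padding: int, stride: int = 1) -> int:
    """Standard output length of a 1-D (dilated) convolution."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def leaky_relu(x: Sequence[float], slope: float = 0.01) -> List[float]:
    """Keep positive values, scale negative values by a small slope."""
    return [v if v > 0 else slope * v for v in x]


def dilated_conv1d(x: Sequence[float], weights: Sequence[float],
                   dilation: int) -> List[float]:
    """Length-preserving dilated convolution with zero padding (odd kernels)."""
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("this sketch supports odd kernel sizes only")
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(weights[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]


def dilated_resblock(x: Sequence[float], kernels: Sequence[Sequence[float]],
                     dilations: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Three dilated convolutions with Leaky-ReLU, plus a skip connection."""
    out = list(x)
    for w, d in zip(kernels, dilations):
        out = leaky_relu(dilated_conv1d(out, w, d))
    return [a + b for a, b in zip(out, x)]  # residual (cross-layer) connection


series = [1.0, 2.0, 3.0, 4.0]
identity = [[0.0, 1.0, 0.0]] * 3  # hypothetical kernels that pass the input through
block_out = dilated_resblock(series, identity)
```

With pass-through kernels the block output is the input plus itself, which makes the effect of the skip connection easy to see: the input always reaches the output unchanged alongside whatever the convolutions learn.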

Feature enhancement module
The structure of the feature enhancement module proposed in this work is shown in Figure 4. Two feature mappings from inside the dilated residual aggregation convolutional network serve as the inputs of the feature enhancement module: feature map1 and feature map2. Feature map1 is the output feature map of the current dilated residual module, while feature map2 is the output feature map of the previous stage. As Figure 4 shows, the module has two branches. The first branch includes operations such as Convolution, GlobalAvgPooling (GAP), FC, Sigmoid, and Reshape. The second branch contains operations such as tensor multiplication of feature maps, i.e., feature rescaling. Feature map1 and feature map2 are processed by the feature enhancement module to obtain the output mapping feature map3.
The specific processing of load information in the enhancement module is shown in Figure 5. In Figure 5, feature map2 is the feature mapping of the previous stage, so its dimensionality, C, is not consistent with that of feature map1, the output mapping of the dilated residual module. A convolution with C kernels of size 1 × 1 makes the two dimensions consistent, which is convenient for subsequent processing. Weight vector1 is obtained after the GlobalAvgPooling layer, which compresses the multi-dimensional load feature map into a one-dimensional feature map. Weight vector1 is essentially a one-dimensional vector containing low-order load feature information; it characterizes the global information of the feature layer, and its dimension is 1 × C. Weight vector1 is then processed by the FC layer and the ReLU non-linear activation function to obtain weight vector2, a high-dimensional vector of higher-order global features built on weight vector1, whose dimension is also 1 × C. Through this series of operations, weight vector2 further represents the change in dimensional response of the residential load characteristics, described as follows, where F1 is the feature mapping of feature map1, GAP(⋅) is global average pooling, FC(⋅) is the fully connected operation, and W1 and W2 are weight vector1 and weight vector2, respectively.
Sigmoid non-linear activation acts like a gated filtering mechanism on the load information. It processes each element of weight vector2 to generate a weight. When the load feature of a channel is more effective, its weight is closer to 1; when the load feature of a channel is invalid, its weight is closer to 0. In this way the feature enhancement module effectively filters out useless information. After the Reshape operation completes the dimensional change, the processed weight vector2 is multiplied with feature map2 to rescale the features of feature map2, and the enhanced output mapping feature map3 is finally obtained, as given by Eq. 4.
where F3 is the feature mapping of feature map3 and Sigmoid(⋅) is the Sigmoid function. The main role of the feature enhancement module is to integrate and optimize the features of feature map1, the output mapping of the dilated residual module, and feature map2, the output load feature mapping of the previous stage, and so complete the rescaling of the weights. The module makes use of the feature maps obtained at different stages of the network, which allows the importance of the load features to be discriminated. The feature enhancement module learns the dependencies of load timing information through weight vector1 and weight vector2 and learns the importance of each load feature in the network. Feature mappings that are favorable to the load decomposition task are given large weights, which improves decomposition accuracy; conversely, unfavorable mappings are given small weights, which suppresses irrelevant load features and filters out invalid information.
In summary, the dilated residual aggregation convolutional network improves the decomposition accuracy of low-use appliances by rescaling load features through the feature enhancement module.
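The channel-gating idea described in this section can be sketched in plain Python. This is one possible reading of the text, not the authors' code: it assumes the 1 × 1 convolution is applied to feature map1 to match the C channels of feature map2, that the gate is computed from that aligned map, and that the gate rescales feature map2. All weights and feature values below are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # layout: [channel][time step]


def conv1x1(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Pointwise convolution; weights[c_out][c_in] mixes channels per time step."""
    steps = len(fmap[0])
    return [
        [sum(w * fmap[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over time, giving one value per channel."""
    return [sum(channel) / len(channel) for channel in fmap]


def fully_connected(vec: List[float], weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def feature_enhancement(fmap1: FeatureMap, fmap2: FeatureMap,
                        conv_w: List[List[float]], fc_w: List[List[float]],
                        fc_b: List[float]) -> FeatureMap:
    aligned = conv1x1(fmap1, conv_w)   # assumed: bring fmap1 to C channels
    w1 = global_avg_pool(aligned)      # weight vector1, one value per channel
    w2 = [max(0.0, v) for v in fully_connected(w1, fc_w, fc_b)]  # FC + ReLU
    gates = [sigmoid(v) for v in w2]   # per-channel gate in (0, 1)
    # feature rescaling: each channel of fmap2 is scaled by its gate
    return [[g * v for v in channel] for g, channel in zip(gates, fmap2)]


fmap3 = feature_enhancement(
    fmap1=[[1.0, 3.0]],                # 1 channel, 2 time steps
    fmap2=[[1.0, 1.0], [4.0, 2.0]],    # C = 2 channels
    conv_w=[[1.0], [0.0]],
    fc_w=[[1.0, 0.0], [0.0, 1.0]],
    fc_b=[0.0, 0.0],
)
```

One detail worth noting: when ReLU feeds directly into Sigmoid, as the text describes, the gate cannot fall below 0.5. The text does not say whether a further layer sits between the two, so the sketch simply follows the order as written.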

Information fusion module
The structure of the information fusion module proposed in this work is shown in Figure 6. It is composed of an Add operation, a 3 × 1 Conv, the Leaky-ReLU non-linear activation function, and Batch Normalization. The output feature mapping of the dilated residual module differs from that of the feature enhancement module, so the two are fused by the information fusion module. The dimensionality of feature map1 and feature map2 in Figure 6 is made consistent using a basic convolution, the two input feature mappings are fused by the Add operation, and features are re-extracted using a 3 × 1 convolution. The load feature mapping is then further adjusted and optimized using the Leaky-ReLU activation function, Batch Normalization and other operations. Finally, the output feature mapping of the information fusion module is a one-dimensional vector, from which the prediction results for the residential load devices are obtained by the fully connected operation.

FIGURE 4
Structure diagram of feature enhancement module.

FIGURE 5
Schematic diagram of the process of the feature enhancement module.

FIGURE 6
Information fusion module structure diagram.
The dilated residual aggregation convolutional network uses multiple dilated residual blocks to extract the electric load features of residential houses, which improves the model's ability to encode and decode load information. It also uses residual connections to transfer feature information, which further ensures effective transfer of load information between layers and avoids problems such as vanishing gradients. Non-intrusive load decomposition based on DRAnet uses the combined power data X as input samples and the power data Y of individual load devices as output labels to train and tune the network. On the test set, the decomposition results are predicted from the mapping between X and Y learned during training. The feature enhancement module and the information fusion module integrate the load features extracted at different stages, which improves the accuracy of the network for load decomposition and especially improves the decomposition and identification of low-use appliances.
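The fusion step described in this section (Add, 3 × 1 convolution, Leaky-ReLU, Batch Normalization) can be sketched for a single channel as follows. This is a simplified illustration under stated assumptions: both branches are taken to have already been brought to the same length, Batch Normalization is shown in inference form with supplied statistics, and the kernel values are hypothetical.

```python
from typing import List, Sequence


def conv3_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Length-preserving convolution with a kernel of size 3 and zero padding."""
    if len(kernel) != 3:
        raise ValueError("kernel must have exactly 3 weights")
    padded = [0.0] + list(x) + [0.0]
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def leaky_relu(x: Sequence[float], slope: float = 0.01) -> List[float]:
    return [v if v > 0 else slope * v for v in x]


def batch_norm_inference(x: Sequence[float], mean: float, var: float,
                         gamma: float = 1.0, beta: float = 0.0,
                         eps: float = 1e-5) -> List[float]:
    """Normalize with fixed (running) statistics, then scale and shift."""
    scale = gamma / (var + eps) ** 0.5
    return [(v - mean) * scale + beta for v in x]


def information_fusion(branch_a: Sequence[float], branch_b: Sequence[float],
                       kernel: Sequence[float], mean: float = 0.0,
                       var: float = 1.0) -> List[float]:
    if len(branch_a) != len(branch_b):
        raise ValueError("branches must have the same length before Add")
    fused = [a + b for a, b in zip(branch_a, branch_b)]  # Add operation
    features = leaky_relu(conv3_same(fused, kernel))      # re-extract features
    return batch_norm_inference(features, mean, var)


fused_out = information_fusion([1.0, 2.0, 3.0], [1.0, 1.0, 1.0],
                               kernel=[0.0, 1.0, 0.0])
```

Element-wise addition requires matching shapes, which is why the text aligns the dimensionality of the two inputs with a convolution before the Add.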

Data sets
In this paper, two data sets, the UK-DALE public data set and the WikiEnergy data set (Kelly and Knottenbelt, 2015), are used to verify the algorithm. The WikiEnergy data set is a research power data set released by Pecan Street and is the most abundant residential power energy database in the world for studying power load decomposition. It contains power data collected from nearly 600 household users over a period of time, including single-load and total household power consumption. The active power of all loads and residential buildings is sampled at 1/60 Hz. Data collection began in 2011 and is ongoing, so the database is still expanding and provides good data support for research on non-intrusive load decomposition. The UK-DALE public data set mainly contains single-load and total household power consumption for five household users. Each household has up to 9 load devices, but the sampling period differs between households. In the experiments, 70% of the sequences are used for training and 30% for testing. In deep learning, artificially adding noisy "dummy samples" to the real data helps improve the robustness and generalization performance of the model. In this work, differential processing is performed on the raw power data. The load power signal is essentially a set of time-varying data, similar to linear time-series information, and the total power samples are subjected to first-order differential processing, where the differential signal represents the difference between two consecutive data points. After the processing, the state of each non-zero-valued load device in the time dimension changes. The main purposes are: 1) each non-zero power value is changed to eliminate data fluctuation of the load power signal and smooth the data; 2) combining raw data and differential data as the input enhances the data expression capability. The differential processing is given by

$$\Delta X_t = X_t - X_{t-1}$$

where $X_t$ represents the instantaneous total power at the current moment $t$, $X_{t-1}$ represents the instantaneous total power at moment $t - 1$, and $\Delta X_t$ is the result of differential processing.
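The differencing step and its combination with the raw series can be sketched as follows. The text does not say how the first sample is handled or how the two series are arranged, so this sketch assumes a leading 0.0 (to keep both series the same length) and a two-channel layout; the readings are hypothetical.

```python
from typing import List, Sequence


def first_difference(power: Sequence[float]) -> List[float]:
    """First-order difference: delta[t] = power[t] - power[t - 1]."""
    if not power:
        return []
    # the first sample has no predecessor, so 0.0 is used (assumption)
    return [0.0] + [curr - prev for prev, curr in zip(power, power[1:])]


def combine_with_raw(power: Sequence[float]) -> List[List[float]]:
    """Stack raw and differenced power as two input channels (assumed layout)."""
    return [list(power), first_difference(power)]


total_power = [100.0, 100.0, 350.0, 340.0, 90.0]  # watts
channels = combine_with_raw(total_power)
```

In the differenced channel, steady operation sits near zero while switching events appear as large positive or negative steps, which is the edge information the text refers to.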

Analysis of experimental results for the WikiEnergy dataset
For the network model proposed in this work, experimental simulations and analyses are performed on several load devices in the WikiEnergy data set, namely, the air conditioner, refrigerator, microwave oven, washing machine, and dishwasher. DRAnet is applied to non-intrusive load decomposition of these devices and compared with the K-Nearest Neighbor (KNN) algorithm, the factorial hidden Markov model (FHMM), the Denoising AutoEncoder (DAE) algorithm, the CNN Sequence to Sequence (CNN s-s) algorithm, the CNN Sequence to Point (CNN s-p) algorithm, the composite deep long short-term memory network (CD-LSTM) (Xia et al., 2020), and the deep dilated residual network (D-ResNet). In this work, the input size of the network is 100. For the sequence-to-sequence method, the output size is 100. For the sequence-to-point method, the output size is 1. The mean absolute error (MAE) and the signal aggregate error (SAE) are used to evaluate the performance of the algorithms. Table 1 and Figure 8 show the comparison between the load decomposition results of the above algorithms and the real power data on the WikiEnergy dataset.
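The two evaluation metrics can be computed as below. The text does not give their formulas, so this sketch assumes the definitions commonly used in NILM work: MAE as the mean of per-sample absolute errors, and SAE as the relative error in total energy over the evaluated period. The sample readings are hypothetical.

```python
from typing import Sequence


def mae(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted power."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("inputs must be non-empty and equally long")
    return sum(abs(p - t) for t, p in zip(truth, pred)) / len(truth)


def sae(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Relative error in total energy (assumed definition of SAE)."""
    total_true = sum(truth)
    if total_true == 0:
        raise ValueError("true total energy must be non-zero")
    return abs(sum(pred) - total_true) / total_true


true_power = [0.0, 100.0, 100.0, 0.0]
pred_power = [10.0, 90.0, 110.0, 0.0]
mae_value = mae(true_power, pred_power)
sae_value = sae(true_power, pred_power)
```

The two metrics are complementary: MAE penalizes errors at every time step, while SAE lets over- and under-estimates cancel and therefore reflects only the error in total consumption.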
The experimental results show that the KNN algorithm has the worst decomposition effect on the microwave oven, washing machine and dishwasher. Because these loads are used infrequently, the KNN algorithm, owing to its structure, cannot effectively identify and decompose the abrupt change points of load power. All algorithms can effectively decompose the air conditioner load, which follows a periodic pattern. For the refrigerator, the D-ResNet algorithm and the DRAnet model proposed in this paper decompose better, mainly because they accurately identify the peak regions of load power. For load decomposition, the most important thing is to improve the decomposition ability at high power consumption. At moments of very low power, the methods differ, but the differences have little impact on practical applications. Effective decomposition of load peaks is therefore particularly important. The experiments show that the DAE algorithm has certain advantages in identifying and decomposing regions with zero power consumption. For low-frequency appliances such as microwave ovens, washing machines and dishwashers, the CNN s-s and CNN s-p networks cannot accurately realize power decomposition, and their MAE and SAE are relatively large. The main reason is that these two models have few layers, so load feature extraction is insufficient. D-ResNet performs better than CNN s-s and CNN s-p, but it still cannot decompose these loads accurately: although the residual structure makes the network deeper and improves feature extraction, feature extraction for infrequently used appliances remains insufficient.
Owing to its structural advantages, DRAnet uses dilated convolution to enlarge the receptive field of the convolution kernel, capture more time series information about fine load characteristics, and improve the decomposition effect. Compared with the existing decomposition algorithms, the DRAnet network model achieves better decomposition performance. In particular, its load decomposition curve is closer to the real power consumption curve for infrequently used loads such as the microwave oven, washing machine and dishwasher. After the load is decomposed, the start and stop status of the electrical appliances can be distinguished by a threshold value.
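As an aside on the dilated convolution mentioned above, the following sketch illustrates why dilation enlarges the receptive field without adding parameters. It is not the DRAnet implementation: the kernel size of 3 and the dilation rates (1, 2, 4, 8) are hypothetical values chosen for illustration, and stride 1 is assumed throughout.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated cross-correlation: kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Four layers with the same kernel size and the same number of weights.
plain = receptive_field(3, [1, 1, 1, 1])    # ordinary convolutions
dilated = receptive_field(3, [1, 2, 4, 8])  # hypothetical dilation schedule
print(plain, dilated)

# A single spike is seen by every tap that lands on it, two samples apart here.
response = dilated_conv1d([0, 0, 0, 1, 0, 0, 0, 0], [1, 1, 1], dilation=2)
print(response)
```

Under these assumptions the dilated stack covers 31 input samples against 9 for the plain stack, which is the sense in which dilation lets the network see longer stretches of the 100-sample input window.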
The threshold values of the five appliances are: air conditioner 100 W, refrigerator 50 W, washing machine 20 W, microwave oven 200 W, and dishwasher 100 W. Table 2 compares the evaluation indexes of load operation status after load decomposition by each algorithm. On the Precision and Accuracy indexes, the DRAnet network model achieves the best performance for every appliance. On the Recall index, it also performs well on the air conditioner, refrigerator, washing machine and dishwasher. For CNN s-s, CNN s-p, DAE, CD-LSTM and D-ResNet, combining the decomposition results of the microwave oven, washing machine and dishwasher in Figure 7 with the Recall and Precision metrics shows that, for these three loads, the proportion of samples in the on state is significantly smaller than for the other two loads. Therefore, judging from the indexes, none of these algorithms can accurately identify the switching state of these infrequently used load devices. The network model proposed in this work gives the best load identification and is the most capable of accurately predicting the operating state of such load switches. In general, the power of the refrigerator is relatively regular: it stays at a relatively high power level for a period of time after starting. Although there is a power peak during this period, it does not affect the judgment of the start and stop of the appliance.
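The thresholding step and the three state metrics can be sketched as follows. The threshold values are those listed above; everything else is an assumption made for illustration: an appliance is treated as on when its power is strictly greater than the threshold, the function names are hypothetical, and the short power traces are invented rather than drawn from WikiEnergy.

```python
# On/off thresholds in watts, as listed in the text.
ON_THRESHOLDS_W = {
    "air conditioner": 100.0,
    "refrigerator": 50.0,
    "washing machine": 20.0,
    "microwave oven": 200.0,
    "dishwasher": 100.0,
}


def on_off_states(power, threshold):
    """Assumed rule: the appliance is on when power exceeds the threshold."""
    return [p > threshold for p in power]


def state_metrics(true_states, pred_states):
    """Precision, recall and accuracy of predicted on/off states."""
    if len(true_states) != len(pred_states) or not true_states:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(true_states, pred_states))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Invented microwave oven traces (watts), mostly off as for a rarely used load.
true_power = [0.0, 0.0, 1200.0, 1150.0, 0.0, 0.0, 0.0, 0.0]
pred_power = [10.0, 0.0, 900.0, 150.0, 0.0, 250.0, 0.0, 0.0]

threshold = ON_THRESHOLDS_W["microwave oven"]
metrics = state_metrics(
    on_off_states(true_power, threshold),
    on_off_states(pred_power, threshold),
)
print(metrics)
```

Because a rarely used appliance is off in most samples, accuracy stays comparatively high here even though precision and recall are both low, which is why all three indexes are reported together.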

Analysis of experimental results for the UK-DALE dataset
To verify the generalization performance of the network model in this paper, comparative experiments were carried out on UK-DALE. From the electric power data, five typical loads (kettle, refrigerator, microwave oven, washing machine and dishwasher) are selected for experiment and analysis. As shown in Table 3, the compared algorithms can achieve effective power decomposition for frequently used load equipment such as kettles and refrigerators. For loads such as washing machines, microwave ovens and dishwashers, the KNN, FHMM, CNN s-s and CNN s-p algorithms are far less effective than the CD-LSTM algorithm, the D-ResNet algorithm and the DRAnet network model. This is mainly due to the advantages of the proposed network structure, which can better detect the peak state of power consumption and the operating state changes of the load switch. Figure 8 shows the local load decomposition of UK-DALE data samples. The decomposition curves of the other methods do not represent the peak regions of load power consumption well and show some burrs, although they perform well in the range where power is close to zero. Across the four local decomposition plots, the load decomposition result of the DRAnet network model is the closest to the real power consumption, indicating that the decomposition performance of DRAnet is superior to that of the other algorithms.
The decomposition performance of the algorithms is further evaluated using the indexes of load start and stop operation states: recall, accuracy and precision. Table 4 compares the load switch operation status indicators of the algorithms on the UK-DALE dataset. Judging from the indexes, the DRAnet network model achieves the best numerical performance on these types of loads and can accurately identify the start and stop status of the loads. On the Recall and Precision indexes, the other algorithms perform well on kettles, dishwashers and refrigerators and can also judge the start and stop status of these appliances, but they do not perform well on microwave ovens and washing machines. The accuracy of non-intrusive load decomposition based on DRAnet is clearly superior to that of the other methods.
Across the experimental results on the WikiEnergy and UK-DALE datasets, the dilated residual aggregated convolutional network benefits from its structural advantages: the proposed dilated residual module enhances the network's ability to extract features of low-frequency-use loads, and the feature enhancement module rescales the features, filters out useless information, and reinforces the useful load features. As a result, the network achieves better decomposition and generalization performance than the other methods.

Summary
This work first introduces the overall structure of the dilated residual aggregated convolutional network, which mainly comprises the differential processing of data, the dilated residual module, the feature enhancement module, and the information fusion module. The differential data enhances the expressiveness of the data and improves the robustness of the network model; the dilated residual module, feature enhancement module, and information fusion module are then proposed to address the sparse features of low-usage load devices. The dilated residual aggregated convolutional network is trained and tested on samples from the WikiEnergy and UK-DALE datasets. Experimental simulations and result analysis show that the proposed method is significantly better than the other methods. In terms of load sequence decomposition, the proposed method significantly improves on the MAE and SAE indicators compared with existing deep learning decomposition methods; in terms of appliance start and stop judgment, it is superior to the existing methods on all three indicators (recall, precision and accuracy). The fundamental reason is that the dilated residual aggregated convolutional network model has a stronger extraction capability for higher order load features, and therefore gives better decomposition results for low-usage appliances. However, there are still many problems in non-intrusive load decomposition that need further study.
1) To improve the decomposition accuracy of load equipment, it is often necessary to train a separate model for each load device. The process is complex and the time cost is high. Therefore, deep neural network models with adaptive learning ability, such as adversarial network models, can be studied further and transferred to non-intrusive load decomposition.
2) The decomposition of low-power electrical appliances is more susceptible to noise interference and is therefore harder to achieve. The decomposition of low-power loads needs further research.
3) The deep-learning-based non-intrusive load decomposition algorithm in this paper places certain demands on computing resources and cannot currently be integrated with smart meters. Future research can consider further improving the algorithm so that it can be used directly on embedded platforms and other hardware devices.