
ORIGINAL RESEARCH article

Front. Energy Res., 19 September 2022

Sec. Smart Grids

Volume 10 - 2022 | https://doi.org/10.3389/fenrg.2022.987982

An improved selective ensemble learning approach in enabling load classification considering base classifier redundancy and class imbalance

    Shiqian Wang 1

    Ding Han 1

    Yuanpeng Hua 1

    Yuanyuan Wang 1

    Lei Wang 2

    Yang Liu 2*

  • 1. Economic Research Institute, State Grid Henan Electric Power Company, Zhengzhou, China

  • 2. College of Electrical Engineering, Sichuan University, Chengdu, China


Abstract

In modern power systems, analyzing the behaviors of the end users can help to improve the system’s security, stability, and economy. Load classification provides an efficient way to implement awareness of the user’s behaviors. However, due to the development of data collection, transmission, and storage technologies, the volumes of the load data keep increasing. Meanwhile, the structure and knowledge hidden in the data become ever more complicated. Therefore, the parallelized ensemble learning method has been widely employed in recent load classification research. Although the positive performance of ensemble learning has been proven, two critical issues remain: class imbalance and base classifier redundancy. These issues raise challenges of improving the classification accuracy and saving computational resources. Therefore, to solve the issues, this article presents an improved selective ensemble learning approach to enable load classification considering base classifier redundancy and class imbalance. First, a Gaussian SMOTE based on density clustering (GSDC) is introduced to handle the class imbalance, which aims to achieve higher classification accuracy. Second, the classifier pruning strategy and the optimization strategy of the ensemble learning are further introduced to handle the base classifier redundancy. The experimental results indicate that when combined with the popular classifiers, the presented approach shows effectiveness for serving the load classification tasks.

1 Introduction

Along with the evolution of the power system, brand new techniques and features have been introduced (e.g., renewable energies, energy storage, and various user demands), which all impact the operations of the system. These points significantly increase the difficulties of the resource dispatch of the power system and this may lead to security, stability, and economy issues. It has been proven that on the user side, guiding the load of the users according to their power consumption behaviors to participate in power system dispatch could be an effective way of relieving these difficulties (Muthirayan et al., 2000; Aderibole et al., 2019; Wei et al., 2022). Therefore, to accurately and efficiently identify the user’s behaviors based on the load dataset has become a significant challenge (Zhu et al., 2020; Zhu et al., 2021). A number of researchers have suggested that load classification shows enormous potential to implement the user behavior awareness task (Zhang et al., 2015; Zhu et al., 2020; Liu et al., 2021).

Tambunan et al. (2020) present an improved k-means clustering algorithm that classifies the load dataset based on the concept of clustering. Although their algorithm improves the stability of the traditional k-means, flaws still exist (e.g., the difficulty of determining the number of initial centroids). Zhou and Yang (2012) present a self-adaptive fuzzy c-means algorithm to implement load clustering, and the authors claim that the local optimum issue can be partially solved. Shi et al. (2019) present a load classification approach based on deep learning and multi-dimensional fuzzy c-means clustering. Their experimental results show that this approach provides satisfactory performance in dimension reduction, feature extraction, algorithm stability, and algorithm efficiency. Zhang et al. (2020) present a load classification approach based on a Gaussian mixture model and multi-dimensional scaling analysis. The authors also report that the computational efficiency can be improved while the computational cost can be reduced. However, although these studies contribute to our understanding of load classification, their methodologies are mainly based on distance-based clustering algorithms that lack the ability to reveal the correlated features in high-dimensional load data. Additionally, the presented algorithms have a serial architecture, which limits their efficiency on the current large-volume load data. Therefore, to further improve the classification accuracy and processing efficiency for large-volume load data, supervised machine learning algorithms and distributed computing technologies are widely employed in load classification research (Liu et al., 2019; Li et al., 2020; Tang et al., 2020; Wang et al., 2021). Among the supervised learning algorithms, artificial neural networks show remarkable performance and almost dominate the recent classification studies. Liu et al. (2019) employ the back propagation neural network as the underlying algorithm to achieve better load classification accuracy. To highlight the time series characteristics of the load data, the long short-term memory neural network is adopted to implement the classification in several studies (Li et al., 2020; Tang et al., 2020; Wang et al., 2021). Zhang et al. (2022) employ a bi-directional temporal convolutional network and data augmentation to achieve highly accurate load classification. These approaches deliver high classification accuracy. However, the authors still report low efficiency when the algorithms deal with large-volume load data because of the algorithm overhead. As a result, Liu et al. (2016), Liu et al. (2017), and Liu et al. (2020) introduce distributed computing to improve the efficiency of large-scale load data classification. The authors report that, because of the difficulties in algorithm decoupling, ensemble learning is a necessary tool to implement algorithm parallelization. This idea has also been supported by a number of studies (Liu et al., 2019; Li et al., 2020; Liu et al., 2016; Liu et al., 2017; Liu et al., 2020). Ensemble learning is able to create a number of parallel base classifiers, which facilitates the parallelization of the classification algorithm. However, among the base classifiers, the redundancy issue is inevitable (Liu et al., 2021; Wang et al., 2022). This redundancy further causes the base classifier homogenization issue, which deteriorates the performance of ensemble learning and the final classification in terms of computational resource consumption and accuracy.

Class imbalance is another critical issue that impacts supervised classification algorithms. Due to an imbalanced class distribution, the majority class may overwhelm the minority class, which leaves the minority class insufficiently trained. Therefore, the final classification accuracy may be severely affected. Because users exhibit various power consumption behaviors, the class imbalance issue naturally exists in load data (Liu et al., 2019; Zhang et al., 2022). Consequently, a number of researchers have presented solutions, among which oversampling is considered to be the most effective. Liu et al. (2019) adopt the SMOTE algorithm to balance the classes of the load data and effectively synthesize samples belonging to the minority class. Li et al. (2020) improve the traditional SMOTE and present the Borderline-SMOTE (BS) algorithm, which successfully highlights the borderline of the classes. Liu et al. (2020) present an improved BS algorithm considering the ratio of the sample synthesis, which is also effective in balancing the class distribution. However, it should be noted that these studies are all based on stochastic oversampling. Their most crucial drawback is that stochastic sampling may not accurately simulate the real sample distribution of the original load data. As a result, side effects such as class overlapping may seriously impact the generalization of the classifier, which may finally deteriorate the classification accuracy.

Motivated by the previous studies, this article first presents a GSDC approach to solve the class imbalance issue. GSDC first constructs a directly density-reachable graph using density clustering. The algorithm then uses the shortest weighted graph path between a sample and the cluster centroid as the sampling path for synthesizing minority samples. Oversampling with a Gaussian stochastic perturbation is then employed to enhance the diversity of the synthesized samples. This article then presents a fuzzy increment of diversity (FID) based clustering pruning strategy (CPS) to solve the base classifier redundancy issue. In this strategy, the FID eigenvector of each base classifier is first constructed. The FID characteristic matrix of all the base classifiers is then constructed. The affinity propagation clustering algorithm is applied to the matrix to obtain the clusters and the corresponding centroids of the base classifiers. Based on two presented indices, the pruning strategy is implemented on the clusters, which finally yields an optimal number of base classifiers. To further maintain the diversity and accuracy of the base classifiers that remain after redundancy elimination, a surrogate empirical risk with regular term-based optimization selection integration (OSI), composed of the surrogate empirical risk function, the Huber function, and the K-fold cross validation method, is presented. Ultimately, combined with popular classifiers, the performance of the presented class balancing algorithm and the improved selective ensemble learning algorithm is evaluated and validated.

The rest of this article is organized as follows. Section 2 presents the class balancing algorithm. Section 3 presents the improved selective ensemble learning algorithm. Section 4 shows the experimental results and discussions. Finally, Section 5 concludes this study.

2 Class balancing using GSDC

The class imbalance issue naturally exists in the load dataset, which makes it harder for the classifier to identify the minority classes. Although stochastic oversampling algorithms can handle this issue to some extent, flaws such as class overlapping and inaccurate sample distribution may deteriorate the performance of the classifier. Therefore, this article presents the GSDC algorithm to address these flaws and improve the performance of the traditional SMOTE algorithm. It should be noted that there is currently no numerical definition of the concept of a minority class. Therefore, following Liu et al. (2019), a threshold of 20% is employed to identify whether a class is a minority class. If the number of samples in a class is less than 20% of the number in the class with the largest number of samples, then that class is identified as a minority class.
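The 20% rule above can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `minority_classes` and the toy label list are hypothetical and do not come from the article.

```python
from collections import Counter


def minority_classes(labels, threshold=0.2):
    """Return the classes whose size is below threshold * (size of the largest class)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls for cls, n in counts.items() if n < threshold * largest}


# Toy label vector (hypothetical): class sizes 100, 30 and 15.
labels = ["A"] * 100 + ["B"] * 30 + ["C"] * 15
print(minority_classes(labels))  # only "C" falls below 20% of 100
```

Class "B" holds 30 samples, which is not below 20 (20% of 100), so it is not treated as a minority class here.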

2.1 Basic definitions in GSDC

  • 1) $\varepsilon$-neighborhood: Let $C$ denote a cluster; $x_i$ denote a sample in $C$; $x_j$ denote another sample in $C$; and $\varepsilon$ denote the neighborhood radius of $x_i$. Therefore, the $\varepsilon$-neighborhood can be defined by Eq. 1:

    $$N_{\varepsilon}(x_i) = \{\, x_j \in C \mid \mathrm{dist}(x_i, x_j) \le \varepsilon \,\} \tag{1}$$

  • 2) Core: For a given sample $x_i$, if there are at least $MinPts$ samples located in its $\varepsilon$-neighborhood, then $x_i$ is regarded as a core.

  • 3) Directly density-reachable: For two given samples $x_i$ and $x_j$, if $x_i$ is a core and $x_j$ satisfies $x_j \in N_{\varepsilon}(x_i)$, then $x_j$ is regarded as directly density-reachable from $x_i$.

  • 4) Directly density-reachable graph: Let $V$ denote the set of all the directly density-reachable samples in $C$ and $E$ denote the set of edges, each of which is a weighted graph path between a directly density-reachable sample and its core. The Euclidean distance between samples is employed as the weight. Therefore, $G = (V, E)$ is the directly density-reachable graph for the cluster $C$ with parameters $\varepsilon$ and $MinPts$.
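The four definitions above can be turned into a small graph builder. The following is a sketch under stated assumptions, not the authors' implementation: the function name `build_ddr_graph` is hypothetical, a sample is counted as part of its own neighborhood (as in the usual DBSCAN convention), and edges are stored in both directions so that a path search can traverse them either way.

```python
import math


def build_ddr_graph(samples, eps, min_pts):
    """Return (cores, edges); edges[i] maps a neighbor index to the Euclidean weight."""
    n = len(samples)
    # Neighborhood of each sample within radius eps (the sample itself is included).
    neighborhoods = [
        [j for j in range(n) if math.dist(samples[i], samples[j]) <= eps]
        for i in range(n)
    ]
    # A core has at least min_pts samples in its neighborhood.
    cores = {i for i in range(n) if len(neighborhoods[i]) >= min_pts}
    edges = {i: {} for i in range(n)}
    for c in cores:
        for j in neighborhoods[c]:
            if j != c:
                w = math.dist(samples[c], samples[j])
                edges[c][j] = w  # j is directly density-reachable from core c
                edges[j][c] = w  # assumption: keep the edge undirected
    return cores, edges


# Toy cluster (hypothetical): three close points and one far outlier.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
cores, edges = build_ddr_graph(points, eps=1.5, min_pts=3)
```

With these toy values only the middle point has three samples within its radius, so it is the single core, and the outlier stays disconnected.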

2.2 Detailed steps of GSDC

  • Step 1, identify the minority samples and classes. A given load dataset D is composed of samples belonging to M classes. If the number of samples in a class is smaller than 20% of the number of samples in the class that contains the largest number of samples, then that class is regarded as a minority class. The samples in it are regarded as minority samples.

  • Step 2, clustering of the minority samples. The DBSCAN clustering algorithm (Ester et al., 1996) is applied to each set of minority samples. As a result, a number of clusters are obtained, together with the centroids of the clusters.

  • Step 3, directly density-reachable graph construction based on clusters. Based on the clustering results in Step 2, the directly density-reachable graph of each cluster is obtained according to Section 2.1. In this article, the values of the two graph parameters are 10 and 3, respectively, chosen through experiments based on the enumeration method.

  • Step 4, determine the number of synthesized samples for each cluster. The proportion of the sample distribution is computed for each cluster. The synthesized samples are then generated according to this proportion.

  • Step 5, search for the sampling path. In each synthesis operation, a real sample is randomly selected from a cluster. The Dijkstra algorithm (Xu et al., 2007) is then employed to search for the shortest weighted graph path between the selected sample and the centroid of that cluster. The path passes through a sequence of samples in which each consecutive pair is connected by a directly density-reachable edge. As a result, this sequence is regarded as the sampling path.

  • Step 6, sample synthesis. A directly density-reachable edge is randomly selected from the sampling path as the sampling interval. Within the sampling interval, an interpolation distance that follows the uniform distribution is employed, as shown in Eq. 2. The interpolation coordinates are then randomly generated, as shown in Eq. 3. Afterward, to improve the diversity of the synthesized samples, a random disturbance vector is added to the interpolation coordinates. The disturbance vector follows the normal distribution shown in Eq. 4, whose spread is set by the relative standard deviation. Finally, one synthetic sample is generated, as presented by Eq. 5.

Samples are synthesized until the number of samples in the minority class reaches 20% of the number in the class with the largest number of samples, at which point the algorithm terminates.
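Steps 5 and 6 can be sketched in plain Python as follows. This is an illustrative sketch only, with several assumptions that are not taken from the article: the names `shortest_path` and `synthesize` are hypothetical, the centroid is assumed to be a node of the graph, the interpolation factor is drawn from U(0, 1), and the Gaussian disturbance of each coordinate uses a standard deviation of `sigma` times the length of the selected edge as one possible reading of "relative standard deviation".

```python
import heapq
import math
import random


def shortest_path(edges, start, goal):
    """Dijkstra on a weighted adjacency dict; returns the node list, or None if unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in edges[node].items():
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def synthesize(samples, path, sigma, rng):
    """Pick a random edge on the path, interpolate uniformly, then add Gaussian noise."""
    if len(path) < 2:  # the selected sample is already the centroid
        return list(samples[path[0]])
    k = rng.randrange(len(path) - 1)
    a, b = samples[path[k]], samples[path[k + 1]]
    lam = rng.uniform(0.0, 1.0)           # interpolation distance factor
    scale = sigma * math.dist(a, b)       # assumed meaning of "relative" deviation
    return [ai + lam * (bi - ai) + rng.gauss(0.0, scale) for ai, bi in zip(a, b)]


# Toy chain graph (hypothetical): node 3 plays the role of the centroid.
samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
edges = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
path = shortest_path(edges, start=0, goal=3)
new_sample = synthesize(samples, path, sigma=0.05, rng=random.Random(0))
```

Because the synthetic point is placed on an edge of the density-reachable path rather than on a straight line to an arbitrary neighbor, it stays inside the region actually occupied by the minority cluster, which is the motivation given in the text for the sampling path.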

The entire process of GSDC in enabling class balance of the load dataset is shown in Figure 1.

FIGURE 1


The entire process of GSDC in enabling class balance of the load dataset.

3 Improved selective ensemble learning

Ensemble learning is based on the concept that a series of weak classifiers (base classifiers) can compose one strong classifier. The performance of ensemble learning depends on the diversity and the decision accuracy of the base classifiers (Kuncheva and Whitaker, 2003; Yang et al., 2014). Diversity refers to the tendency of the classifiers to misclassify different samples, while decision accuracy refers to the correct classification of the samples. As the number of base classifiers increases, homogenization of the classifiers becomes inevitable. This homogenization significantly deteriorates the diversity of the classifiers and finally causes the base classifier redundancy issue.

Therefore, to balance the diversity and accuracy of the classifiers, this article presents a fuzzy increment of diversity (FID) based clustering pruning strategy (CPS) and a surrogate empirical risk with regular term-based optimization selection integration (OSI) to implement the improved selective ensemble learning which finally serves the load classification and the identification of the load behaviors.

3.1 Clustering pruning strategy

The presented fuzzy increment of diversity (FID) based clustering pruning strategy (CPS) first constructs the FID eigenvectors and the FID characteristic matrix for the base classifiers. The affinity propagation (AP) clustering algorithm is then applied to the matrix (Gan and Ng, 2014). According to the Euclidean distance-based and cosine distance-based measurement indices, the optimal centroids of the base classifiers are obtained from the resulting clusters. This finally leads to the pruning of the redundant base classifiers.

3.1.1 Eigenvector of FID

Q-statistics are employed to construct the FID eigenvector (Kuncheva and Whitaker, 2003). Q-statistics measure the decision diversity between two base classifiers. The Q-statistic of the base classifiers u and v for classifying the mth class of the load data can be represented by Eq. 6:

$$Q_{u,v}^{m} = \frac{a_{u,v}\, d_{u,v} - b_{u,v}\, c_{u,v}}{a_{u,v}\, d_{u,v} + b_{u,v}\, c_{u,v}} \tag{6}$$

where $a_{u,v}$, $b_{u,v}$, $c_{u,v}$, and $d_{u,v}$ are subject to the joint distribution shown in Table 1.

TABLE 1

                  h_v(x_k) = y_k    h_v(x_k) ≠ y_k
h_u(x_k) = y_k    a_{u,v}           b_{u,v}
h_u(x_k) ≠ y_k    c_{u,v}           d_{u,v}

Joint distribution for the two base classifiers.

In Table 1, $h_u(x_k)$ and $h_v(x_k)$ represent the classification results for the training sample $x_k$ using the base classifiers u and v, respectively; $y_k$ represents the class label of the training sample $x_k$; $a_{u,v}$ and $d_{u,v}$ represent the probabilities of <correct, correct> and <incorrect, incorrect> when classifying the training dataset using the base classifiers u and v, respectively; and $b_{u,v}$ and $c_{u,v}$ represent the probabilities of <correct, incorrect> and <incorrect, correct> when classifying the training dataset using the base classifiers u and v, respectively. Therefore, according to Eqs. 5 and 6, the sum of the pair-wise diversity index over the base classifiers can be represented by Eq. 7.

To delineate the impact of an individual base classifier on the sum of pair-wise diversity among all the base classifiers, the FID of the base classifier u in the mth class of the training dataset is defined using Eq. 8, where the two sets involved are the sets of all the base classifiers including and excluding the base classifier u, respectively. Therefore, the FID characteristic matrix for the base classifiers can be represented by Eq. 9.
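As a concrete illustration of the Q-statistic built from the joint distribution in Table 1, the sketch below counts the four agreement cases for two classifiers and applies the standard form (ad − bc)/(ad + bc) from Kuncheva and Whitaker (2003). The function name `q_statistic` is hypothetical, restricting the inputs to the samples of one class m is left to the caller, and returning 0.0 when the denominator vanishes is an assumption made here, not a convention stated in the article.

```python
def q_statistic(pred_u, pred_v, labels):
    """Q-statistic of two classifiers from their predictions and the true labels."""
    n = len(labels)
    a = b = c = d = 0
    for pu, pv, y in zip(pred_u, pred_v, labels):
        u_ok, v_ok = pu == y, pv == y
        if u_ok and v_ok:
            a += 1  # <correct, correct>
        elif u_ok:
            b += 1  # <correct, incorrect>
        elif v_ok:
            c += 1  # <incorrect, correct>
        else:
            d += 1  # <incorrect, incorrect>
    # Convert the counts to the probabilities used in Table 1.
    a, b, c, d = (x / n for x in (a, b, c, d))
    denom = a * d + b * c
    return 0.0 if denom == 0 else (a * d - b * c) / denom  # 0.0 fallback is an assumption


labels = [1, 1, 0, 0]
# The two classifiers err independently of each other: Q is 0.
q_independent = q_statistic([1, 0, 0, 1], [0, 1, 0, 1], labels)
# Whenever v is correct u is correct too, so the errors are strongly tied: Q is 1.
q_tied = q_statistic([1, 1, 0, 1], [1, 0, 0, 1], labels)
```

Values near 1 indicate that the two classifiers tend to be right and wrong on the same samples, which is the kind of redundancy the pruning strategy is meant to remove.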

3.1.2 Optimal number of centroids for base classifiers

The Euclidean distance and the cosine distance are frequently employed to measure the similarity between two data sequences. Based on the FID eigenvectors of all the base classifiers, the AP clustering algorithm is applied to the rows of the FID characteristic matrix to generate a number of clusters. In each cluster, the mean Euclidean distance and the mean cosine distance between the centroid and the FID eigenvectors in that cluster are then computed. This article presents the Euclidean redundancy index and the cosine redundancy index to facilitate the identification of the optimal centroid number. The two indices are represented by Eqs. 10 and 11, both of which depend on the number of centroids of the base classifiers. A larger Euclidean redundancy index or a smaller cosine redundancy index represents greater diversity of the base classifiers in the cluster, and thus lower redundancy. During the clustering process, the optimal centroid number (the optimal number of base classifiers) is achieved when the indices reach their maximum and minimum values, respectively.

3.1.3 Steps of the presented CPS

  • Step 1: Generate the base classifiers. From the load dataset, based on the samples and their corresponding labels, a number of base classifiers are generated using sampling and training. Any existing sampling algorithm and supervised machine learning algorithm can be adopted to implement this step.

  • Step 2: Based on the generated base classifiers and the load dataset, the Q-statistics of all the base classifier pairs are computed according to Eq. 6. The FID eigenvectors of all the base classifiers are then obtained according to Eqs. 7 and 8. Finally, the characteristic matrix is formed using Eq. 9.

  • Step 3: Cluster the base classifiers. The AP clustering algorithm is applied once to the row vectors of the characteristic matrix, which yields the number of centroids.

  • Step 4: Cluster pruning of the base classifiers. Step 3 is repeated and the two redundancy indices are computed using Eqs. 10 and 11 until the inflection points of the two indices appear, which gives the optimal number of centroids. The base classifiers corresponding to the centroids are selected as the final classifiers. The base classifiers corresponding to the other points of the clusters are eliminated as redundant.

3.2 Surrogate empirical risk with regular term-based optimization selection integration

To improve the generalization of the presented improved selective ensemble learning, this article further presents the OSI strategy. This strategy introduces the concept of ensemble margin to construct the minimum surrogate empirical risk with a regular term function, which optimizes the weights assigned to the base classifiers in ensemble learning.

3.2.1 Maximum ensemble margin strategy considering model complexity

Ensemble margin (Yang et al., 2014) is adopted to measure the correct classification tendency of the samples. Consider a set of labeled verifying samples, in which the nth entry consists of a sample and its corresponding label, together with the set of pruned base classifiers and the results obtained by classifying the verifying samples with these base classifiers. The ensemble margin of the pruned base classifiers with respect to the nth sample can be represented by Eq. 12, in which each base classifier u is assigned a weight in the ensemble learning and the classification result takes one value if the classification is correct and another value otherwise. Based on the ensemble margin, the empirical risk function can be represented by Eq. 13.

The presented OSI is able to improve the generalization of the classification model using the loss function. Furthermore, to control the complexity of the ensemble learning and reduce the overfitting caused by the optimization, this article also presents Eq. 14, an optimization problem that includes a regular term on the weights of the base classifiers. The regular term controls the complexity of the ensemble learning model, and an equivalence factor is also included.

3.2.2 Huber function based surrogate empirical risk function

The loss function in Eq. 14 is nonconvex and discontinuous, which makes optimization difficult. However, a surrogate empirical risk function has been reported as a proper way of solving this issue. In this article, the truncated Huber function (Borah and Gupta, 2020) shown in Eq. 15 is employed as the surrogate empirical risk function. A factor is also adopted to tune the sensitivity of the surrogate empirical risk function to outliers and noise. In the following experiments, the value of this factor is set to 0.6. Finally, based on Eq. 15, Eq. 14 can be reformulated into Eq. 16, which is ultimately employed to optimize the participating weights of the base classifiers.

3.2.3 K-fold cross validation method-based base classifier selection

The K-fold cross validation method is adopted to obtain K verifying datasets from the original labeled training dataset. The presented OSI strategy is repeated on each verifying dataset to generate K sets of optimized weights for the pruned base classifiers, as shown by Eq. 17, in which each entry denotes the optimized weight of the uth base classifier in the sth OSI execution. For each base classifier u, the K optimized weights are collected, and the proportion of executions in which the weight is greater than 0 is calculated according to Eq. 18, where the value of the function is 1 when the weight is greater than 0 and −1 otherwise. When this proportion is greater than 0.5, the corresponding base classifier is retained and participates in the final majority voting-based ensemble learning for load classification.
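The selection rule can be sketched as follows. This is a simplified illustration, assuming that the quantity compared with 0.5 is the plain share of the K executions in which a classifier receives a positive weight; the function name `select_classifiers` and the weight values are hypothetical, and the exact form of Eq. 18 is not reproduced here.

```python
def select_classifiers(fold_weights, threshold=0.5):
    """fold_weights[s][u] is the optimized weight of classifier u in execution s.

    Returns the indices of the classifiers whose weight is positive in more
    than `threshold` of the K executions.
    """
    k = len(fold_weights)
    n_classifiers = len(fold_weights[0])
    kept = []
    for u in range(n_classifiers):
        share = sum(1 for s in range(k) if fold_weights[s][u] > 0) / k
        if share > threshold:
            kept.append(u)
    return kept


# Hypothetical weights for K = 4 executions and three base classifiers.
weights = [
    [0.5, 0.0, 0.2],
    [0.4, 0.0, 0.0],
    [0.6, 0.1, 0.0],
    [0.3, 0.0, 0.3],
]
kept = select_classifiers(weights)
```

Here only the first classifier is kept: it is positive in all four executions, while the other two are positive in one and two executions, and a share of exactly 0.5 does not exceed the threshold.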

3.3 Steps of the presented improved selective ensemble learning approach in enabling load classification

  • Step 1: A dataset consisting of labeled samples is initially divided into M sub-classes according to the labels. In each sub-class, the samples are randomly divided into a training dataset and a testing dataset with the ratio 4:6. In the training datasets, the minority classes are processed by GSDC to balance the data distribution. Finally, all of the sub-training datasets and all of the sub-testing datasets are merged to obtain the overall training dataset and the overall testing dataset.

  • Step 2: In the training dataset, bootstrap sampling is carried out to generate a number of sub-datasets. The samples in the sub-datasets and their labels are input into a number of initialized classifiers. The Adam algorithm is further employed to optimize the loss function for each classifier. The early stopping strategy is adopted to determine the number of iterations of the classifier learning. Finally, a number of trained base classifiers are obtained, and the set of base classifiers is formed.

  • Step 3: Each base classifier in the set classifies the dataset, which yields the classification results. Based on these results, the FID characteristic matrix is constructed according to Eqs. 6–9.

  • Step 4: The presented CPS is then applied to the FID characteristic matrix. The AP clustering algorithm clusters the FID eigenvectors of all the base classifiers. According to Eqs. 10 and 11, the optimal number of centroids is obtained based on the pruning of CPS. The corresponding retained base classifiers form the pruned set.

  • Step 5: In the presented OSI phase, the K-fold cross validation method is adopted. The labeled training dataset is randomly divided into K equal parts according to the proportion of the classes.

  • Step 6: Each base classifier in the pruned set classifies one of the K parts, which yields the classification results. According to Eqs. 12–16, the weights of the base classifiers in the pruned set can then be computed.

  • Step 7: Repeat Step 6 K times. According to Eq. 17, the K sets of weights of the base classifiers are then obtained.

  • Step 8: For each base classifier in the pruned set, compute the proportion according to Eq. 18. If the value of the proportion is greater than 0.5, then the corresponding base classifier is retained and participates in the final majority voting-based ensemble learning for classifying the testing dataset.
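Two generic building blocks of the steps above, bootstrap sampling (Step 2) and majority voting (Step 8), can be sketched in plain Python. The helper names `bootstrap_indices` and `majority_vote` are hypothetical; training the classifiers, the CPS pruning and the OSI weighting are deliberately left out of this sketch.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """predictions[u][k] is the label given by retained classifier u to sample k.

    Ties are resolved in favor of the label seen first among the tied ones,
    which is an arbitrary choice made for this sketch.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Indices of one bootstrap sub-dataset drawn from a training set of 10 samples.
sub_dataset = bootstrap_indices(10, random.Random(0))

# Hypothetical predictions of three retained classifiers on three test samples.
predictions = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
final_labels = majority_vote(predictions)
```

In this toy case the voted labels are 0, 0 and 1, since each of them is chosen by at least two of the three classifiers.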

4 Experimental results

4.1 The datasets employed to evaluate the presented approach

This article mainly employs three load datasets: the synthetic binary dataset, the Electrical Grid Stability Simulated Dataset (EGSSD) (Arzamasov, 2018), and the Electricity Load Diagrams 2011–2014 Dataset (ELDD) (Trindade, 2015). The samples in the synthetic binary dataset are labeled. The samples in EGSSD are also already labeled (system stability and system instability). In contrast, the samples in ELDD are not labeled. Therefore, the labels of the samples in ELDD are obtained using the approach presented by Liu et al. (2019). The details of the three datasets are listed in Table 2.

TABLE 2

Dataset            No. of classes    No. of samples    Dimension
Synthetic binary   2                 500               2
EGSSD              2                 10,000            13
ELDD               Implicit          370               140,256

Detailed information of the synthetic binary, EGSSD, and ELDD datasets.

The sampling interval for each sample in ELDD is 15 min, so there are 96 sampling points per day. Given the sample dimension of 140,256, each sample contains the load information for 1,461 days. To analyze the load data on a daily basis, each sample in ELDD is converted into daily loads. As a result, the converted ELDD dataset consists of daily load samples, each of which has 96 dimensions.
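The arithmetic behind this conversion is easy to check. The sketch below splits one long series into daily vectors; the helper name `to_daily` is hypothetical, and the final count assumes that every day of all 370 series is kept, which the article does not state explicitly.

```python
POINTS_PER_DAY = 24 * 60 // 15  # 15-min sampling gives 96 points per day


def to_daily(series):
    """Split one long load series into consecutive daily vectors of 96 points."""
    if len(series) % POINTS_PER_DAY != 0:
        raise ValueError("series length is not a whole number of days")
    return [
        series[i:i + POINTS_PER_DAY]
        for i in range(0, len(series), POINTS_PER_DAY)
    ]


days_per_series = 140_256 // POINTS_PER_DAY  # 1,461 days
daily = to_daily([0.0] * 140_256)            # placeholder series of zeros
# Assumption: no day is discarded, so 370 series give 370 * 1,461 daily samples.
total_daily_samples = 370 * days_per_series
```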

4.2 Indices employed to evaluate the classification performance

Besides the accuracy Acc, which represents the overall classification accuracy of the samples, the recall Pre, the precision Ppr, Gmeans, and Fvalue are also employed to evaluate the performance of the binary classification (López et al., 2013). Pre represents the proportion of the minority samples that are correctly classified. Ppr represents the proportion of true minority samples among the samples that are classified as minority samples. Gmeans represents the geometric mean of the proportion of correctly classified samples in the majority classes and the proportion of correctly classified samples in the minority classes. Gmeans reflects how evenly the classifier treats the different classes. If the value of Gmeans is close to the value of Acc, then the performance of the presented class balancing approach can be regarded as better. Fvalue represents the harmonic mean of Pre and Ppr. A greater value of Fvalue indicates that the improvements in classifying the minority classes have less impact on classifying the majority classes.
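The five indices can be computed directly from the four counts of a binary confusion matrix, with the minority class taken as the positive class. The sketch below follows the verbal definitions above; the function names and the toy counts are hypothetical, and no guard against empty classes is included.

```python
import math


def harmonic_mean(x, y):
    return 2 * x * y / (x + y)


def binary_metrics(tp, fn, fp, tn):
    """Indices for a binary task in which the minority class is the positive class."""
    pre = tp / (tp + fn)  # recall on the minority class
    ppr = tp / (tp + fp)  # precision on the minority class
    tnr = tn / (tn + fp)  # share of majority samples classified correctly
    return {
        "Pre": pre,
        "Ppr": ppr,
        "Fvalue": harmonic_mean(pre, ppr),
        "Gmeans": math.sqrt(pre * tnr),
        "Acc": (tp + tn) / (tp + fn + fp + tn),
    }


# Toy counts (hypothetical): 100 minority and 900 majority test samples.
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=895)
```

As a consistency check on the definition of Fvalue, `harmonic_mean(0.9015, 0.9829)` evaluates to about 0.9404, which matches the Fvalue reported for GSDC in Table 3 given its Pre and Ppr.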

Although the confusion matrix is frequently employed in multi-class classification evaluations, it is difficult to use it to quantitatively assess the performance of the classification model. Therefore, based on the confusion matrix, this article presents an index named the class confusion equilibrium entropy. The equations composing the index are presented as follows. First, the confusion matrix of binary classification can be denoted by Eq. 19, where TP and TN represent the numbers of samples correctly classified as the positive and negative classes, respectively, and FP and FN represent the number of samples misclassified as the positive class and the number of samples misclassified as the negative class, respectively. In multi-class classification, the confusion matrix can be regarded as a combination of multiple binary confusion matrices, in each of which the target class is treated as the positive class and the other classes are treated as the negative classes. The harmonic average accuracy of the binary classification when the mth class is treated as the positive class is then defined using Eq. 20. This quantity measures the class confusion level of the binary classification scenario, and a smaller value indicates a more severe confusion level. Based on Eq. 20, the class confusion equilibrium entropy Sb is presented by Eq. 21. A greater value of Sb represents a more balanced class confusion for the classifier, which also indicates better class balancing performance of the presented GSDC algorithm.
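The decomposition of a multi-class confusion matrix into one binary matrix per target class can be sketched as follows. Only the one-vs-rest counting is shown; the harmonic average accuracy of Eq. 20 and the entropy of Eq. 21 are not reproduced. The function name `one_vs_rest` and the row/column convention (rows are true classes, columns are predicted classes) are assumptions of this sketch.

```python
def one_vs_rest(confusion, m):
    """Return (TP, FN, FP, TN) when class m is the positive class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[m][m]
    fn = sum(confusion[m]) - tp                 # class m predicted as another class
    fp = sum(row[m] for row in confusion) - tp  # other classes predicted as class m
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


# Hypothetical three-class confusion matrix.
confusion = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 30],
]
binary_views = [one_vs_rest(confusion, m) for m in range(3)]
```

Each tuple in `binary_views` is the binary confusion matrix of one target class, which is the input that a per-class accuracy measure would then work on.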

4.3 Evaluation of GSDC

To evaluate the performance of the presented GSDC algorithm, this section employs the synthetic binary dataset, EGSSD dataset, and ELDD dataset. As aforementioned, the EGSSD dataset contains two classes and the ELDD dataset contains multiple classes.

4.3.1 Experiments using the synthetic binary dataset

The classification experiment is carried out using the synthetic binary dataset. The ratio of the minority class (in blue) and the majority class (in red) is 1:10. Support vector machine (SVM) is employed as the classifier.

Figure 2B shows that, based on the class balance using the presented GSDC algorithm, the sample distribution can be positively enhanced. The samples of the minority class can be significantly highlighted. Compared to the classification result without being processed by GSDC, as shown in Figure 2A, the hyperplane of SVM in Figure 2B is improved. Additionally, the minority samples in the area overlapping with the majority samples are not obviously affected by GSDC. Therefore, the presented class balancing strategy only has a limited influence on the classification of the majority samples, which demonstrates that GSDC can effectively synthesize the minority samples according to the sample distribution characteristic.

FIGURE 2


The classification (A) without processing by GSDC and (B) with processing by GSDC.

4.3.2 Experiments using the EGSSD dataset

First, the testing dataset is generated. In total, 2,000 samples are randomly selected from the transient stability class and the transient instability class to form the testing dataset. Second, the training dataset is generated. A total of 4,000 transient stability samples and 400 transient instability samples are randomly selected to form the training dataset. The back propagation neural network (BPNN) classifier is employed in this section. In addition, the conventional SMOTE and BS class balancing algorithms are also implemented for comparison. The classification results are listed in Table 3.

TABLE 3

Algorithm          P_re    P_pr    F-value  G-means  Acc
Without balancing  0.6114  0.9862  0.7548   0.7778   0.8020
SMOTE              0.8431  0.9695  0.9019   0.9050   0.9084
BS                 0.8826  0.9538  0.9168   0.9186   0.9197
GSDC               0.9015  0.9829  0.9404   0.9418   0.9425

Classification results based on the EGSSD dataset with different class balancing algorithms.
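The conventional SMOTE baseline compared in Table 3 can be sketched as follows. This is a minimal illustration of plain SMOTE interpolation, not of the presented GSDC algorithm; the function name `smote`, the neighbour count `k`, and the seed handling are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by plain SMOTE interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, sorted by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Because every synthetic point is a convex combination of two existing minority samples, plain SMOTE never leaves the region spanned by the minority class, but it also ignores the local density of the samples.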

According to the results shown in Table 3, if the classification is carried out without class balancing, the minority class is insufficiently trained, so samples belonging to the minority class are more likely to be misclassified. This results in a higher P_pr but a lower P_re. In addition, the overall classification accuracy Acc is low. With the class balancing algorithms, including SMOTE, BS, and GSDC, Acc is significantly improved. In particular, the classification accuracy and the other indices based on GSDC outperform those of the other class balancing algorithms. The difference between G-means and Acc is only 0.0007, which means that GSDC supplies satisfactory class balancing performance. The highest value of P_pr among the balancing algorithms indicates that GSDC has the smallest impact on the classification of the majority class. The evaluation suggests that GSDC has better overall performance.
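The indices reported in Table 3 can be computed from a binary confusion matrix. The sketch below assumes the standard definitions with the minority class treated as the positive class (recall, precision, F-value as their harmonic mean, G-means as the geometric mean of recall and specificity); the article's own definitions appear in an earlier section, so the function name and argument names here are illustrative only.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return recall, precision, F-value, G-means and accuracy.

    tp, fn: minority (positive) samples classified correctly / wrongly.
    fp, tn: majority (negative) samples classified wrongly / correctly.
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_value = 2 * precision * recall / (precision + recall)
    g_means = math.sqrt(recall * specificity)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return {
        "P_re": recall,
        "P_pr": precision,
        "F_value": f_value,
        "G_means": g_means,
        "Acc": acc,
    }
```

Under these assumed definitions, the first row of Table 3 is self-consistent: 2 × 0.6114 × 0.9862 / (0.6114 + 0.9862) ≈ 0.7548, which matches the reported F-value.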

To evaluate the impact of the class imbalance ratio on the performance of GSDC, a series of training datasets is generated. First, 4,000 samples belonging to the transient instability class are randomly selected. Then, based on the ratios of 20:1, 40:1, 80:1, and 160:1, the corresponding numbers of samples belonging to the transient stability class are randomly selected. In this way, four imbalanced training datasets are obtained. The classification results are listed in Table 4.

TABLE 4

Imbalance ratio  Without balancing  SMOTE   BS      GSDC
20:1             0.7284             0.8875  0.9016  0.9038
40:1             0.6867             0.8041  0.8128  0.8297
80:1             0.5940             0.6898  0.7169  0.7633
160:1            0.5458             0.6582  0.6328  0.6604

Classification accuracy based on different imbalance ratios.
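The imbalanced training sets behind Table 4 could be built along the following lines. This is a sketch under the assumption that the 4,000-sample class is the larger side of each ratio, so the other class contributes 4,000 divided by the ratio; the function name and the list-based data representation are hypothetical.

```python
import random


def build_imbalanced_set(fixed_class, other_class, ratio, seed=0):
    """Keep every sample of fixed_class and draw len(fixed_class) // ratio
    samples of other_class without replacement.

    Assumption: fixed_class is the larger side of the ratio.
    """
    rng = random.Random(seed)
    n_other = max(1, len(fixed_class) // ratio)
    return list(fixed_class), rng.sample(other_class, n_other)
```

With 4,000 fixed samples, the ratios 20:1, 40:1, 80:1, and 160:1 give 200, 100, 50, and 25 samples of the other class, respectively.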

It can be observed that as the imbalance ratio increases, the classification accuracies of all class balancing algorithms gradually deteriorate. This means that on an extremely imbalanced dataset, a balancing algorithm can supply only limited improvement in classification accuracy. However, GSDC still outperforms the other algorithms.

4.3.3 Experiments using the ELDD dataset

The samples in ELDD are not labeled, which makes their classification difficult. Therefore, following Liu et al. (2019), a labeling operation is applied to the dataset to obtain a labeled dataset. To facilitate the experiments, a labeled subset that contains five classes and 16,620 samples is generated from the original ELDD. This subset is then divided into the training dataset and the testing dataset in the ratio of 4:6. In the training dataset, the numbers of samples belonging to the five classes are 3,770, 1,502, 284, 320, and 818, respectively; the samples belonging to the third and fourth classes are regarded as the minority samples. Afterward, GSDC is used to balance the imbalanced classes in the training dataset, which generates the balanced training dataset. BPNN is again employed as the classifier. The classification results based on different class balancing algorithms and different levels of noise are shown in Figures 3 and 4. White noise is employed in these experiments, and the noise level refers to the amplitude of the noise. Each noise sample is added to a training sample, so the borders of the training samples are blurred, which makes the setting suitable for evaluating the class balancing ability of GSDC.
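The noise injection described above can be illustrated with a short sketch. It assumes zero-mean, unit-variance Gaussian white noise scaled by the noise level; the article only states that white noise is used and that the level is its amplitude, so the distribution and the function name `add_white_noise` are assumptions.

```python
import random


def add_white_noise(samples, level, seed=0):
    """Return a noisy copy of samples (a list of feature lists).

    Assumption: each feature receives independent Gaussian noise with
    zero mean and unit variance, multiplied by the noise level.
    """
    rng = random.Random(seed)
    return [[x + level * rng.gauss(0.0, 1.0) for x in row] for row in samples]
```

A level of 0 returns the samples unchanged, while larger levels push samples of neighbouring classes into each other's regions.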

FIGURE 3

The classification accuracy based on different class balancing algorithms and different levels of noise.

FIGURE 4

The values of Sb based on different class balancing algorithms and different levels of noise.

From Figures 3 and 4, it can be observed that when the noise level is low, the class balancing algorithms yield quite similar classification accuracy. However, as the noise level increases, especially when it reaches 0.9, the accuracy Acc and the values of Sb of the classification based on SMOTE and BS decrease sharply. In contrast, the Acc and Sb values of the classification based on the presented GSDC remain at higher levels. This suggests that GSDC is robust and has good noise immunity.

4.4 Evaluation of the improved selective ensemble learning approach

4.4.1 The parameters employed in the evaluation

The base classifiers employed in the evaluation include BPNN, classification and regression tree (CART), and long short-term memory neural network (LSTM). The performance of the presented improved selective ensemble learning approach is evaluated through the classification performance of these base classifiers. First, according to step 2 in Section 3.3, a total of 100 labeled training sub-datasets are generated using bootstrapping (the number of bootstrapped samples equals the number of samples in the original dataset). Then, 100 BPNN base classifiers are trained using the training sub-datasets, from which the FID characteristic matrix of these base classifiers is obtained. The elements of the matrix are shown in Figure 5.
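The bootstrapping step can be sketched as follows: each sub-dataset is drawn with replacement and has the same size as the original dataset. The function name and the list-based dataset representation are assumptions made for illustration.

```python
import random


def bootstrap_subsets(dataset, n_subsets=100, seed=0):
    """Draw n_subsets bootstrap samples, each with len(dataset) items
    sampled with replacement from dataset."""
    rng = random.Random(seed)
    n = len(dataset)
    return [rng.choices(dataset, k=n) for _ in range(n_subsets)]
```

Because sampling is done with replacement, each sub-dataset typically repeats some samples and omits others, which is what makes the 100 base classifiers differ from one another.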

FIGURE 5

The FID characteristic matrix of 100 BPNN base classifiers.

According to step 4 in Section 3.3, the CPS strategy is then applied to the base classifiers, yielding the redundancy-removed set of base classifiers.

Figures 6A,B indicate that from a value of 37 onward, the indices IERI and ICRI become roughly stable and monotonic. In this case, the centroids of the clusters are kept, together with the corresponding base classifiers, to form the redundancy-removed set of base classifiers. Afterward, OSI is applied. To determine a proper value of the equivalence factor in Eq. 14, this article increases its value exponentially from 0.001 to 100. When the equivalence factor reaches 1, the weights of the base classifiers become stable. Therefore, in the OSI phase, the equivalence factor is set to 1.
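The exponential sweep of the equivalence factor can be illustrated with a small sketch. The step factor of 10 and the stability test (largest weight change between two consecutive grid values below a tolerance) are assumptions; `weights_for` is a hypothetical callback standing in for solving the OSI optimization at a given factor.

```python
def exponential_grid(low_exp=-3, high_exp=2, base=10.0):
    """Return base**low_exp, ..., base**high_exp (0.001 to 100 by default)."""
    return [base ** e for e in range(low_exp, high_exp + 1)]


def first_stable_value(values, weights_for, tol=1e-3):
    """Return the first grid value whose weight vector changes by less
    than tol when moving to the next grid value, or None if none does."""
    weights = [weights_for(v) for v in values]
    for i in range(len(values) - 1):
        change = max(abs(a - b) for a, b in zip(weights[i], weights[i + 1]))
        if change < tol:
            return values[i]
    return None
```

The check only compares neighbouring grid points, so in practice one would also inspect the remaining values to confirm that the weights stay stable.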

FIGURE 6

The value variations of (A) IERI and (B) ICRI.

4.4.2 Performance evaluation of load classification using ELDD

According to steps 5 to 7 in Section 3.3, a five-fold cross validation is employed in this section. Step 5 is repeated five times, and each repetition computes the weights of the base classifiers, from which the weight matrix is formed. According to step 8 in Section 3.3, the OSI phase finally retains nine base classifiers, which are ultimately employed for classification based on majority voting.
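Majority voting over the retained base classifiers can be sketched as follows. The input layout (one list of predicted labels per classifier) and the function name are assumptions; ties are broken in favour of the label that appears first among the classifiers, which is a choice of this sketch rather than something the article specifies.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of lists, one inner list per base classifier,
    all of the same length (one label per sample).
    """
    voted = []
    for labels in zip(*predictions):
        # most_common keeps first-seen order among equally frequent labels.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted
```

For example, three classifiers predicting `[0, 1, 2]`, `[0, 1, 1]`, and `[1, 1, 2]` for three samples are combined into `[0, 1, 2]`.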

BPNN, CART, and LSTM algorithms are adopted as the base classifiers, to which the improved selective ensemble learning is applied. For comparison, well-known ensemble learning strategies, including bagging and adaboosting, are also implemented. The classification results of the presented approach and the other ensemble learning strategies, including Acc and Sb, are listed in Tables 5 and 6.

TABLE 5

Algorithm           BPNN    CART    LSTM
Single classifier   0.8012  0.8376  0.7891
Bagging             0.9066  0.9198  0.8843
Adaboosting         0.9589  0.9662  0.9315
Presented approach  0.9653  0.9775  0.9420

Comparisons of the accuracy of different ensemble learning strategies.

TABLE 6

Algorithm           BPNN    CART    LSTM
Single classifier   1.5421  1.5518  1.5398
Bagging             1.5723  1.5756  1.5683
Adaboosting         1.5937  1.5964  1.5880
Presented approach  1.6092  1.6108  1.6059

Comparisons of Sb of different ensemble learning strategies.

From Tables 5 and 6, it can be observed that in terms of Acc and Sb, the presented approach outperforms the well-known ensemble learning algorithms, including bagging and adaboosting. In addition, the classification results suggest that the presented approach improves the performance of different base classifiers, most notably relative to a single classifier.

4.4.3 Stability evaluations of the improved selective ensemble learning

To demonstrate the stability of load classification based on the presented improved selective ensemble learning, this section employs BPNN as the base classifier. GSDC is employed to balance the testing dataset of ELDD. Bagging is also implemented for comparison. The testing dataset is classified 300 times by both the improved selective ensemble learning based BPNN (9 base classifiers) and the bagging ensemble learning based BPNN (100 base classifiers). The results are shown in Figure 7.

FIGURE 7

A comparison of the stability of two ensemble learning approaches.

Figure 7 first shows that across the 300 experiments, although more base classifiers are involved in bagging, the improved selective ensemble learning based BPNN outperforms the bagging ensemble learning based BPNN in terms of classification accuracy. Second, the improved selective ensemble learning also performs stably: the accuracies of the 300 experiments are quite close. The results shown in Figure 7 indicate that the presented selective ensemble learning can improve both the classification accuracy and the classification stability.
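One way to quantify the stability compared in Figure 7 is to summarize the accuracies of the repeated runs. The article does not state which statistic it uses, so the mean, population standard deviation, and range below are assumptions, as is the function name.

```python
import statistics


def stability_summary(accuracies):
    """Summarize repeated-run accuracies: a smaller std and range
    indicate a more stable ensemble."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.pstdev(accuracies),
        "range": max(accuracies) - min(accuracies),
    }
```

Applied to the 300 accuracies of each approach, a higher mean together with a smaller standard deviation would correspond to the behaviour described for the improved selective ensemble.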

5 Conclusion

Class imbalance and low efficiency prevent load classification from being effectively carried out. Therefore, this article presents an improved selective ensemble learning approach to enable load classification considering base classifier redundancy and class imbalance. First, a Gaussian SMOTE based on density clustering is proposed. The minority samples can be effectively synthesized, mainly using sampling techniques, DBSCAN clustering algorithm, and Dijkstra algorithm. Therefore, the original dataset can be significantly balanced. Second, a fuzzy increment of diversity based clustering pruning strategy is further proposed. Based on FID characteristic matrix and AP clustering algorithm, the redundancy of the base classifiers can be discovered and removed. To improve the generalization of the classification model, the ensemble margin based empirical risk function, the Huber loss function, and the K-fold cross validation method-based optimization selection integration are proposed. According to the experimental results, the presented GSDC is able to effectively balance the classes, which finally leads to an improvement of the classification accuracy. The presented CPS and OSI strategies can also remove the redundancy of the base classifiers, which significantly improves the efficiency of the ensemble learning. All of the positive results indicate that the presented improved selective ensemble learning approach considering base classifier redundancy and class imbalance can be an effective tool to serve practical large-scale load classification tasks.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, and further inquiries can be directed to the corresponding author.

Author contributions

The authors each contributed original work to this article. SW and YL presented the basic idea of the article and the algorithms it employs. DH, YH, and LW implemented the algorithms and carried out the experiments that evaluate and validate their performance. YW organized and structured the article and completed the writing.

Funding

The authors would like to acknowledge the support from the State Grid Henan Economic Research Institute with the project “Big Data based Residential Load Data Identification, Analysis, and Power Consumption Management Research” under Grant No. 5217L021000C.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Aderibole A., Zeineldin H. H., Hosani M. A., El-Saadany E. F. (2019). Demand side management strategy for droop-based autonomous microgrids through voltage reduction. IEEE Trans. Energy Convers. 34 (2), 878–888. doi:10.1109/TEC.2018.2877750

2. Arzamasov V. (2018). Data from: Electrical Grid stability simulated data dataset. Orange County, California: Machine Learning Repository. Available at: http://archive.ics.uci.edu/ml/machine-learning-databases/00471/.

3. Borah P., Gupta D. (2020). Functional iterative approaches for solving support vector classification problems based on generalized Huber loss. Neural Comput. Appl. 32 (13), 9245–9265. doi:10.1007/s00521-019-04436-x

4. Ester M., Kriegel H.-P., Sander J., Xu X. (1996). “A density-based algorithm for discovering clusters in large spatial databases with noise,” in The 2nd international conference on knowledge discovery and data mining (Portland, Oregon, USA: AAAI), 226–231.

5. Gan G., Ng M. K.-P. (2014). Subspace clustering using affinity propagation. Pattern Recognit. DAGM. 48, 1455–1464. doi:10.1016/j.patcog.2014.11.003

6. Kuncheva L. I., Whitaker C. J. (2003). Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51 (2), 181–207. doi:10.1023/A:1022859003006

7. Li X., Wang P., Liu Y., Xu L. (2020). Massive load pattern identification method considering class imbalance. Proc. CSEE 40 (01), 128–137+380. doi:10.13334/j.0258-8013.pcsee.190098

8. Liu S., Reviriego P., Hernández J. A., Lombardi F. (2021). Voting margin: A scheme for error-tolerant k nearest neighbors classifiers for machine learning. IEEE Trans. Emerg. Top. Comput. 9 (4), 2089–2098. doi:10.1109/TETC.2019.2963268

9. Liu W. (2021). “Cooling, heating and electric load forecasting for integrated energy systems based on CNN-LSTM,” in 2021 6th international conference on power and renewable energy (ICPRE) (Shanghai, China: IEEE). doi:10.1109/ICPRE52634.2021.9635244

10. Liu Y., Gao L., Liu L. (2020a). Parallel load type identification algorithm considering sample class imbalance. Power Syst. Technol. 44 (11), 4310–4317. doi:10.13335/j.1000-3673.pst.2020.0116

11. Liu Y., Li X., Chen X. (2020b). High-performance machine learning for large-scale data classification considering class imbalance. Scientific Programming. doi:10.1155/2020/1953461

12. Liu Y., Liu Y., Xu L., Wang J. (2019). A high performance extraction method for massive user load typical characteristics considering data class imbalance. Proc. CSEE 39 (14), 4093–4104. doi:10.13334/j.0258-8013.pcsee.181495

13. Liu Y., Ma C., Xu L., Shen X., Li M., Li P. (2017). MapReduce-based parallel GEP algorithm for efficient function mining in big data applications. Concurr. Comput. Pract. Exper. 30, e4379. doi:10.1002/cpe.4379

14. Liu Y., Xu L., Li M. (2016). The parallelization of back propagation neural network in MapReduce and spark. Int. J. Parallel Program. 45, 760–779. doi:10.1007/s10766-016-0401-1

15. López V., Fernandez A., Garcia S., Palade V., Herrera F. (2013). An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250, 113–141. doi:10.1016/j.ins.2013.07.007

16. Muthirayan D., Kalathil D., Poolla K., Varaiya P. (2000). Mechanism design for demand response programs. IEEE Trans. Smart Grid 11 (1), 61–73. doi:10.1109/TSG.2019.2917396

17. Shi L., Zhou R., Zhang W. (2019). Load classification method using deep learning and multi-dimensional fuzzy C-means clustering. Proc. CSU-EPSA 31 (7), 43–50. doi:10.19635/j.cnki.csu-epsa.000089

18. Tambunan H. B., Barus D. H., Hartono J., Alam A. S., Nugraha D. A., Usman H. H. H. (2020). “Electrical peak load clustering analysis using K-means algorithm and silhouette coefficient,” in 2020 international conference on technology and policy in energy and electric power (ICT-PEP) (Bandung, Indonesia: IEEE). doi:10.1109/ICT-PEP50916.2020.9249773

19. Tang Z., Liu Y., Xu L. (2020). Imbalanced-load pattern extraction method based on frequency domain characteristics of load data and LSTM network. Electr. Power Constr. 41 (8), 17–24. doi:10.12204/j.issn.1000-7229.2020.08.003

20. Trindade A. (2015). Data from: ElectricityLoadDiagrams20112014 dataset. Orange County, California: Machine Learning Repository. Available at: http://archive.ics.uci.edu/ml/machine-learning-databases/00321/.

21. Wang L., Liu Y., Li W., Zhang J., Xu L., Xing Z. (2022). Two-stage power user classification method based on digital feature portraits of power consumption behavior. Electr. Power Constr. 43 (2), 70–80. doi:10.12204/j.issn.1000-7229.2022.02.009

22. Wang Z., Li H., Tang Z., Liu Y. (2021). User-level ultra-short-term load forecasting model based on optimal feature selection and bahdanau attention mechanism. J. Circuits, Syst. Comput. 30. doi:10.1142/S0218126621502790

23. Wei Z., Ma X., Guo Y. (2022). Optimized operation of integrated energy system considering demand response under carbon trading mechanism. Electr. Power Constr. 43 (1), 1–9. doi:10.12204/j.issn.1000-7229.2022.01.001

24. Xu M. H., Liu Y. Q., Huang Q. L., Zhang Y., Luan G. (2007). An improved dijkstra's shortest path algorithm for sparse network. Appl. Math. Comput. 185 (1), 247–254. doi:10.1016/j.amc.2006.06.094

25. Yang C., Yin X. C., Hao H. W. (2014). Classifier ensemble with diversity: Effectiveness analysis and ensemble optimization. Acta Autom. Sin. 40 (4), 660–674. doi:10.3724/SP.J.1004.2014.00660

26. Zhang J., Liu Y., Li W., Wang L., Xu L. (2022). Power load curve identification method based on two-phase data enhancement and Bi-directional deep residual TCN. Electr. Power Constr. 43 (2), 89–97. doi:10.12204/j.issn.1000-7229.2022.02.011

27. Zhang M., Li L., Yang X. (2020). A load classification method based on Gaussian mixture model clustering and multi-dimensional scaling analysis. Power Syst. Technol. 44 (11), 4283–4296. doi:10.13335/j.1000-3673.pst.2019.1929

28. Zhang P., Wu X., Wang X., Bi S. (2015). Short-term load forecasting based on big data technologies. CSEE Power Energy Syst. 1 (3), 59–67. doi:10.17775/CSEEJPES.2015.00036

29. Zhou K., Yang S. (2012). An improved fuzzy C-Means algorithm for power load characteristics classification. Power Syst. Prot. Control 40 (22), 58–63. CNKI:SUN:JDQW.0.2012-22-013.

30. Zhu Q., Zheng H., Tang Z. (2021). Load scenario generation of integrated energy system using generative adversarial networks. Electr. Power Constr. 42 (12), 1–8. doi:10.12204/j.issn.1000-7229.2021.12.001

31. Zhu T., Ai Q., He X. (2020). An overview of data-driven electricity consumption behavior analysis method and application. Power Syst. Technol. 44 (9), 3497–3507. doi:10.13335/j.1000-3673.pst.2020.0226a

Summary

Keywords

load classification, ensemble learning, class imbalance, classifier redundancy, base classifier

Citation

Wang S, Han D, Hua Y, Wang Y, Wang L and Liu Y (2022) An improved selective ensemble learning approach in enabling load classification considering base classifier redundancy and class imbalance. Front. Energy Res. 10:987982. doi: 10.3389/fenrg.2022.987982

Received: 06 July 2022

Accepted: 25 July 2022

Published: 19 September 2022

Volume: 10 - 2022

Edited by

Yikui Liu, Stevens Institute of Technology, United States

Reviewed by

Anan Zhang, Southwest Petroleum University, China

Chunyi Huang, Shanghai Jiao Tong University, China

*Correspondence: Yang Liu,

This article was submitted to Smart Grids, a section of the journal Frontiers in Energy Research
