This article was submitted to Smart Grids, a section of the journal Frontiers in Energy Research
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Deep learning-based transient stability assessment has achieved remarkable success in power system analysis. However, it remains unclear how much of the training data is superfluous and which samples matter most for training. In this work, we introduce a recent technique from the artificial intelligence community to evaluate the significance of the samples used to train deep learning models for transient stability assessment. Empirical experiments show that nearly 50% of the low-significance samples can be pruned at the early training stages without affecting testing performance, saving considerable computational time and effort. We also observe that samples whose fault-clearing time is close to the critical clearing time often have higher significance indexes, indicating that the decision boundary learned by the deep network is closely related to the transient stability boundary. This is intuitive, but to the best of our knowledge, this work is the first to analyze the connection from the perspective of sample significance. In addition, we combine the stability scores with the significance index to provide an auxiliary criterion for the degree of stability, indicating the distance between a sample and the stability boundary. The ultimate goal of this study is to create a tool for generating and evaluating benchmark datasets for power system transient stability assessment, so that various algorithms can be tested on a unified, standard platform, as in the computer vision and natural language processing fields.
Transient stability assessment (TSA) has long been an active research topic, since transient instability is one of the major threats to a power system. Prior research on TSA can be roughly divided into two categories, i.e., time-domain simulation and the direct method (
During the past decade, inspired by the tremendous success of deep learning in the computer vision and natural language processing fields, researchers started to apply deep learning techniques to power system TSA. Various network structures and algorithms have been applied to TSA, including the deep belief network (DBN) (
The artificial intelligence (AI) community has made some interesting attempts in this direction (
Following the technique proposed by
In the machine learning-based TSA analysis, it is often formulated as a supervised classification problem. The first step is to generate the dataset used for training and testing, denoted by
To reliably determine the stability status, the simulation duration is often set to 10 s or more, and the input feature is also high-dimensional. Therefore, the dataset generation stage is the most time- and resource-consuming step when the dataset is large. It is thus necessary to ask: what is the nature of the samples that can be removed from the training dataset without hurting accuracy?
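As a concrete (and heavily simplified) illustration of the labeling step of dataset generation, the sketch below applies the widely used transient stability index to the maximum rotor-angle separation of a simulated trajectory. The function name and the 360° threshold are assumptions for illustration, not necessarily the exact criterion used in the article.

```python
import numpy as np

def stability_label(delta_max_deg):
    """Label a simulated trajectory with the common transient stability
    index TSI = (360 - dmax) / (360 + dmax), where dmax is the maximum
    rotor-angle separation (in degrees) over the simulation window.
    TSI > 0 -> stable (label 1); otherwise unstable (label 0)."""
    tsi = (360.0 - delta_max_deg) / (360.0 + delta_max_deg)
    return int(tsi > 0)

# Hypothetical maximum angle separations from three 10 s simulations:
print([stability_label(d) for d in (95.0, 360.0, 512.0)])  # -> [1, 0, 0]
```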
The following definition and derivation are mainly taken from (
In order to simplify the analysis, the discrete training iterations are approximated as continuous training dynamics. The loss change along the training iterations can then be represented as the time derivative of the loss function:
Then, if we remove any sample
Therefore, the contribution of a training sample to the loss change is bounded by
The GraNd score describes the contribution of a sample to the change in the training loss. Specifically, samples with a small GraNd score in expectation have a limited influence on the training process. Note that the opposite is not necessarily true since
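Since the cited equations did not survive extraction, the following LaTeX restates the GraNd definition from Paul et al. (2021), which the derivation above follows; the symbols are theirs and may differ slightly from the article's notation.

```latex
% GraNd score of a training sample (x, y) at training time t,
% as defined by Paul et al. (2021):
\[
  \mathrm{GraNd}_t(x, y)
  \;=\;
  \mathbb{E}_{\theta_t}
  \left\lVert
    \nabla_{\theta_t}\,\ell\bigl(f_{\theta_t}(x),\, y\bigr)
  \right\rVert_2 ,
\]
% so the expected contribution of (x, y) to the loss change at any
% fixed point is bounded by a constant times its GraNd score.
```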
Let
Taking
Substitute
Therefore, the GraNd score is:
The right part of
Therefore, an easy-to-compute criterion, the Sample Significance Index (SSI), is proposed to evaluate the upper bound of the contribution of any sample to the loss change of the neural network during training.
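A minimal sketch of how such an upper-bound criterion can be computed cheaply: for a softmax cross-entropy classifier, the gradient of the loss with respect to the logits is p − y, so its norm bounds the sample's loss-gradient contribution up to a factor depending on the network Jacobian. The function name and this EL2N-style approximation are assumptions; the article's exact SSI formula may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ssi_proxy(logits, labels):
    """Per-sample significance proxy: ||p - y||_2, the norm of the
    cross-entropy loss gradient w.r.t. the logits."""
    p = softmax(logits)
    y = np.eye(logits.shape[1])[labels]
    return np.linalg.norm(p - y, axis=1)

logits = np.array([[4.0, -4.0],   # confidently correct -> low score
                   [0.1,  0.0],   # borderline          -> high score
                   [-3.0, 3.0]])  # confidently wrong   -> high score
print(np.round(ssi_proxy(logits, np.array([0, 0, 0])), 3))
```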
The stability degree of a given sample is usually measured by the difference between the fault-clearing time (CT) and the critical clearing time (CCT). However, obtaining the CCT of complex systems both accurately and quickly is difficult (
We first standardize the index as:
In
The SSI and stability score could be calculated by the following three steps, as
The procedure to evaluate the significance and stability.
Step 1: Input the training dataset into a neural network whose parameters are recorded at each epoch during training. The output error is back-propagated to adjust the neuron weights, and training ends when the accuracy or the number of iterations reaches the preset standard.
Step 2: The network parameters of the iteration chosen for the SSI calculation are loaded back into the network, and the samples are fed to it. The output and the labels of the training sample are calculated by
Step 3: The output of the step-2 network is used as the input for the stability assessment of the sample. And parameter
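The three steps above can be sketched as follows, with a toy logistic "network" standing in for the deep model. The checkpointing scheme, the SSI stand-in |p − y|·||x||, and the use of the output probability as a stability score are illustrative assumptions, not the article's exact formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the TSA dataset: 2-D features, binary stability labels.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Step 1: train a minimal logistic "network" by gradient descent and
# record the parameters of every epoch (checkpointing).
w, b, lr = np.zeros(2), 0.0, 0.5
checkpoints = []
for epoch in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= lr * (X.T @ g) / len(X)
    b -= lr * g.mean()
    checkpoints.append((w.copy(), b))

# Step 2: reload the parameters of a chosen epoch and compute a
# per-sample significance index from the loss-gradient magnitude
# (here |p - y| * ||x||, a stand-in for the article's SSI formula).
w_e, b_e = checkpoints[80]
p = 1.0 / (1.0 + np.exp(-(X @ w_e + b_e)))
ssi = np.abs(p - y) * np.linalg.norm(X, axis=1)

# Step 3: use the final network's output probability as an auxiliary
# stability score: values near 0.5 indicate samples close to the
# learned stability boundary.
w_f, b_f = checkpoints[-1]
score = 1.0 / (1.0 + np.exp(-(X @ w_f + b_f)))
print(ssi.shape)
```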
In the power system TSA tasks, the knowledge learned from machine learning algorithms is usually interpreted as the stability boundary (
The input features include the power angle and the rotor speed of the generator. 4,000 samples are randomly generated from the state space. A 4-layer multilayer perceptron (MLP) is trained for 200 epochs, and the SSI is computed at each training epoch. The results are shown in
SSI at epoch 90 of the SMIB system.
Sample Significance Index distribution along the training epochs.
The stability scores are calculated according to
Stability scores of the SMIB system samples.
The IEEE New England test system is used as the base case of the TSA task. The parameters of the test system are taken from (
Network structure and hyperparameters.
Network | Layers | Batch size | Learning rate | Activation function | Loss function | Optimizer
---|---|---|---|---|---|---
MLP | Dense(150), Dense(60), Dense(2) | 64 | Initial learning rate = 0.01, decay rate = 0.9 | ReLU | Cross-entropy | SGD
CNN | Conv(kernel_size = (3, 3), out_channels = 16), AveragePooling, Dense(128), Dense(10), Dense(2) | 64 | Initial learning rate = 0.01, decay rate = 0.9 | ReLU | Cross-entropy | SGD
LSTM | LSTM(hidden_size = 64), Dense(2) | 128 | Initial learning rate = 0.01, decay rate = 0.9 | ReLU | Cross-entropy | SGD
The average SSI used to prune the dataset is obtained by training each of the three networks on the full dataset for 10 runs of 200 epochs each. Across various experiments, we find that the SSI becomes stable after 80 epochs, so the epoch-80 value is used as the Sample Significance Index. Then, the samples with low significance indexes are removed from training, and a new model with the same architecture is trained from randomly initialized weights. The performance of the new model on the testing set is shown in the blue lines of
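The pruning workflow above can be sketched as follows, using hypothetical SSI values; the averaging over runs and the keep-fraction mirror the described procedure, while the function names are assumptions.

```python
import numpy as np

def average_ssi(ssi_runs):
    """Average per-sample SSI over several training runs
    (the article averages 10 runs, taking the index at epoch 80)."""
    return np.mean(ssi_runs, axis=0)

def prune_low_ssi(X, y, ssi, keep_fraction):
    """Keep only the `keep_fraction` of samples with the highest SSI."""
    n_keep = int(round(keep_fraction * len(y)))
    keep = np.argsort(ssi)[-n_keep:]  # indices of the highest-SSI samples
    return X[keep], y[keep]

# Hypothetical SSI from three runs over six samples:
ssi_runs = np.array([[0.1, 0.9, 0.2, 0.8, 0.05, 0.7],
                     [0.2, 1.0, 0.1, 0.9, 0.10, 0.6],
                     [0.1, 0.8, 0.3, 0.7, 0.05, 0.8]])
X, y = np.arange(12).reshape(6, 2), np.array([0, 1, 0, 1, 0, 1])
Xs, ys = prune_low_ssi(X, y, average_ssi(ssi_runs), keep_fraction=0.5)
print(len(ys))  # -> 3: half of the samples remain for retraining
```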
The influence of the pruning fraction and SSI of different training periods on the final accuracy.
Meanwhile, the accuracy of the subsets pruned using the epoch-80 and epoch-200 indexes is similar. To investigate how early in training the SSI becomes effective,
To explore the influence of sample pruning on the training dynamics, the loss is recorded while training the CNN on the full dataset and on the subset with 50% of the samples pruned by SSI. The loss surface, known as the loss landscape, is drawn along two directions from a center point, as shown in
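The loss-landscape visualization can be sketched as below, with a toy quadratic loss standing in for the CNN's training loss; in practice the two directions are usually random, filter-normalized vectors, and the center point is the trained parameter vector.

```python
import numpy as np

def loss_landscape(loss_fn, theta, d1, d2, alphas, betas):
    """Evaluate the loss on a 2-D grid around a trained parameter
    vector theta along two direction vectors d1 and d2, as in the
    common loss-landscape visualization technique."""
    return np.array([[loss_fn(theta + a * d1 + b * d2) for b in betas]
                     for a in alphas])

# Toy quadratic loss standing in for the CNN's training loss:
loss_fn = lambda th: float(np.sum(th ** 2))
theta = np.zeros(4)                       # "trained" center point
rng = np.random.default_rng(1)
d1, d2 = rng.normal(size=4), rng.normal(size=4)
grid = np.linspace(-1, 1, 5)
surface = loss_landscape(loss_fn, theta, d1, d2, grid, grid)
print(surface.shape)  # -> (5, 5); the center equals the training loss
```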
Loss landscape visualizations of CNN trained with the full dataset
Time and Efficiency of subsets with different sizes.
Subset/Full | MLP time (s) | MLP efficiency (acc/s) | CNN time (s) | CNN efficiency (acc/s) | LSTM time (s) | LSTM efficiency (acc/s)
---|---|---|---|---|---|---
100% | 83.12 | 0.012 | 137.9 | 0.007 | 44.64 | 0.022 |
90% | 74.8 | 0.013 | 121.91 | 0.008 | 39.24 | 0.025 |
80% | 65.12 | 0.015 | 110.34 | 0.009 | 36.74 | 0.027 |
70% | 58.18 | 0.017 | 96.6 | 0.010 | 32.22 | 0.030 |
60% | 50.51 | 0.019 | 89.92 | 0.011 | 28.01 | 0.035 |
50% | 42.91 | 0.023 | 72.45 | 0.013 | 24.51 | 0.040 |
40% | 34.03 | 0.029 | 63.1 | 0.016 | 20.06 | 0.049 |
In the study of Section 4.1, the samples with high SSI played an important role in TSA. To study how the range of the Sample Significance Index affects assessment accuracy, the key question is whether the samples with the highest indexes yield the highest accuracy. We first sort the samples by ascending SSI. Then, we perform a sliding-window analysis, training on the subset whose SSI lies between percentile p and percentile p + 40%, and sliding the window toward higher SSI in steps of 10% of the full dataset. For all three networks, the results indicate that performance increases as the window slides to higher percentiles, as shown in
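The sliding-window subset construction can be sketched as follows; the generator name is an assumption, while the 40% window and 10% step follow the description above.

```python
import numpy as np

def sliding_windows(ssi, window=0.4, step=0.1):
    """Yield index subsets covering SSI percentiles [p, p + window),
    with the window sliding toward higher SSI in steps of `step`
    (the article uses a 40% window and a 10% step)."""
    order = np.argsort(ssi)            # sample indices in ascending SSI
    n = len(ssi)
    w, s = int(round(window * n)), int(round(step * n))
    for start in range(0, n - w + 1, s):
        yield order[start:start + w]

ssi = np.array([0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0])
subsets = list(sliding_windows(ssi))
print(len(subsets), len(subsets[0]))  # -> 7 windows of 4 samples each
```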
Test accuracy of MLP, CNN, and LSTM trained on the full dataset (No pruning) and a 40% subset pruned randomly or by SSI calculated at epoch 50 or epoch 80. When using SSI, the sample window slides along the samples in the ascending order of SSI. Training of the first row
Since the sliding window contains 40% of the samples of the dataset, as shown in
Test accuracy of MLP trained on only 10% subset which slides along the samples in the ascending order of SSI.
In the training dataset generation process, the stability labels of some samples are generated incorrectly owing to the choice of transient stability criterion or to calculation errors. These can be regarded as noisy labels, which may affect both the accuracy of the networks and the SSI calculation. Therefore, 10% of the dataset's labels are randomly selected and randomized. Subsets containing 40% of the samples of this dataset are generated for training. As
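The label-randomization step can be sketched as follows; the function name and seed are assumptions, while the 10% fraction follows the experiment described above.

```python
import numpy as np

def randomize_labels(y, fraction=0.1, n_classes=2, seed=0):
    """Randomly re-draw the labels of `fraction` of the samples to
    emulate noisy stability labels caused by the criterion choice or
    simulation errors (the article randomizes 10% of the labels)."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    idx = rng.choice(len(y), size=int(round(fraction * len(y))),
                     replace=False)
    y[idx] = rng.integers(0, n_classes, size=len(idx))
    return y

y = np.zeros(100, dtype=int)
y_noisy = randomize_labels(y, fraction=0.1)
print(int((y_noisy != y).sum()))  # at most 10 labels actually change
```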
Distribution of stability labels and SSI. The dataset of
This part indicates that a fixed number of samples with higher SSI does not necessarily yield higher accuracy, and may even hinder the formation of the decision boundary. A fixed-length sliding window over the samples ordered by SSI provides an intuitive way to generate datasets, which will be discussed in the next part.
In addition to the samples with high SSI that we care about, an interesting phenomenon, as shown in
Moreover, inspired by the results of the SMIB system, we evaluate the relationship between the SSI and the critical clearing time, as shown in
Sample Significance Index distribution with relative clearing time.
In summary, this article evaluates the significance of samples for deep learning-based TSA. It is observed that a large fraction of the samples can be pruned at an early training stage without sacrificing test accuracy by using the Sample Significance Index. In addition, we find that samples with high significance scores tend to have borderline fault-clearing times. This is intuitive: samples with short or long fault-clearing times can be regarded as white and black samples, and it is always easy to separate the whites from the blacks; the samples with borderline fault-clearing times are “gray” samples, which are more important for learning the stability boundary and improving training accuracy. Therefore, this article proposes a method to generate and prune the TSA dataset based on the Sample Significance Index, providing some relief in data generation and training. Combining the stability score with the classification results of the networks also yields a more intuitive way to assess the transient stability of samples.
Future work includes using the SSI to understand the dynamic training process of models, since both the early training period and the final results are now known. More tests on additional datasets and power system topologies are also desirable to characterize the properties of the significance index. Ultimately, we want the SSI to become part of a tool for generating and evaluating benchmark datasets for power system TSA, so that various algorithms can be tested on a unified, standard platform, as in the computer vision and natural language processing fields.
The original contributions presented in the study are included in the article/
LZ: inspiration of algorithm, data analysis, manuscript writing, and reviewing. ZW: experiment implementation, manuscript reviewing, and editing. GL: supervision. YX: supervision.
This work was supported in part by the National Key R&D Program of China “Response-driven intelligent enhanced analysis and control for bulk power system stability” (2021YFB2400800).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The Supplementary Material for this article can be found online at: