Identification of Key Lines for Multi-Photovoltaic Power System Based on Improved PageRank Algorithm

To accurately identify the key lines in a photovoltaic (PV) grid-connected system, an identification method based on an improved PageRank algorithm is proposed. First, a correlation matrix reflecting the electrical characteristics of the system is constructed from the line current-carrying rate, the line breaking power flow transfer rate and the line coupling rate, and it replaces the original network topology matrix. Second, the entropy method is used to build a comprehensive evaluation index from electrical betweenness, load deviation rate and voltage shock rate, which improves the distribution of the initial PageRank (PR) values of the PV grid-connected system. To study how changes in PV active power output affect the identification of key lines in the multi-PV power system, the HGWO-SVM (Hybrid Grey Wolves Optimized Support Vector Machine) algorithm is used to obtain the daily PV output prediction curves, from which fixed PV outputs for different periods are derived. Taking the IEEE 39-node system with multiple PVs as an example, the identification results show that the improved PageRank algorithm is superior to the original method in line identification accuracy. The HGWO-SVM algorithm, with adaptively modified crossover and mutation operators, also improves prediction accuracy to some extent. Changes in daily PV output influence line criticality (namely the final PR value) to different degrees during periods of high light intensity and during other periods.


INTRODUCTION
With the rapid development of the power industry, large-scale interconnection of power systems has become the norm. However, with the continuous expansion of the power grid, the application of UHV (ultra-high voltage) technology, and the introduction of new energy sources such as wind power and PVs, the operating mechanism of the power system has become increasingly complex. Major blackouts such as the "U.S.-Canada blackout", which are triggered by minor accidents and spread through cascading failures across a large system, are becoming more frequent. Once a major blackout occurs, it brings huge national economic losses (Xu et al., 2003). Therefore, identifying and monitoring the key lines of the power system can effectively prevent potential large-scale accidents caused by cascading failures and improve the stability of the power system (Yang et al., 2015). Yi-jia et al. (2006) consider the active power output of generators and the weighted topology of the power grid and propose a weighted line betweenness for key line identification; however, the method assumes that power between buses flows only along the shortest path, which does not match the actual power flow. Wei et al. (2010) also use weighted line betweenness for key line identification, and correct the line betweenness using the peak value of all line betweenness in a small area as an index. He et al. (2013) put forward a new index to evaluate the vulnerability of the system structure by combining the system operating status with reactance-weighted line betweenness. Wen-ying et al. (2013) use a power flow tracking algorithm to calculate the utilization of each line by generators and loads, and use the power flow interface number to identify key lines.
Another study, based on the topology and operating status of the power grid, comprehensively considers a variety of factors and proposes a comprehensive betweenness to identify vulnerable lines. Bai and Hong-miao (2015) define a hybrid power flow betweenness based on the actual power flow path, so that the identification result is more in line with the actual power grid. Chen et al. (2018) use branch active power flow to define line efficiency weights and establish a comprehensive vulnerability assessment index for power grids from the change in global efficiency. Yu et al. (2018) and Zeng et al. (2018) combine the entropy weight method and the analytic hierarchy process to weight the identification index set and obtain a comprehensive evaluation index, whose identification result is more comprehensive than that of a single betweenness. The above models are all based on the conventional grid topology and operating status and use one or more indicators for identification. Chen et al. (2007) and Chu and Iu (2017) apply complex network theory to the power system and identify key lines on that basis. One study proposes a DKsPS identification method that considers the decomposition depth of the K-shell, and another uses an improved structural hole theory (ISH method) to consider the coupling relationship between lines. Ma et al. (2016), Ma et al. (2017) and Ma et al. (2019) introduce the concept of hidden faults in addition to the grid topology and operating status. Owing to its own characteristics, the PageRank algorithm can account for the impact of hidden faults on line identification. Its identification results are similar to those based on multiple betweenness measures and are more in line with the actual power grid.
The connection of new energy sources to the grid affects grid stability and therefore line identification (Li et al., 2018). Ni et al. (2019) combine the entropy weight method and the analytic hierarchy process to establish comprehensive key line evaluation indicators, based on the impact and disconnection consequences of each line, for wind and solar grid-connected systems. Tao et al. (2020) discretize the probability distribution of the output error to account for the output fluctuation of grid-connected wind power, and combine line load rate and power flow betweenness for identification. Another study takes generator output capacity and load size as the main factors and uses a game-theoretic comprehensive weighting method to identify the importance of transmission lines. Zeng et al. (2020) put forward a line weakness index and identify the weak lines of the power grid considering transient stability constraints. Zhang et al. (2020) establish a comprehensive evaluation model of fragile lines that considers relative structural fragility and state evaluation indicators, and classify fragile lines from the two perspectives of grid topology and operating status.
Most of the above methods do not consider new energy access to the system, or, when they do consider new energy grid connection, only reflect the difference between the system with and without the connection. For this reason, this paper builds the correlation index from the line current carrying rate under N-1 faults, the line breaking power transfer rate and the line coupling rate, and builds the comprehensive evaluation index from the electrical betweenness, load deviation rate and voltage shock rate. The HGWO-SVM algorithm is used to predict the daily PV output, the predicted output is divided into several time periods, and the key line identification results for the different periods are compared to study the impact of PV output at different times on the line identification results.

Algorithm Identification
Google's classic PageRank algorithm was originally used to rank the importance of web pages. To use it for line identification in the power system, the research object must be converted from the page node to the line between two nodes, taking into account both the network topology and the electrical connection between lines. The PageRank algorithm assumes that jumps in web browsing are random and uniform, so the PR value of each node is transferred equally to the nodes it points to. In the power grid, however, the degree of correlation between lines differs, so the distribution of PR values among lines must follow the electrical correlation of the grid:
(1) Through N-1 verification, establish a line correlation matrix G based on the line current carrying rate, line breaking power transfer rate and line coupling rate.
(2) Determine the initial PR value of each line based on the comprehensive evaluation index of electrical betweenness, load deviation rate and voltage shock rate.
(3) Introduce virtual nodes in the correlation network to construct the topology matrix G, and iteratively assign PR values through the PageRank algorithm to obtain the final line PR values.

Correlation Matrix
The correlation matrix G = [g ij] is constructed based on the N−1 verification, where g ij (i, j∈N) is the system's line correlation index, indicating the degree of influence of breaking line j on line i, and N is the total number of lines in the system.

Line Current Carrying Rate
In the formula: F ij reflects the power flow transferred to line i after line j is broken. M i is the line safety margin and the maximum active transmission capacity of line i. The larger x ij is, the more heavily line i is loaded when line j is broken, and the more likely line i is to malfunction.

Line Breaking Power Transfer Rate
In the formula: F ij represents the initial power flow of line i. The larger y ij is, the more easily breaking line j causes a hidden fault of line i.

Line Coupling Rate
In the formula: F ij represents the initial power flow of the broken line j, and z ij reflects the correlation between the affected line i and the broken line j under N−1 verification.
The correlation index g ij of the correlation matrix G is constructed from the above three indexes. The elements of the correlation matrix G reflect the electrical coupling between lines obtained from branch breaking simulation of the system, and thereby reflect the criticality of each line. For a line to be evaluated, the more critical and the more numerous the other lines associated with it are, the more critical the line itself is. This is the rationale for introducing the PageRank algorithm into the power grid for key line identification.

Initial PR Value of the Line
Because lines in the system are affected by topology and other factors, their initial PR values should not be distributed uniformly as in the traditional web page setting. Therefore, in this paper, the initial PR value of each line is determined by a comprehensive evaluation index composed of the electrical betweenness, load deviation rate and voltage shock rate of the conventional network.

Electrical Betweenness
In the formula: S and L are the sets of power generation nodes and load nodes; B e (a, b) is the electrical betweenness of line (a, b); I mn (a, b) is the amount of current change on line (a, b). W i is the actual output of generator node i, and W j is the actual load of load node j. The electrical betweenness characterizes how heavily each transmission line is used by generator-load node pairs, and thus characterizes the fragility of the transmission line in the grid topology.

Load Deviation Rate
In the formula: α i represents the sum of the influence of the power flow change caused by the opening of the remaining lines j, and β i represents the sum of the influence of the power flow change caused by the opening of the other lines j. L i reflects the importance of the line in the system through the influence of the breaking between the lines on the system power flow distribution.

Voltage Shock Rate
In the formula: L is the set of load nodes, and U n and U 0 are the actual voltage and initial voltage of load node n after line i is interrupted. E i characterizes the influence of the faulty line on the system node voltages through line breaking. If a line is deliberately attacked and the node voltage deviation rate of the whole network increases significantly, the local reactive power balance of the system is severely damaged. The entropy method is used to normalize the above three indicators and determine their weights. A comprehensive evaluation index is then established and, based on the criticality of each line in the network topology, used to assign the initial PR value to each line of the original network. Lines with high topological criticality thus receive more weight and lines with low topological criticality receive less, which makes the allocation of the initial PR values more reasonable.
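
The entropy weighting step described above can be sketched as follows. This is a minimal illustration using the textbook entropy weight method, not necessarily the exact normalization used in this paper. The function names (`min_max_normalize`, `entropy_weights`, `initial_pr`) and the min-max normalization are assumptions, and the input is assumed to hold one row per line and one column per indicator (electrical betweenness, load deviation rate, voltage shock rate), with at least two lines.

```python
import numpy as np

def min_max_normalize(X):
    """Scale every indicator column to [0, 1]; a constant column maps to zeros."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def entropy_weights(X):
    """Textbook entropy weight method: indicators that vary more get more weight."""
    Z = min_max_normalize(X)
    n, m = Z.shape
    col_sum = Z.sum(axis=0)
    safe_sum = np.where(col_sum > 0, col_sum, 1.0)
    # Share of each line under each indicator; a constant indicator is treated
    # as uniform so that its entropy is 1 and its weight is 0.
    P = np.where(col_sum > 0, Z / safe_sum, 1.0 / n)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # convention: 0 * log(0) = 0
    entropy = -plogp.sum(axis=0) / np.log(n)
    diversity = np.clip(1.0 - entropy, 0.0, None)
    if np.isclose(diversity.sum(), 0.0):
        return np.full(m, 1.0 / m)  # all indicators uninformative: equal weights
    return diversity / diversity.sum()

def initial_pr(X):
    """Weighted indicator score per line, scaled to sum to 1 (assumed allocation rule)."""
    score = min_max_normalize(X) @ entropy_weights(X)
    return score / score.sum()
```

With this rule, an indicator that takes the same value on every line contributes nothing to the initial PR allocation, while indicators that discriminate strongly between lines dominate it.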

Expansion Matrix
Virtual nodes are introduced into the system to represent hidden faults. The virtual nodes are connected to all nodes of the system. The virtual nodes are processed by the method proposed in Ma et al. (2016) and Ma et al. (2017) to establish an expansion matrix, whose elements are defined case by case, including (2) i = n + 1, 1 ≤ j ≤ n and (3) 1 ≤ i ≤ n + 1, j = n + 1. Among them, the parameter ε represents the correlation factor of the hidden fault's influence on the lines i and j.
By Eqs. 10-12, the established correlation matrix G is extended to an expansion matrix G, which is the Google matrix used in the PageRank algorithm model. The power method is then applied to G, where R (k) represents the PageRank vector at the k-th calculation. This vector is called the criticality vector and is an (n + 1)-dimensional column vector. Eq. 14 guarantees the convergence of the vector R. As in the PageRank algorithm, the ranking of node influence is given by the converged PageRank vector. Based on the above topology matrix G and the determined initial PR value allocation, a system key line identification method based on the improved PageRank algorithm is established. The criticality of a line is essentially its final PR value after the iteration of the improved PageRank algorithm converges.
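
The power-method iteration on the extended matrix can be sketched as below. This is a sketch under stated assumptions: the extension rule (a uniform weight `eps` linking the virtual node to every line, followed by column normalization) is an illustrative stand-in for Eqs. 10-12, not the paper's definition, and the function names and default parameters are hypothetical. The correlation matrix is assumed to be non-negative.

```python
import numpy as np

def extend_with_virtual_node(G, eps=0.1):
    """Append one virtual (hidden-fault) node and make every column sum to 1.

    Assumption: the virtual node is tied to every line with the same weight eps.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    G_ext = np.full((n + 1, n + 1), eps)
    G_ext[:n, :n] = G      # original line-to-line correlation indices
    G_ext[n, n] = 0.0      # no self-link on the virtual node
    return G_ext / G_ext.sum(axis=0, keepdims=True)

def pagerank_power_method(G_ext, r0, tol=1e-10, max_iter=1000):
    """Iterate R(k+1) = G_ext @ R(k) until the L1 change drops below tol."""
    r = np.asarray(r0, dtype=float)
    r = r / r.sum()
    for _ in range(max_iter):
        r_next = G_ext @ r
        r_next = r_next / r_next.sum()   # guard against numerical drift
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

Here `r0` is the (n + 1)-dimensional starting vector, for example the initial line PR values followed by a starting value for the virtual node. The returned vector has one entry per line plus a final entry for the virtual node.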

HYBRID GRAY WOLF OPTIMIZATION SUPPORT VECTOR MACHINE ALGORITHM
In this paper, HGWO-SVM is used to predict the output of PVs. Based on the HGWO-SVM prediction model proposed by Deng et al. (2019), the mutation operator and crossover operator are adaptively improved to avoid falling into local optima and to enhance the overall convergence performance.

Mutation and Cross-Adaptive Improvement
Differential evolution (DE) recombines individuals according to the differences between them during population evolution to obtain a highly competitive intermediate population, and the offspring and parents then compete to form the next generation. Differential evolution improves GWO mainly in three steps: mutation, crossover and selection.
(1) Mutation: Choose two different individuals X r2 (t) and X r3 (t), scale their difference, and combine it with the individual X r1 (t) to be mutated; D(t+1) is the new individual after the combination. The original mutation operator F 0 is adapted to obtain a new mutation operator F, which improves global convergence.
(2) Crossover: Compare the crossover operator with a random number to determine the source of each gene of the new individual. The crossover operator CR is improved so that it is small at first and larger later. The value of CR increases monotonically, so the population diversity of the algorithm is better in the early stage and the convergence speed is faster in the later stage.
(3) Selection: After the mutation and crossover operations, the mutant individual U(t+1) competes with the individual X(t+1). If the parent individual is better than the new child individual, the parent is retained; otherwise the child is retained. The algorithm then continues to the next mutation.
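
The three steps above can be illustrated with one differential evolution generation. This is a hedged sketch: the linear schedules for F (decreasing) and CR (increasing) are hypothetical placeholders for the paper's adaptive operators, the function name `de_generation` and its defaults are assumptions, and the fitness is assumed to be minimized (for example, a prediction error). The population needs at least four individuals.

```python
import numpy as np

def de_generation(pop, fitness, t, T, rng,
                  F_max=0.9, F_min=0.4, CR_min=0.1, CR_max=0.9):
    """One DE generation (mutation, crossover, selection) at iteration t of T."""
    n, dim = pop.shape
    frac = t / T
    F = F_max - (F_max - F_min) * frac       # assumed schedule: large steps early
    CR = CR_min + (CR_max - CR_min) * frac   # assumed schedule: CR grows over time
    new_pop = pop.copy()
    for i in range(n):
        others = [k for k in range(n) if k != i]
        r1, r2, r3 = rng.choice(others, size=3, replace=False)
        # (1) Mutation: add the scaled difference of two individuals to a third.
        mutant = pop[r1] + F * (pop[r2] - pop[r3])
        # (2) Crossover: a gene comes from the mutant when a random number < CR.
        mask = rng.random(dim) < CR
        mask[rng.integers(dim)] = True       # keep at least one mutant gene
        trial = np.where(mask, mutant, pop[i])
        # (3) Selection: keep the parent only if it is strictly better.
        if fitness(trial) <= fitness(pop[i]):
            new_pop[i] = trial
    return new_pop
```

Because selection is greedy per individual, the best fitness in the population can never get worse from one generation to the next.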

Hybrid Gray Wolf Optimization Support Vector Machine Prediction Process
Gray wolf optimization (GWO) is inspired by the social hierarchy and predatory behavior of wolves in nature (Mirjalili et al., 2014). The relationship between the social hierarchy and behavior of wolves is shown in Figure 1. The idea behind GWO is as follows. As the leader of the pack, the head wolf commands the pack and, using the information obtained by its companions, makes decisions that capture prey as quickly as possible while avoiding danger. A few elite wolves are responsible for hunting. They hunt in the prey's activity area, search according to the characteristic smell of the prey, and always move toward the area where the smell is most concentrated. Once a scout wolf finds prey, the surrounding wolves that receive the information join the hunt. By adaptively improving F and CR, the local search capability of HGWO-SVM is improved so that it avoids falling into local optima. The HGWO-SVM prediction process is shown in Figure 2 and the specific prediction steps are shown in Figure 3.

SIMULATION

PageRank Line Identification
The IEEE 39-node system model is established in PSAT. The IEEE 39-node system is shown in Figure 4.
The identification steps are as follows: (1) Establish a correlation matrix G that can visually reflect the electrical connection through the line current carrying rate, line breaking power transfer rate, and line coupling rate, and further derive the expansion matrix G which considers the hidden faults.
(2) Use the entropy weight method to determine the entropy weight of the electrical betweenness, load deviation rate and voltage shock rate. Construct a comprehensive evaluation index to determine the initial PR value of the original network line to replace the average PR value of the traditional PageRank algorithm.
(3) Complete the line identification through the PageRank algorithm, delete the PR value of the (n + 1)-th element, and sort the PR values of the remaining lines to complete the identification. The higher the PR value, the higher the criticality of the line. Table 1 gives the weights of electrical betweenness, load deviation rate and voltage shock rate determined by the entropy weight method. Table 2 compares the line identification results obtained by this method with those of several existing identification methods.
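
Step (3) amounts to dropping the virtual-node entry and sorting the rest. A minimal sketch follows, assuming the converged PR vector stores the virtual node as its last element; the function name `rank_lines` and the sample numbers are purely illustrative and are not results from this paper.

```python
def rank_lines(pr_values, line_names):
    """Drop the (n + 1)-th (virtual node) PR value and sort lines by PR, highest first."""
    if len(pr_values) != len(line_names) + 1:
        raise ValueError("expected one PR value per line plus one for the virtual node")
    line_pr = list(pr_values)[:-1]
    order = sorted(range(len(line_names)), key=lambda i: line_pr[i], reverse=True)
    return [(line_names[i], line_pr[i]) for i in order]

# Illustrative values only: three lines plus the virtual node entry.
ranking = rank_lines([0.2, 0.5, 0.1, 0.2], ["l 25-2", "l 2-3", "l 16-19"])
```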
The identification results are shown in Table 2. Comparing the methods, the identification results differ slightly because the identification angles differ. The lines closer to the power generation nodes, such as l 25-2, l 2-3, l 29-28 and l 22-21, and the long-distance hub lines, such as l 16-19 and l 16-17, are critical lines in the system. Among them, l 25-2 can be judged to be the most critical line. After l 25-2 fails, the system will be disconnected, and the area where G8 and G10 are located will change from dual power supply to single power supply, which affects the degree of coupling of the entire network and leads to wide-ranging voltage and power flow changes. Zeng et al. (2018), Zeng et al. (2020) and one other compared method regard l 16-19 as the most critical line, which results from these methods taking voltage change as the main influencing factor. This paper uses the voltage offset only as one basis for determining the initial PR value distribution of the network, so the ordering of the same key lines differs. Comparing the first 10 lines, four to six lines in the sets of key transmission lines identified by the six methods are the same, although their order differs because the identification angles of the six methods differ.
If the ranking order is ignored and only the set of identified lines is considered, the result of this method is roughly consistent with those of the six methods above, which supports the validity of the method used in this paper.

HGWO-SVM Photovoltaic Ultra-short Term Forecast
This paper takes a photovoltaic power plant in Florida (installed capacity of 58 MW) as an example. The actual PV output power was collected from 6:15 on July 1, 2016 to 18:15 on August 1 at intervals of about 15 min, giving 1225 groups of data in total. The first 980 groups are used for training, and the last 49 points are predicted 15 minutes in advance. The actual PV output is shown in Figure 5.
In this paper, HGWO-SVM, SVM and PSO-SVM are compared. To verify the validity of the HGWO-SVM prediction, three error indicators are used: mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE). In the indicator expressions, ŷ i is the predicted output of PV, and y i is the actual output of PV.
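
The three error indicators can be computed as in the sketch below, which uses their standard definitions. Reporting MAPE in percent is an assumption about the convention used here, and the function names are illustrative. MAPE is undefined wherever the actual output is zero, so such samples (for example, night-time points) would need to be excluded first.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; requires non-zero actual values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_pred - y_true) / y_true)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```

MAE and RMSE carry the unit of the output (MW here), while MAPE is scale-free, which makes it convenient for comparing predictors across plants of different capacity.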
The prediction comparison, prediction error comparison and prediction output index comparison of HGWO-SVM, PSO-SVM and SVM are shown in Figures 6, 7 and Table 3.
From Figures 6, 7 and Table 3, the HGWO-SVM prediction method is superior to the other methods: its predictions are closer to the actual data, and its prediction error is smaller than that of the original SVM and PSO-SVM. Therefore, the predicted output curve of HGWO-SVM is divided into approximate time periods to obtain the approximate PV output power for different time periods, as shown in Figure 8. The photovoltaic capacity is configured so as not to exceed 20% of the load of the access node. In PSAT, photovoltaic sources are connected at multiple nodes as follows: nodes 8 and 20 (100 MW); nodes 3, 4, 15, 16, 21, 23, 24, 27 and 29 (50 MW); nodes 7, 25 and 28 (20 MW). From the approximate PV output obtained above, the output of the PV power plant is more than 50% during the daytime period, higher than 70% from noon to 3 PM, and smaller, less than 25%, in the morning and evening. Line identification is therefore carried out for the different time periods using the approximate PV output of each period. The fixed output settings for the different periods are: theoretical full output, maximum output at noon (75%), output at noon (65%), output at morning (evening) (35%), output at morning (evening) (10%) and no PV. The comparison of the multi-period line identification results is shown in Figure 9 and Table 4. According to Figure 9, when the PV output is less than 65%, that is, when the light intensity is not high, the PR value of a line changes roughly in proportion to the PV output.
During this period, the lines originally ranked at the top remain key lines, and only their criticality fluctuates slightly. For example, the criticality order of lines l 24-23 and l 16-19, which are directly connected to PVs, changes slightly, with the criticality of l 24-23 exceeding that of l 25-2. Therefore, periods of low output have little impact on the line identification results, which are similar to those of the system without PV connection. In contrast, during the high light intensity period around noon, when the PV output reaches 75% and above, the key lines of the system clearly change. Table 4 shows that during periods of high light intensity the criticality of lines such as l 16-17, l 3-4 and l 14-15 changes significantly. The criticality of lines directly connected to PVs changes significantly; for example, l 16-17 becomes the most critical line. The criticality of l 2-1 and l 1-39 also changes greatly, and they rank high in the identification results, while l 3-4 and l 14-15 change from sub-critical lines to key lines. Therefore, during periods of high light intensity, PV output has a greater impact on the line identification results.

CONCLUSION
Identifying the key lines of the power grid helps to prevent and analyze cascading failures of the power system as the proportion of grid-connected new energy increases. In this paper, the improved PageRank algorithm is used to identify the key lines of the IEEE 39-node system, and on this basis the influence of changes in multi-PV grid-connected power on line identification is considered. The calculation example shows that: (1) The improved PageRank identification method comprehensively considers the network characteristics and electrical characteristics of the system, and its identification accuracy is improved compared with the original method. The identification results show that the lines closer to the power generation nodes and the long-distance hub lines are the key lines in the system.
(2) By adaptively improving the mutation operator and crossover operator in HGWO-SVM, the prediction accuracy of HGWO-SVM is improved to a certain extent.
(3) When the light intensity is not high, line criticality changes roughly in proportion to the PV output, but the originally critical lines remain critical, so periods of low PV output have little influence on the line identification result. During periods of high light intensity, line criticality changes significantly and the identification result also changes, so periods of high PV output have a greater impact on the line identification result.
This paper simulates the daytime power change of PV output by determining the output for each period from the predicted curve, which still deviates from the actual output. Moreover, since the transmission grid covers a large area, the PV outputs at different locations generally do not change synchronously. Different factors, such as differences in the timing of sunlight, lead to unsynchronized changes in PV output, which will be the focus of later work.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.