Power System Network Topology Identification Based on Knowledge Graph and Graph Neural Network

The automatic identification of the topology of power networks is important for the data-driven and situation-aware operation of power grids. Traditional methods of topology identification lack a mechanism for tolerating data errors, so their identification accuracy depends on the quality of the data. Topology identification is related to the link prediction problem. A graph neural network trained on the features of labeled nodes (lines) can predict the states of unlabeled nodes (lines) with a degree of fault tolerance. Inspired by this property, we applied the graph neural network to topology identification and propose a method to identify the topology of a power network based on a knowledge graph and a graph neural network. Traditional knowledge graphs can quickly mine possible connections between entities and generate graph-structured data, but when the data contain errors or conflicting information, they cannot accurately determine whether the relationships between entities really exist. The graph neural network can use data mining to determine, from the entities' feature values, whether a connection obtained between them is genuine, and its fault tolerance allows it to adapt to errors and conflicting information in the graph data; however, it requires graph data as input. Combining the knowledge graph with the graph neural network therefore compensates for the deficiency of the knowledge graph used alone. We tested the proposed method on the IEEE 118-bus system and a provincial network. The results show that our approach is feasible and highly fault tolerant: it can accurately identify the network topology even when measurement information is conflicting or missing.


INTRODUCTION
With access to new energy sources continuously increasing and the scale of power grids growing, the variability and complexity of power grid operation have increased drastically (Li et al., 2018). In recent years, blackouts in several areas across the world have caused substantial economic losses and adverse social impacts. The prevalent online security defense systems for power grids, which focus on modeling, simulation, and fault prediction, have been severely challenged. With the rapid development of power grid measurement systems and the increasing maturity of big data technology, state cognition and control based on operational information has emerged as a new model for the secure operation of power grids. In addition, the growing uncertainty surrounding renewable generation poses unprecedented challenges to the secure and economic operation of current power systems.
Understanding the topological structure of a power grid and how it changes, in a timely manner and based on measurements, is the basis for the data-driven operation and control of power grids. Research on identifying the topology of a power network can be divided into several classes. Several studies have used the incidence matrix and the adjacency matrix constructed from the switching states of the system to determine its connectivity and track topological changes (Zhu et al., 2011;Ma et al., 2014;Lourenco et al., 2015). Such methods have little tolerance for faulty or conflicting telemetry data, and their effectiveness depends entirely on the quality of the remote signaling data. Other attempts have been made to identify the network topology based on graph theory and object-oriented technology (Song et al., 2005;Nian et al., 2008;Wang et al., 2009;Li et al., 2011;Yansheng et al., 2017). These approaches build on matrix-based topology identification (Wang et al., 2019) and, through algorithmic optimization, improve on the low computational efficiency of the original process, but they still rely on remote signaling data and have poor fault tolerance. Recently, the topology identification problem has been recast as a problem of finding the optimal combination of measurement data. This is a new attempt to identify network topology based entirely on data, and its feasibility has been verified. However, this method is not fault tolerant and requires complete active power information: the optimal combination of data is directly affected when data are missing or contain significant errors. Furthermore, the algorithm requires a large amount of redundant information as a supporting condition. Because phasor measurement units (PMUs) have not been fully deployed, owing to the massive investment required, the necessary supplementary data are difficult to obtain.
In summary, current methods of topology identification depend on the quality of data. We thus need a fault-tolerant method of topology identification.
Knowledge graph technology has emerged in several fields in recent years and has attracted widespread attention from both industry and academia (Liu and Wang, 2018). A knowledge graph is a networked knowledge base of entities with attributes, linked through relationships. Its value lies in organizing related information at minimal cost and generating useful knowledge. Because the topology of a power network is graph data reflecting node-to-node relationships, the knowledge graph is a natural vehicle for describing topological information (Park et al., 2019). However, information conflicts may arise in the knowledge graph from erroneous or missing information. We thus need a data mining method to identify relationships that cannot be confirmed owing to such conflicts.
The graph neural network (GNN) is applicable mainly to non-Euclidean spatial data (Park et al., 2019). It has been applied in a variety of fields (Wu et al., 2018;Jing et al., 2020;Li et al., 2020;Xie et al., 2020;Yu et al., 2020), and multiple types of GNNs have been derived for different problems (Nikolentzos et al., 2020;Su et al., 2020;Wu et al., 2020). Because a knowledge graph is itself graph data, combining GNNs with knowledge graphs can address a wide range of knowledge graph-related problems. For example, some researchers have modeled the knowledge graph with a GNN (Nathani et al., 2019). The GNN-based approach can capture complex and hidden patterns in the neighborhood of each triple and can complete missing relations in the knowledge graph, in contrast with knowledge reasoning that considers each triple in isolation. A GNN-based entity alignment scheme has also been proposed (Cao et al., 2019); experiments on multiple datasets showed that it yields highly consistent alignments that improve the quality of the knowledge graph.
In the context of power network topology, a knowledge graph can represent the relationships between entities in the network. The graph neural network can then serve as a tool for analyzing the graph data and for mining and reasoning about the relationships among entities in the knowledge graph. It can also account for missing information and check the correctness of the data, significantly improving the quality of the knowledge graph. The characteristics of the knowledge graph and the graph neural network are thus well matched to the requirements of topology identification.
This paper proposes a topology identification method based on a knowledge graph and a graph neural network. First, we construct entity-relationship-entity and entity-attribute-attribute value triples from remote signaling data, telemetry data, and a database of component information to build a knowledge base. Second, we use the graph neural network to determine the relationships that cannot be established owing to conflicting information. Finally, we generate the topology of the power network and track changes in it through knowledge inference. Test results show that the proposed method is fault tolerant and accurately determines the network topology of both a simulated network and a real power grid.

Definition of Knowledge Graph
A knowledge graph is a structured semantic knowledge base used to describe concepts and their interrelationships in the physical world in symbolic form. Its basic unit is the "entity-relation-entity" triple, and entities and their associated attribute-value pairs are interconnected through relationships to form a networked knowledge structure.
Based on the above definition, we can draw the following conclusions:
(1) Essentially, the knowledge graph is a semantic network that reveals the relationships between entities.
(2) The research value of the knowledge graph lies in its being a networked knowledge base built from data and information, which organizes that information into usable knowledge at minimal cost.
(3) The practical value of the knowledge graph is that it can change existing methods of information retrieval: on the one hand, retrieval is achieved through reasoning; on the other, the classified, structured knowledge can be displayed graphically.

Architecture of the Knowledge Graph
The architecture of the knowledge graph consists of a data layer and a schema layer, as shown in Figure 1. The data layer corresponds to the logical architecture of the knowledge graph and the schema layer to its technical architecture. The primary function of the data layer is to extract data and information and to establish the relationships inherent in them. The schema layer is built on top of the data layer and is where data are processed and analyzed to generate knowledge, through means such as information fusion and processing and knowledge inference.

METHODS OF CONSTRUCTING POWER GRID TOPOLOGY-ORIENTED KNOWLEDGE GRAPHS
The power information system records a large amount of structured and semi-structured data. These data describe measured attributes and results as well as known entity attributes. Together they provide the information units needed to construct a knowledge graph of the power network topology.

Methods of Entity-Property Construction
The fields of the databases recorded by the power management information system are deconstructed to obtain the set of measured entity information items O, which contains all information on node entities and line entities:
$$O = \{N_i, N_j, L_{ij}, \ldots\}$$
where N_i and N_j are node information units and L_ij is a line information unit. The information provided by each entity corresponds to its attributes and attribute values. For example, a node entity N_i corresponds to its node description (K_ni) and its node injection telemetry (P_ni). The attributes of node entity N_i are therefore
$$V_{N_i} = \{K_{ni}, P_{ni}\}$$
and the resulting set of properties corresponding to the entities is
$$V = \{V_{N_i}, V_{N_j}, V_{L_{ij}}, \ldots\}$$
Entities in O and attribute sets in V correspond one to one, so the entity-attribute set Q is
$$Q = \{(o, V_o) \mid o \in O\}$$

Methods of Constructing Entity Relationships
Relationships are central to the knowledge graph, because most of the problems it is used to solve depend on them. In general, relationships make it possible to further parse the correspondences between entities in the entity information set O and to organize the entity-relationship information. The relationship is expressed as
$$R_{n_i,l_{ij}} = \begin{cases} 1, & N_i \text{ is directly connected to } L_{ij} \\ 0, & \text{otherwise} \end{cases}$$
where N_i represents node i, L_ij represents line ij, and R_{n_i,l_ij} is the value of the relationship between them: 1 indicates that they are directly connected and 0 indicates that they are not.
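The relationship value R can be illustrated with a minimal Python sketch. It assumes that each line L_ij is given as a pair of end-node numbers (i, j); the helper `build_relationships` and the toy network are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Line = Tuple[int, int]  # a line L_ij given by its two end nodes (i, j)


def build_relationships(nodes: List[int], lines: List[Line]) -> Dict[Tuple[int, Line], int]:
    """Return R[(n, line)] = 1 if node n is an end of the line, else 0."""
    relation: Dict[Tuple[int, Line], int] = {}
    for n in nodes:
        for line in lines:
            relation[(n, line)] = 1 if n in line else 0
    return relation


# Toy example: three nodes joined in a chain by two lines.
R = build_relationships([1, 2, 3], [(1, 2), (2, 3)])
print(R[(1, (1, 2))], R[(1, (2, 3))])  # 1 0
```

Each line contributes exactly two entries with value 1, one per end node, which is what later allows node-line relations to be chained into node-line-node paths.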

Methods for Constructing Knowledge Graph Triples
In this paper, the entity-relation-entity and entity-attribute-attribute value triples are merged into an integrated entity (attribute-attribute value)-relation-entity (attribute-attribute value) triple to form a knowledge base for generating topologies. The knowledge base is expressed as
$$G = (O, V, R)$$
where G denotes the set of triples and O the set of entities, each of which is unique. V denotes the set of attributes, including node numbers, line switch states, the magnitude of node-injected power, the power at the head end of each line, and the power difference between the two ends of each line. R denotes the set of relationships describing direct connections between entities.
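To make the integrated triple concrete, the sketch below assembles entity(attributes)-relation-entity(attributes) triples from node and line attribute tables. The entity naming ("N1", "L1-2"), the attribute keys (`P_inj`, `P_ij`, `P_ji`, `S_ij`, `S_ji`), the relation label `connected_to`, and the numbers are all illustrative assumptions; only relations with R = 1 are emitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Attrs = Tuple[Tuple[str, float], ...]


@dataclass(frozen=True)
class Entity:
    name: str      # e.g. "N1" for a node or "L1-2" for a line (illustrative naming)
    attrs: Attrs   # (attribute, value) pairs attached to the entity


def build_triples(
    node_attrs: Dict[int, Dict[str, float]],
    line_attrs: Dict[Tuple[int, int], Dict[str, float]],
) -> List[Tuple[Entity, str, Entity]]:
    """Build entity(attributes)-relation-entity(attributes) triples.

    Only relations with R = 1 (the node is an end of the line) are emitted.
    """
    triples = []
    for (i, j), la in line_attrs.items():
        line = Entity(f"L{i}-{j}", tuple(sorted(la.items())))
        for n in (i, j):
            node = Entity(f"N{n}", tuple(sorted(node_attrs[n].items())))
            triples.append((node, "connected_to", line))
    return triples


nodes = {1: {"P_inj": 50.0}, 2: {"P_inj": -20.0}, 3: {"P_inj": -30.0}}
lines = {(1, 2): {"P_ij": 20.0, "P_ji": -19.8, "S_ij": 1, "S_ji": 1},
         (1, 3): {"P_ij": 30.0, "P_ji": -29.7, "S_ij": 1, "S_ji": 1}}
G = build_triples(nodes, lines)
for head, rel, tail in G:
    print(head.name, rel, tail.name)
```

Frozen dataclasses keep entities hashable, so duplicate triples could be removed with a set if several data sources report the same relation.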

GRAPH NEURAL NETWORK
Traditional neural networks have been very successful at extracting features from Euclidean data but are deficient at processing non-Euclidean data. In recent years, researchers have drawn on the ideas of convolutional networks, recurrent networks, and deep autoencoders to design a neural network structure for processing graph data, called the graph neural network.
A graph is composed of vertices and edges and is usually represented and stored as an adjacency matrix. The spatial features of graph data have the following characteristics:
(1) Node features: each node has its own features.
(2) Structural features: each node in the graph has structural features, that is, connections to other nodes.
In general, graph data processing must consider both node and structural information, and a graph neural network should be able to learn automatically both the features of each node and the correlations between nodes.
Graph convolutional networks (GCNs) extend the convolution operation used in image processing to graph data. This paper explains the principles of graph convolutional networks mainly from the perspective of spatial construction.
A regular grid can be generalized to a graph g(Ω, W), where Ω is a discrete set of N nodes, Ω ∈ R^{N×1}, and W ∈ R^{N×N} is the matrix of edge weights. A straightforward way to use the edge weights in W is to set a threshold δ > 0, so that the neighborhood of node j is
$$N_\delta(j) = \{\, i \in \Omega : W_{ij} > \delta \,\}$$
As shown in Figure 2, to convolve node 6 we use its neighborhood N[6], which contains nodes 1 and 5 together with node 6 itself. The convolution of node 6 can therefore be expressed as
$$W_{1,6}x_1 + W_{5,6}x_5 + W_{6,6}x_6$$
where x denotes the feature of each node and W the convolution weights.
The convolution o_v of node v can be expressed as
$$o_v = \sum_{u \in N[v]} W_{u,v}\, x_u$$
The input node feature may be a D-dimensional vector, in which case a single convolution operation contains a kernel for each of the D dimensions. We convolve each dimension of the input feature and sum the results to obtain the convolution of node v:
$$o_v = \sum_{d=1}^{D} \sum_{u \in N[v]} W^{(d)}_{u,v}\, x_{u,d}$$
The basic idea of the GCN is to reduce a node's high-dimensional neighborhood information in the graph to a low-dimensional vector representation by aggregating feature information from adjacent nodes. By stacking such layers, it can aggregate information from across the graph to represent node features. The process of updating a node's feature information is shown in Figure 3.
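The spatial convolution can be checked numerically with a small NumPy sketch. Because the text uses W both for the graph's edge weights and for the convolution weights, this sketch assumes the two are separate: `Wg` (graph weights) only determines the thresholded neighborhood, and `K` (one N × N slice per feature dimension) holds the convolution weights. The six-node graph is hypothetical apart from reproducing the Figure 2 neighborhood of node 6.

```python
import numpy as np


def neighborhood(Wg: np.ndarray, j: int, delta: float) -> np.ndarray:
    """N_delta(j) = {i : Wg[i, j] > delta}; a positive diagonal puts j in its own neighborhood."""
    return np.flatnonzero(Wg[:, j] > delta)


def spatial_conv(Wg: np.ndarray, K: np.ndarray, X: np.ndarray, v: int, delta: float = 0.0) -> float:
    """o_v = sum_d sum_{u in N[v]} K[d, u, v] * X[u, d].

    Wg: (N, N) graph edge weights, used only to find neighbors.
    K:  (D, N, N) convolution weights, one N x N slice per feature dimension.
    X:  (N, D) node features.
    """
    nbrs = neighborhood(Wg, v, delta)
    return float(sum(K[d, nbrs, v] @ X[nbrs, d] for d in range(X.shape[1])))


# Toy graph echoing Figure 2: node 6 (index 5) is adjacent to nodes 1 and 5 (indices 0, 4).
N = 6
Wg = np.eye(N)  # self-loops so each node is in its own neighborhood
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 5), (4, 5)]:
    Wg[a, b] = Wg[b, a] = 1.0
X = np.arange(1.0, N + 1).reshape(N, 1)  # x_1..x_6 = 1..6, so D = 1
K = np.ones((1, N, N))                   # all convolution weights equal to 1
print(neighborhood(Wg, 5, 0.0) + 1)      # [1 5 6]
print(spatial_conv(Wg, K, X, v=5))       # 12.0  (= x_1 + x_5 + x_6)
```

With unit weights the result reduces to x_1 + x_5 + x_6, matching the expression for node 6; with D > 1 the per-dimension sums are added, as in the second formula.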
The convolution operator can be defined as (Kipf and Welling, 2017)
$$h_v = \sigma\!\left( \frac{1}{n(v)} \sum_{u \in N(v)} x_u W + b \right)$$
where h_v is the transformed feature of node v after aggregating the features of its neighbors, N(v) is the set of nodes adjacent to v, n(v) is the number of neighbors of v, x_u is the (row) feature vector of node u, W is the weight, b is the bias, and σ is an activation function. Passing features through multiple hidden layers gives
$$h_v^{(k+1)} = \sigma\!\left( \frac{1}{n(v)} \sum_{u \in N(v)} h_u^{(k)} W^{(k)} + b^{(k)} \right)$$
Ignoring the bias and the 1/n(v) scaling for the moment, the matrix form of this equation is
$$H^{(k+1)} = \sigma\!\left( A H^{(k)} W^{(k)} \right)$$
The adjacency matrix A does not contain information on the nodes themselves. To solve this problem, let Ã = A + I, where I is the identity matrix, and define the degree matrix D ∈ R^{N×N} as the diagonal matrix characterizing each node's connectivity, D_ii = Σ_j Ã_ij. The normalized matrix D^{-1/2} Ã D^{-1/2} is then used to pass information over the topology, so that training covers each node's own features and the feature scale does not grow with node degree:
$$H^{(k+1)} = \sigma\!\left( D^{-1/2} \tilde{A} D^{-1/2} H^{(k)} W^{(k)} \right)$$
where H^{(k)} ∈ R^{N×F} is the feature matrix after k layers of activation, H^{(0)} = X, and W^{(k)} is the learnable parameter matrix of the kth layer.
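The propagation rule with self-loops and symmetric normalization can be sketched in a few lines of NumPy. The function names are illustrative, and ReLU is assumed for the activation σ, which the text leaves unspecified.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, with D_ii = sum_j (A + I)_ij."""
    A_tilde = A + np.eye(A.shape[0])      # add self-loops so each node keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def gcn_layer(A_hat: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One propagation step H^(k+1) = sigma(A_hat H^(k) W^(k)), with ReLU assumed as sigma."""
    return np.maximum(A_hat @ H @ W, 0.0)


# Three nodes in a path 0-1-2.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_hat = normalized_adjacency(A)
print(np.round(A_hat, 3))
H1 = gcn_layer(A_hat, np.eye(3), np.ones((3, 2)))   # H^(0) = X = I, F = 3 -> 2 features
print(H1.shape)  # (3, 2)
```

Passing the identity matrix in place of `A_hat` turns `gcn_layer` into the MLP layer σ(XW), which is the contrast drawn in the next paragraph: the graph neural network differs from the MLP only by the adjacency-based mixing step.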
In short, the basic neural network structure, the multilayer perceptron (MLP), multiplies the feature matrix by the weight matrix, H = σ(XW), whereas the graph neural network adds the adjacency matrix: H = σ(AXW).
The core of a graph neural network consists of two matrices: 1) the adjacency matrix A ∈ R^{N×N}, where N is the number of nodes, and 2) the feature matrix X ∈ R^{N×F}, where F is the feature dimension of each node. The output matrix is Z ∈ R^{N×C}, where C is the number of convolution kernels in the output layer, which equals the number of classes. The graph neural network is trained iteratively so that its output matches the target labels as closely as possible, thereby learning to classify the input data. Figure 4 shows the flow of the graph neural network.
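A minimal sketch of the classification setup follows: a single GCN layer Z = softmax(ÂXW), with Â the normalized adjacency matrix defined above, trained by plain gradient descent. Step 5 of the proposed method uses the discrepancy Z′ − Z as its loss; this sketch swaps in the standard mean cross-entropy loss, and the learning rate, iteration count, single-layer depth, and toy data are all assumptions.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    A_tilde = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d[:, None] * A_tilde * d[None, :]


def softmax(S: np.ndarray) -> np.ndarray:
    E = np.exp(S - S.max(axis=1, keepdims=True))  # subtract row max for numerical stability
    return E / E.sum(axis=1, keepdims=True)


def train_gcn(A, X, Y, lr=0.5, iters=300, seed=0):
    """Fit Z = softmax(A_hat X W) to one-hot labels Y (N x C) by gradient descent
    on the mean cross-entropy loss; returns the class probabilities Z and W."""
    rng = np.random.default_rng(seed)
    AX = normalized_adjacency(A) @ X               # constant for a single layer
    W = 0.01 * rng.standard_normal((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        Z = softmax(AX @ W)
        W -= lr * AX.T @ (Z - Y) / X.shape[0]      # gradient of mean cross-entropy w.r.t. W
    return softmax(AX @ W), W


# Two disconnected pairs of vertices with different features and classes.
A = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
Z, W = train_gcn(A, X, Y)
print(Z.argmax(axis=1))  # [0 0 1 1]
```

Each row of Z is a probability distribution over the C classes, and the predicted class is the row's argmax, analogous to reading the largest probability for each line in the experiments below.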

PROPOSED METHOD FOR IDENTIFYING POWER NETWORK TOPOLOGY
Constructing Overall Topology of Power Grid
Through information extraction, elements of knowledge such as entities, relations, and properties can be extracted from the original corpus to obtain a series of basic factual expressions.
Relationships between entities can be extended through knowledge inference. For example, if node N_a in the entity information set O is directly connected to line L_ab, and node N_b is also directly connected to L_ab, the relationship can be extended to N_a-L_ab-N_b. Multiple such entity-relation units together form a topology containing all entity-relation information.
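The inference N_a-L_ab-N_b and the subsequent search for islands can be sketched as follows. The helpers `infer_chains` and `islands` are hypothetical; a line is kept only when exactly two nodes are related to it, and connected components are found with an iterative depth-first search. A component with a single node corresponds to an isolated node, and several multi-node components correspond to separate islands.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def infer_chains(relations: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Extend node-line relations (R = 1) to node-line-node chains N_a-L_ab-N_b.

    A line is kept only if exactly two nodes are related to it.
    """
    ends: Dict[str, List[str]] = defaultdict(list)
    for node, line in relations:
        ends[line].append(node)
    return [(ns[0], line, ns[1]) for line, ns in ends.items() if len(ns) == 2]


def islands(all_nodes: List[str], chains: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Group nodes into connected components with an iterative depth-first search."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, _, b in chains:
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    components = []
    for start in all_nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            n = stack.pop()
            comp.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        components.append(sorted(comp))
    return components


relations = [("N1", "L12"), ("N2", "L12"), ("N2", "L23"), ("N3", "L23"),
             ("N4", "L45"), ("N5", "L45")]
chains = infer_chains(relations)
print(chains[0])  # ('N1', 'L12', 'N2')
print(islands(["N1", "N2", "N3", "N4", "N5", "N6"], chains))
# [['N1', 'N2', 'N3'], ['N4', 'N5'], ['N6']]
```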

Identifying Power Network Topology
These basic factual expressions are not yet knowledge. To obtain a structured, networked knowledge system, knowledge processing and reasoning are necessary; they are the essential and most critical parts of constructing a knowledge graph. In practice, remote signaling data may contain incorrect information owing to disconnector (isolating switch) states, and telemetry data may contain errors such that the resulting data do not comply with the operating rules of the power system. Ambiguity may thus persist in the entity-relation-attribute triples, and resolving it is the most critical step in knowledge graph-based topology identification. We therefore verify the authenticity of the entity-relation units using the graph neural network as a data mining method.
The main problem in determining power network topology is to identify the connectivity of lines from their attributes (values). In applying the graph neural network, we treat the relationships between entities in the knowledge graph (lines) as "vertices" and the entities (nodes) as "edges." The connectivity of lines is thus obtained directly by training on the lines, which avoids the conventional approach of first determining the nodes and then identifying connectivity. The flowchart of the proposed method is shown in Figure 5.
The specific steps are as follows:
Step 1. From the edges of the knowledge graph, extract the attributes and attribute values comprising the line input power P_ij, the line output power P_ji, and the switch statuses at both ends of the line, S_ij and S_ji, to form the feature vector of the line:
$$x_{ij} = [P_{ij}, P_{ji}, S_{ij}, S_{ji}]$$
P_ij and P_ji are measured powers, and S_ij and S_ji describe the on and off states of the line switches: on is 1 and off is 0.
Step 2. Normalize P_ij and P_ji in x_ij to obtain x̂_ij, so that the feature values lie in the interval [-1, 1], and aggregate all x̂_ij into X ∈ R^{N×4}, where N is the total number of lines.
Step 3. Define z_ij as the category label of line ij and collect all labels into Z, which contains the categories of all lines and serves as the target for training the graph neural network.
Step 4. Extract all entity relations (node-line relationships) from the knowledge graph G(O, V, R) to form an adjacency matrix A in which lines are vertices and nodes are edges, describing the direct connections between lines: A_ij indicates whether lines i and j share a common node.
Step 5. Set IT_max as the maximum number of iterations. Substitute the adjacency matrix A and the feature matrix X into the feature propagation formula to compute the output Z′, use the discrepancy between Z′ and Z as the loss function, and update the weight matrices iteratively. After training, the weight matrices W of the hidden layers encode the decision rules the graph neural network has learned and are used directly to determine line connectivity.
Step 6. After determining connectivity, update the adjacency matrix to A′. Finally, generate the topology by inferring the connections between lines through depth-first search, and analyze the generated topology to determine whether any isolated nodes or islands are in operation.
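Steps 1, 2, and 4 can be sketched as follows. The scaling in Step 2 is not specified beyond mapping features into [-1, 1]; this sketch assumes division of both power columns by their largest absolute value. The line list and measurements are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def line_features(P_ij, P_ji, S_ij, S_ji) -> np.ndarray:
    """Steps 1-2: stack x_ij = [P_ij, P_ji, S_ij, S_ji] per line and scale the
    power columns into [-1, 1] by their largest absolute value (assumed scaling)."""
    X = np.column_stack([P_ij, P_ji, S_ij, S_ji]).astype(float)
    scale = np.abs(X[:, :2]).max()
    if scale > 0:
        X[:, :2] /= scale
    return X


def line_adjacency(lines: List[Tuple[int, int]]) -> np.ndarray:
    """Step 4: A[i, j] = 1 if lines i and j share a common node (lines are vertices)."""
    n = len(lines)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if set(lines[i]) & set(lines[j]):
                A[i, j] = A[j, i] = 1.0
    return A


lines = [(1, 2), (2, 3), (4, 5)]
X = line_features([50.0, -20.0, 0.0], [-49.5, 20.1, 0.0], [1, 1, 0], [1, 1, 0])
A = line_adjacency(lines)
print(X.shape, A[0, 1], A[0, 2])  # (3, 4) 1.0 0.0
```

The third line, with both switches open and zero measured power, is the kind of vertex the trained network would be expected to place in the disconnected category; the matrices X and A are the inputs to Step 5.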

Experimental Results on a Test Sample
The IEEE 118-node system containing 186 lines was used for our experiment. The initial active power of the nodes was obtained by MATPOWER simulation. We set up two disconnected lines: 8-30 and 48-49. Lines 17-18, 21-22, 34-36, 76-77, 84-85, 92-100, 105-107, and 114-115 contained conflicting information, and the remaining lines were normal. Details of the line information are shown in Table 1. Algorithm 1 shows the overall training and testing processes. We selected this sample to demonstrate the effectiveness of the proposed method, processed it following Algorithm 1, and compared the identification result with the original line features. The classification probabilities, and the features of the lines identified as conflicting or "off," are shown in Figure 6 and Table 2.
As Figure 6 shows, only lines 8-30 and 48-49 had a high probability of falling into category ⅲ. Based on the output of the trained graph neural network, these two lines were judged to be disconnected, consistent with the original perturbation.
According to the probability distribution, lines 27-115, 114-115, 68-116, 12-117, 54-55, 80-81, 75-118, and 76-118 were all judged to be connected but with conflicting information. The results were consistent with the original settings. We updated the knowledge graph based on these identification results. A comparison of the results of topological identification is shown in Figure 7.

Experimental Results on the IEEE 118-Node System
We used an IEEE 118-node system containing 186 lines. The initial active power of the nodes was obtained by MATPOWER simulation. A 1% Gaussian random error was added to the active node power to generate 1,000 samples. In each sample, two lines were randomly disconnected and eight lines were randomly made to miss some attribute values. We formed an information base for the power grid topology following the knowledge graph construction described above, and organized the data and relationships into node and line feature matrices and adjacency matrices. The feature matrices of 900 samples were labeled as follows: category ⅰ denotes connected lines, category ⅱ denotes connected lines whose feature vectors contain conflicting information, and category ⅲ denotes disconnected lines. The remaining 100 samples were kept unlabeled. Of the 900 labeled samples, 700 were used to train the graph neural network, and the other 200 were used to compare the original topology with the topology generated by the rules the network had learned.
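The sample design (two disconnected lines and eight lines with missing attributes per sample, 1,000 samples split 700/200/100) can be sketched as below. This is a hypothetical generator: it assumes "1% error" means a Gaussian standard deviation of 1% of each node power, uses random placeholder powers instead of MATPOWER results, and does not re-solve the power flow after lines are opened, which a faithful reproduction would need to do.

```python
import numpy as np


def make_sample(P_node: np.ndarray, n_lines: int, rng: np.random.Generator,
                n_off: int = 2, n_missing: int = 8, rel_sigma: float = 0.01):
    """Hypothetical generator for one sample: perturb node powers with Gaussian
    noise (sigma = 1% of each value, an assumed reading of "1% error") and
    label lines 0 (category i), 1 (category ii, missing attributes) or
    2 (category iii, disconnected)."""
    noisy = P_node * (1.0 + rel_sigma * rng.standard_normal(P_node.shape))
    labels = np.zeros(n_lines, dtype=int)
    picked = rng.choice(n_lines, size=n_off + n_missing, replace=False)
    labels[picked[:n_off]] = 2
    labels[picked[n_off:]] = 1
    return noisy, labels


rng = np.random.default_rng(0)
P_base = rng.uniform(-100.0, 100.0, size=118)   # placeholder for MATPOWER node powers
samples = [make_sample(P_base, 186, rng) for _ in range(1000)]
order = rng.permutation(1000)
train, val, unlabeled = order[:700], order[700:900], order[900:]
print(len(train), len(val), len(unlabeled))  # 700 200 100
```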
For each sample, the identified line connectivity can be organized into a 186 × 1 column vector, so the results for 200 samples form a 186 × 200 matrix. Marking the categories with different colors yields a connectivity map for all samples. We statistically summarized the identification results for the remaining 199 samples; the results are shown in Figure 8.
In Figure 8, the abscissa is the test sample index and the ordinate the line number. Blue blocks denote lines judged to belong to category ⅱ and red blocks lines judged to belong to category ⅲ. Each sample contained two category ⅲ lines and eight category ⅱ lines. Compared with the initial topology, the graph neural network method accurately identified the disconnected lines.
Using the rules learned by the graph neural network, we identified the topologies of the remaining 100 unlabeled samples; the results are shown in Figure 9. For each line, the number in bold is the largest value in its probability distribution and indicates the category to which the line is assigned. As Figure 9 shows, we were able to identify the operating status of every line. Each unlabeled sample contained two category ⅲ lines and eight category ⅱ lines, which is consistent with the original settings.
For the samples retained without labels, we compared the decision results with their original feature values and found them to be consistent.
To further verify the proposed method, we used these 100 unlabeled samples as test samples and compared the proposed method with a topology identification method based only on knowledge graphs. Both methods were applied to the same data.
Each sample contained 184 connected lines and two disconnected lines, and eight of the connected lines had partially missing information. As shown in Figure 8, the proposed method accurately identified the line connections and determined the category of each line. It thus identified 18,400 connected lines and 200 disconnected lines in the 100 test samples, 800 of the connected lines having partially missing information, which was consistent with the original settings. The comparison with the topology identification method based only on knowledge graphs is shown in Figure 10.
The method based only on knowledge graphs accurately identified the disconnected lines, but its accuracy in identifying connected lines was only 95.65%, mainly because it is less fault tolerant than the proposed method and struggles with missing information. In summary, the proposed method, which combines the knowledge graph and the graph neural network, can determine the connectivity of lines, identify topological changes, and has strong fault tolerance. It can identify the topology of the power network even when information is conflicting, and is thus superior to the traditional method based only on the knowledge graph.
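The reported 95.65% can be related to the test design with a short calculation. The figure equals the share of connected lines that do not have missing information, which is consistent with the knowledge graph-only method misclassifying exactly the 800 connected lines with partially missing attributes; the text does not state this breakdown explicitly, so it is an inference from the numbers.

```python
connected_total = 184 * 100   # connected lines across the 100 test samples
missing_info = 8 * 100        # connected lines with partially missing attributes
accuracy = (connected_total - missing_info) / connected_total
print(f"{accuracy:.2%}")      # 95.65%
```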

Experimental Results on a Provincial Network
The proposed method was applied to a provincial network consisting of 132 nodes and 181 lines. The data were recorded at intervals of 1 min. We combined the knowledge graphs and the graph neural networks to identify the topology of the power network.
Historical operational data provided enough samples to train the graph neural network, enabling it to make more accurate decisions. However, as the number of training samples increases, the dimensions of the adjacency matrix A grow from N × N to Ñ × Ñ, where Ñ = Np(t + 1). Because training time increases with the amount of input data, we trained on windows of 10 min with a step of 10 min, used the line features in period t as input, and determined the network topology in period t+1 based on the training results.
Using one month of data from the power grid, we found that the provincial network comprised two isolated networks and several isolated nodes. The details are shown in Figure 11.
As shown in Figure 11, over the whole month the network consisted of 13 isolated nodes and two isolated networks, with a total of 119 nodes in continuous operation and 153-155 lines in service. The structural information of the network is shown in Table 3.
We verified that the 13 isolated nodes were planned nodes that had not yet been put into use in that month, which is consistent with the topology identification results. Isolated network 1 contained 75 nodes with 94-95 lines in operation, of which lines 28-46, 42-52, and 65-67 were switched in and out; the remaining lines in isolated network 1 operated stably. Isolated network 2 contained 44 nodes with 59-60 lines in operation, of which lines 91-96 and 118-125 were switched in and out. The number of lines in operation in the two isolated networks over time is shown in Figure 12.
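The 10-min window with a 10-min step can be illustrated by slicing 1-min records into (period t, period t+1) pairs. How each window is turned into graph inputs is not detailed in the text, so this sketch, with the hypothetical helper `window_pairs` and synthetic records, only shows the slicing.

```python
from typing import List, Tuple

import numpy as np


def window_pairs(records: np.ndarray, window: int = 10, step: int = 10
                 ) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Split 1-min records (T x F) into (period t, period t+1) pairs of `window` rows.

    Each input window starts `step` rows after the previous one; its target is the
    window that immediately follows it.
    """
    starts = range(0, records.shape[0] - 2 * window + 1, step)
    return [(records[s:s + window], records[s + window:s + 2 * window]) for s in starts]


records = np.arange(60 * 4, dtype=float).reshape(60, 4)   # one hour of 1-min records, 4 features
pairs = window_pairs(records)
print(len(pairs), pairs[0][0].shape)  # 5 (10, 4)
```

Setting `step` equal to `window` gives non-overlapping periods, so each record is used as input at most once, which keeps training time bounded as the record history grows.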
The results show that the proposed method is feasible. It can accurately identify the switching states of lines, generate the network topology, update the knowledge graph, and improve its quality. It also provides a good basis for analyzing islanded operation and system splitting control, and for monitoring and controlling the operating state of power grids.

CONCLUSION
The main contribution of this paper is a topology identification method based on a knowledge graph and a graph neural network. The method transforms the traditional problem of topology identification into one of inferring the states of line connections from graph data.
The proposed method is distinct from methods that used knowledge graphs only for topology identification because it contains an additional process for inferring conflicting information. After obtaining the overall topology covering all entities in the network, we performed knowledge inference on contradictory information based on graph neural networks. We then determined the line connectivity and updated the entity-relation information in the graph. In this way, we compensated for the deficiency of topological identification based only on the knowledge graph while making substantial gains in identifying topology and tracking changes in it.
We experimentally demonstrated that our proposed method is fault tolerant, unlike traditional methods, and correctly identifies line connectivity even in the case of informational conflicts.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
CW: Data curation, Writing-Original draft preparation, and Writing-Reviewing. JA: Conceptualization and Methodology. GM: Supervision and Editing.

FUNDING
This work was supported by the National Natural Science Foundation of China (No. 51877034).