Proving the Correctness of Knowledge Graph Update: A Scenario From Surveillance of Adverse Childhood Experiences

Knowledge graphs are a modern way to store information. However, the knowledge they contain is not static. Instances of various classes may be added or deleted and the semantic relationship between elements might evolve as well. When such changes take place, a knowledge graph might become inconsistent and the knowledge it conveys meaningless. In order to ensure the consistency and coherency of dynamic knowledge graphs, we propose a method to model the transformations that a knowledge graph goes through and to prove that the new transformations do not yield inconsistencies. To do so, we express the knowledge graphs as logically decorated graphs, then we describe the transformations as algorithmic graph transformations and we use a Hoare-like verification process to prove correctness. To demonstrate the proposed method in action, we use examples from Adverse Childhood Experiences (ACEs), which is a public health crisis.


INTRODUCTION
Knowledge graphs have become ubiquitous as a framework to represent entities and the relations that connect them. Thanks to the layer of semantic information, it has become possible to add meaning to the entities and relations contained in graphs and to reason about the knowledge they contain. Because graphs contain knowledge, they are expected to change with new information added or removed depending on outside events. In a similar way, the changes in the semantic layers may cause the meaning associated with each entity to change. Through such modifications, a knowledge graph can become inconsistent with the ontology that describes it (Zhang, 2002), rendering the knowledge, and thus the graph, meaningless.
In this paper, we propose a method to tackle this issue. The proposed method aims to identify what are the transformations entailed by adding or removing a piece of information or an axiom to make sure that the knowledge graph remains consistent with the ontology. In our previous works, we used graph transformation to represent and analyze changes in knowledge-based global health surveillance systems (Brenas et al., 2017Al-Manir et al., 2018). We represent the ontology in C 2 (Gradel et al., 1997) for convenience. The initial modification of the knowledge graphs will be assumed to be through a query language but we will represent the transformations as graph transformations and use Hoare-like verification methods (Hoare, 1969) to produce the proofs. The modifications depend only on the structure of each axiom and the action that is performed and not on the actual ontology or graph. It is thus possible to prove that they behave correctly in an abstract way that is independent of the actual knowledge graph that is modified. In particular, this means that the complexity of the verification task is independent of the size of the actual knowledge graph and only depends on the size of the specification that is proven to be correct.
To showcase how the proposed method works, we use examples coming from Adverse Childhood Experiences (ACEs), which is a major public health concern. ACEs are negative events, e.g., abuse, witnessing violence, etc, that have been linked to various negative health outcomes and risky behaviors (Felitti et al., 1998). The ACEs Ontology and knowledge graph have been described in Brenas et al. (2019a,c).
Here, we will use a clinical example. Patients or their parents are interviewed and screened for ACEs. If they suffer from some ACEs, e.g., if they are homeless or live in a place with mold, they are assigned to a social worker. If they suffered from a different set of ACEs, e.g., emotional abuse, they are assigned to a psychologist. Additionally, if they agree to be part of a study, they need to have provided a phone number or an email address and they will be paired with a caseworker. In order to avoid overworking the staff, social workers can only be assigned 10 patients and psychologists five. If the number of patients is higher than the available number of professionals, they are redirected toward a different clinic and it is agreed that a new hire is required.
In Section 2, we will introduce the formal framework that underpins our method. In Section 3, we show the applicability of this framework through some examples. Finally, in Section 4, we discuss the limitations of the method and further work.

LOGIC, GRAPH REWRITING, AND VERIFICATION
Each knowledge graph is composed of a graph and an ontology that assigns meaning to its nodes and edges. In the following, we present a fairly informal and succinct introduction to our framework. We hope to manage to give readers an idea of the logical foundation of our method without spending too much time and space on information that would not be relevant to most end-users. Readers interested in a more formal and in-depth description can find it in Brenas et al. (2016aBrenas et al. ( , 2018b.
In order to represent the knowledge graphs, we use logically decorated graphs where both nodes and edges are labeled with formulae that are part of the ontology language.
Definition 2.1. Let C (resp. R) be a set of node labels (resp. edge labels), a logically decorated graph is a tuple (N, E, N , E , s, t) where N is a set of nodes, E is a set of edges, N : N → P(C) is a node labeling function, E : E → P(R) is an edge labeling function, s : E → N is a source function and t : E → N is a target function. The source function s and the target function t define the orientation of an edge. For instance, edge e connects node s(e) to node t(e).
The set C (resp. R) naturally depends on the logic. It is constructed such that C contains all "unary" (resp. "binary") predicates of the logic and the closure under their constructors, i.e., class (resp. property) assertions.
To make our reasoning easier to understand, we describe axioms from the ontology both as first-order formulae and in a notation closer to Description Logics. In this paper, we focus on the lower spectrum of the expressivity for the axioms but the verification framework that underpins our method works with the full expressiveness of C 2 (Gradel et al., 1997), the two-variable fragment of first-order logic with counting, that can express most axioms in OWL. 1 Informally, C 2 contains all formulas of first order logic that can be written using only two variables, as well as the counting quantifier ∃ >n and its negation ∃ ≤n . As an example, ∃ >2 x.φ(x) can be translated in first order logic as ∃x 0 , To explain how knowledge graphs are updated and modified, we use graph transformations. The most usual way to deal with graph transformations is by using an algebraic approach rooted in category theory (Barendregt et al., 1987). In this paper, we will use a more algorithmic approach that, we argue, is more suited to verification. In order to build transformations, we first define atomic actions that will be composed to form more elaborate actions.
Definition 2.2. Let C (resp. R) be a set of node (resp. edge) labels. An elementary action, say a, may be of the following forms: • a node addition add N (i) (resp. node deletion del N (i)) where i is a new node (resp. an existing node). It creates the node i. i has no incoming nor outgoing edge and it is not labeled (resp. it deletes i and all its incoming or outgoing edges). • a node label addition add C (i, c) (resp. node label deletion del C (i, c)) where i is a node and c is a label in C. It adds the label c to (resp. removes the label c from) the labeling of node i. • an edge addition add E (e, i, j, r) (resp. edge deletion del E (e, i, j, r)) where e is an edge, i and j are nodes and r is an edge label in R. It adds the edge e with label r between nodes i and j (resp. removes all edges with source i and target j with label r), i.e., s(e) = i, t(e) = j and E (e) = r (resp. ∀e ′ ∈ E.s(e ′ ) = i or t(e ′ ) = j or E (e ′ ) = r).
An action α is a finite sequence of atomic actions.
Actions are not enough to describe all the transformations we want to perform, as they require the knowledge of the exact nodes and edges that are going to be modified. In order to enable the selection of nodes that satisfy a condition, we use rewriting rules and logically decorated graph rewriting systems. Definition 2.3. A rule ρ is a pair (LHS,α) where LHS, called the left-hand side, is a logically-decorated graph and α, called the righthand side, is an action. Rules are usually written LHS → α. A logically decorated graph rewriting system is a set of rules. Example 2.1. Figure 1 contains an example of rule. It selects a node, b, and creates a new node named a (add N (a)) and connects a to b with an edge labeled with lives_with (add E (a, b, lives_with)).
A rule is applied when a match for the left-hand side is found in the knowledge graph that we want to modify. We also define strategies that describe the order in which the rules of a logically decorated graph rewriting system are to be applied.
Definition 2.4. Given a logically decorated graph rewrite system GRS, a strategy is a word of the following language defined by s, where ρ is any rule in GRS: Given two logically decorated graphs G and G ′ and a strategy s, we denote by G ⇒ s G ′ that it is possible to obtain G ′ by applying s to G.
Intuitively, ǫ means "do nothing, " ρ is an application of the rule, s ⊕ s is the application of either of the two strategies (but not both), s; s is the application composition and s * applies the strategy as many times as possible.
Example 2.2. For instance, strategy ρ 0 ; (ρ * 0 ⊕ ρ 1 ) means "apply ρ 0 followed by either as many applications of ρ 0 as possible or one application of ρ 1 ." Previously, we have shown how to use logics to label edges and nodes of graphs. We now go a little further and show how we can use logics to define specifications for the transformations we want to perform, i.e., how to define conditions that we want to be satisfied by the graph after the transformation is performed, given that it may have satisfied another (possibly identical) set of conditions initially.

Definition 2.5 (Specification).
A specification SP is a triple {Pre}(R, s){Post} where Pre (the precondition) and Post (the postcondition) are formulas (of a given logic), R is a graph rewriting system and s is a strategy. Example 2.3. Let us assume that we want a specification describing part of Example . φ 0 ≡ ∀x.Psychologist(x) ⇒ ∃ ≤5 y.(Patient(y) ∧ assigned_to(y, x)) is a formula that states that at most five patients are assigned to each psychologist. Then, SP 0 ≡ {φ 0 }({ρ 0 }, ρ * 0 ){φ 0 } states that if φ 0 is true, it will still be true after applying ρ 0 as many times as possible.
Definition 2.6 (Correctness). A specification SP is said to be correct iff for all graphs G, G ′ such that G ⇒ s G ′ and G is a model of Pre (i.e., Pre is a logical consequence of the labeling of G), then G ′ is a model of Post.
In order to prove the correctness of a specification, we use a Hoare-like approach (Hoare, 1969). The idea is that it is possible to split the transformation into elementary changes that impact the graph in a known and controlled way. In such a situation, given the post-condition that needs to be achieved, it becomes possible to generate the weakest precondition that ensures that the post-condition will be satisfied. This can then be iterated to generate the weakest precondition for the whole transformation. This process is achieved by two functions: the weakestprecondition wp(s, Q) and the verification condition vc(s, Q) for a strategy s and a post-condition Q. More details can be found in Brenas et al. (2016a). The definitions of these functions are given in Figures 2, 3 It is worth noting that the weakest precondition of a closure, s * , is inv s , an invariant for that closure. This invariant is not part of the original specification but needs to be specified. We thus modify the notion of specification.

Definition 2.8 (Annotated Specification). An
annotated specification SP is a triple {Pre}(R, s){Post} where Pre and Post are formulas (of a given logic), R is a graph rewriting system, s is a strategy and every closure in s is annotated with an invariant. Example 2.4. As the strategy in Example 2.3 contains a closure, we annotate it.
Now that the notions of the weakest precondition and the verification condition are defined, we can look back at the original problem we were trying to solve. We define a formula that represents the correctness of a specification. Definition 2.9 (Correctness formula). We call correctness formula of an annotated specification SP = {Pre}(R, s){Post}, the formula: Deciding whether a specification is correct can be translated into deciding the validity of a given formula. This is one of the main reasons why we focused on decidable logics in this section. Another possible choice is to only consider tractable logic so that verification becomes achievable in a reasonable timeframe. FIGURE 2 | Weakest preconditions w.r.t. actions and strategies, where a (resp. α, α ρ ) stands for an elementary action (resp. action, the right-hand side of a rule ρ) and Q is a formula. The decidability of the validity problem for the logic used to label the graph is not, however, the only condition for the decidability of the correctness problem. The definitions of the weakest preconditions introduced substitutions as a new formula constructor. In order for the correctness problem to be decidable, these new constructs must be expressible in the logic, i.e., the logic must be closed under substitutions. In the following, we will be using C 2 .
Theorem 2.2. (Brenas et al., 2016a) For all the logics that are closed under substitution, the proof consists in a set of rewrite rules that conserve the interpretation. For instance, given C 0 an atomic concept, σ a substitution, i and j individuals and π 0 a program: Another possible problem is that the logic needs to be able to express the existence (and absence) of a match. First-order logic can express App(ρ) by using an existential variable for every node of the left-hand side of the rule ρ. This is not possible in the other types of logics we considered as they do not allow to define an unlimited number of variables. There is thus a limitation on what can appear on the left-hand side of the rules.

APPLICATION
When adding or removing knowledge from a knowledge graph, there is a risk to harm its consistency with the ontology that describes its underlying structure. As a result, additional actions may be needed before making any change to a knowledge graph. In this section, we present some of the transformations that can be enacted to make an update. We will then prove that the specifications resulting from the axioms and the transformations under consideration are correct and, thus, the update can take place.
To show some examples, we use the Adverse Childhood Experiences Ontology (ACESO) (Brenas et al., 2019a) as the source for the axioms. The systematic study of adverse childhood experiences and their health outcomes is recent and new data and knowledge are routinely added in this field. As a result, both the knowledge graphs and the associated ontology are likely to change frequently.
Example 3.1. Let us assume that the change that we want to perform is the addition of a new node, pa, which is labeled with PhysicalAbuse. This modification would be equivalent to performing the action a 0 ≡ add N (pa); add C (pa, PhysicalAbuse). ACESO contains the axiom PhysicalAbuse ⊆ Abuse that can be translated in C 2 as ∀x.PhysicalAbuse(x) ⇒ Abuse(x). Performing a 0 would make the knowledge graph inconsistent with the ontology as, if x = pa, PhysicalAbuse(x) ∧ ¬Abuse(x) is true. Instead, the transformation that is actually performed is a 1 ≡ add N (pa); add C (pa, PhysicalAbuse); add C (pa, Abuse). In that case, the correctness formula is (∀x.PhysicalAbuse(x) ⇒ Abuse(x)) ⇒ (∀x.(PhysicalAbuse(x)∨x = pa) ⇒ (Abuse(x)∨x = pa)) which is obviously valid. The knowledge graph is thus still consistent with the ontology.
As it is possible to modify the knowledge graph by adding information contained in new nodes, it is also possible to add new edges. In the previous example, only the new node, pa, was really affected by the modification of the knowledge graph and thus a simple action was enough to update the knowledge graph to preserve consistency. The next example, on the other hand, requires a rule to be applied.
Example 3.2. Let us assume that the change that we want to perform is the creation of a new node, a, and a new edge, e, linking a to the already existing node b with an edge labeled lives_with. This modification would be equivalent to applying Rule ρ 0 shown in Figure 4. ACESO contains the axiom Symmetric(lives_with) that can be translated in C 2 as ∀x, y.lives_with(x, y) ⇒ lives_with(y, x). Applying ρ 0 would make the knowledge graph inconsistent with the ontology as, if x = a and y = b, lives_with(x, y) ∧ ¬lives_with(y, x) is true. Instead, the transformation that is actually performed is the application of rule ρ 1 shown in Figure 4. In that case, the correctness formula is (∀x, y.lives_with(x, y) ⇒ lives_with(y, x)) ⇒ (∀x, y. From the previous two examples, one can observe how the transformations and the axioms interact. As a result, it is possible to define a system of transformation rewriting rules that, given the intended action and the form of an axiom, generates a new transformation of the knowledge graph that can be proved correct independently of the actual knowledge graph and ontology. For instance, the previous examples show that when in presence of an axiom of the form C ⊆ D where C and D are unary predicates (i.e., classes in OWL Lite), an elementary action add C (i, C) should be replaced with the action add N (i, C); add N (i, D) while, in presence of an axiom of the form Symmetric(R) where R is a binary predicate (i.e., a property in OWL Lite), an elementary action add N (i) should be left unchanged. Figure 5 shows these two rules plus additional ones.

Example 3.3. Let us now combine the previous two examples.
We assume that we want to add a new person Alice, represented by the new node a, to the knowledge graph. We additionally want to insert the knowledge that Alice is a girl and that she lives with Bob, represented by the already existing node b. The initial rule that would be applied is rule ρ 2 from Figure 6. Provided that the ontology contains the axioms Girl ⊆ Child and Symmetric(lives_with), all six rules in Figure 5 can be applied. The order in which they are applied does not change the final result, i.e., the rewriting system is convergent, and the final result is Rule ρ 3 in Figure 6. As it is possible to prove that each one of the transformation rewrite rules is correct, their combination is correct as well.
One of the key advantages of using abstract rules to represent the modification of the transformations is that it becomes possible to use the same rule for multiple transformations. In actual cases, the ontology is likely to contain many more axioms and, in particular, to contain a much more developed hierarchy of classes and properties. However, assuming that the ontology contains additional axioms, e.g., Girl ⊆ Female and Child ⊆  Person, the same transformation rewrite rule χ 1 can be applied several times introducing in the action not only add C (a, Child), as shown in Figure 6, but also, add C (a, Female) and add C (a, Person) triggered by the first application of the rule. The fact that the rules are iteratively applied does not create a risk of infinite looping provided we assume that there is a mechanism to check that the rule does not already contain the added elementary actions, given that doing the same elementary action twice yields the same result as doing it only once. Similarly, in an actual execution, the rules that do not modify the transformation, e.g., χ 0 would not be part  of the system. We only show them to make our reasoning clearer and easier to understand.
Hitherto, we have only added knowledge to the graph and not removed any. Removing knowledge is not practically much more difficult, at least as long as we keep working with less expressive axioms.
Example 3.4. Figure 7 contains some transformation rules dealing with node label deletion. χ 6 is similar to the rules in Figure 5 and doesn't actually modify the transformation. On the other hand, rules χ 7 and χ 8 are dependent on the current content of the knowledge graph. Indeed, if C(i) is true, the axiom C ⊆ D is no longer true after the application of del C (i, D). Hence the rule need not only to look at the ontological part of the knowledge graph but also at the graph itself.
Up to now, we have modified the graph by adding and changing the labeling of the knowledge graph but we have not modified the ontology itself. It is worth pointing that the choices that we made when the transformation presented a risk of inconsistency always lead us to modify the graph and not the ontology. There is, however, no reason to think that when there is a conflict between the ontology and the content of the graph, the ontology is always to be considered correct.
Example 3.5. In the first example we presented, we chose when adding the fact that pa was a PhysicalAbuse knowing that PhysicalAbuse ⊆ Abuse to also add the fact that pa is an Abuse. We could also have decided to remove the axiom and the knowledge graph would have been consistent with the modified ontology.
As the last example, we will add a new axiom to the ontology. Until now, we have only used the left-hand side of the rules to look at specific instances, either of axioms, elementary actions or individual nodes and we only, perhaps, added a new elementary action to a given rule. When modifying the ontology, a new rule is added that will modify the graph.
The set of rules R contains the rules shown in Figure 9. ρ 5 looks for a patient suffering from a given type of ACEs, denoted by ACEs 0 , that are not assigned to anyone and for a social worker with nine or fewer patients assigned to them. It then assigns the patient to the social worker. ρ 6 does the same for psychologists. ρ 7 and ρ 8 assign the (possibly) remaining patients to the default options. ρ 9 and ρ 10 look for patients with a phone number or an email address, respectively, and assign them to a case worker. ρ 11 and ρ 12 request the hiring of a social worker or a psychologist, respectively, if it had not been requested before and at least one patient is assigned to the default option.
We chose to apply the strategy s ≡ (ρ 5 ⊕ρ 6 ) * ; (ρ 7 ⊕ρ 8 ) * ; (ρ 9 ⊕ ρ 10 ) n ; ρ * 11 ; ρ * 12 . In order to be able to perform verification, all closures need to be annotated. We annotate all of them with φ, that is all invariants are the same as the pre-and post-conditions. (ρ 9 ⊕ ρ 10 ) n denotes n-iterations of the strategy ρ 9 ⊕ ρ 10 . It is a proper strategy for any given n and the actual choice of n has no impact on the correctness of the specification so we use this construct to let an unspecified number of patients register for the study. ρ 11 and ρ 12 can only be applied at most once each. More elaborate strategy constructors can be defined to require that strategies are applied at most once, for instance, but we did not introduce them in this paper to keep things simpler. The annotated specification is thus s ′ ≡ {φ}(R, (ρ 5 ⊕ ρ 6 ) * {φ}; (ρ 7 ⊕ ρ 8 ) * {φ}; (ρ 9 ⊕ ρ 10 ) n ; ρ * 11 {φ}; ρ * 12 {φ}){φ}. Giving the full proof that the specification is correct would be long and tedious. Instead, let us give an informal description of the FIGURE 10 | A schema of the recommender system (Brenas et al., 2019b). proof. Neither ρ 11 nor ρ 12 affect any of the predicates in φ thus one can ignore them. ρ 9 and ρ 10 only modify assigned_to(x, y) for y a case worker. From φ 2 , one gets that being a case worker excludes the possibility of being either a social worker or a psychologist which means that if φ is true after (ρ 5 ⊕ ρ 6 ) * ; (ρ 7 ⊕ ρ 8 ) * , it will be true at the end of the execution. Similarly, ρ 7 and ρ 8 only modify the assignment to the default social worker and psychologist that, according to φ 2 , are not social workers or psychologists. Thus, one needs to prove that {φ}(R, (ρ 5 ⊕ ρ 6 ) * {φ}){φ} is correct. ρ 5 only assigns a patient to a social worker that has nine or less and thus, after the assignment, they will only have ten or less. The same argument works for psychologists albeit with a different number of edges. The specification s ′ is thus correct.

CONCLUSION
In this paper, we demonstrated the feasibility of using graph transformations for managing changes and modifications in knowledge graphs while maintaining their consistency. We used logically decorated graphs to represent the knowledge graph, graph transformations for the modifications and a Hoare-like approach to verification to prove the correctness of the transformations.
Verification of graph transformations is an active field of computer science. Our approach is based on the method of Brenas et al. (2016b). Several other approaches exists including methods akin to model checking (Rensink et al., 2004) or the use of graph programming languages, such as GP2 (Wulandari and Plump, 2020). These methods have not been applied to knowledge graphs as far as we know. This is only the first step toward a comprehensive solution. An important limitation to the application of our method is the fact that not all currently existing knowledge graphs are equipped with ontologies. In the absence of an ontology, the whole problem is moot as there is nothing to be consistent with. That said, it is possible to start with an empty ontology and to use our method to provably make sure that a knowledge graph becomes consistent with it by defining rules that modify it.
We have presented a few transformation rewriting rules and we are working to formulate a formal framework for their definitions. Similarly, we have only looked at the easiest of the interactions between transformations and axioms.
In particular, the rules that we have presented allow automating the process in the situation that we have presented but it is not hard to find examples where such automation is not feasible. When increasing the expressivity of the axioms, there are many situations where a non-trivial choice has to be made to decide which part of the knowledge is responsible for the inconsistency, and thus should be removed. Algorithms exist to do repairs (Bienvenu et al., 2016) that go much further than we did and integrating these algorithms into our method seems promising. That said, the verification process generates a counter model, which is a knowledge graph that invalidates the specification. This counter model can be used to decide the source of the error.
Moreover, we have only presented logics with very little expressivity in this work to make it easier to follow and to avoid problems that arise at higher expressivity. However, it has been proved that the verification that underpins this method works for the whole expressivity of C 2 (Brenas et al., 2018a). Moreover, OWL allows defining more complex axioms on properties, e.g., transitivity, that cannot be expressed in C 2 but can be expressed in other decidable logics, such as the Guarded Fragment of first-order logic (Andréka et al., 1998) for which the same results apply (Brenas et al., 2018b). Extending the expressivity to full first order logic and beyond is much more difficult as satisfiability in first-order logic is undecidable and, as a result, so is our verification process. On the other hand, it is possible to restrict the specifications to less expressive logics. Finding the conditions on the logic and the change that can be performed to reduce the complexity of the verification process is under active study. The importance of the complexity problem is somewhat reduced by the fact that the complexity of the verification task is dependent on the size of the specification and not on the size of the data it represents.
Among the possible applications of the research presented in this paper is the conception of a recommender system to assist in the detection, prevention and treatment of ACEs (Brenas et al., 2019b). Figure 10 shows a schema of the recommender system's components. The recommender needs to be able to update its knowledge about the circumstances and the health of the patients in a verifiably correct manner.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
JB and AS-N contributed equally to the research and writing of this paper.

FUNDING
The funding for this study has been provided by the University of Tennessee Health Science Center.