A Novel Encoder-Decoder Knowledge Graph Completion Model for Robot Brain

With the rapid development of artificial intelligence, Cybernetics, and other High-tech subject technology, robots have been made and used in increasing fields. And studies on robots have attracted growing research interests from different communities. The knowledge graph can act as the brain of a robot and provide intelligence, to support the interaction between the robot and the human beings. Although the large-scale knowledge graphs contain a large amount of information, they are still incomplete compared with real-world knowledge. Most existing methods for knowledge graph completion focus on entity representation learning. However, the importance of relation representation learning is ignored, as well as the cross-interaction between entities and relations. In this paper, we propose an encoder-decoder model which embeds the interaction between entities and relations, and adds a gate mechanism to control the attention mechanism. Experimental results show that our method achieves better link prediction performance than state-of-the-art embedding models on two benchmark datasets, WN18RR and FB15k-237.


INTRODUCTION
With the development of science and technology, significant progress has been achieved in robotics that the types and application fields of robots are constantly enriched. These robots have played key roles in reducing tedious work. They provide optimal user service and improve the convenience of life. The popularity of various kinds of robots is an inevitable trend.
The emergence of learning intelligent social robots means that robots have truly begun to play roles in people's daily lives, such as pepper and buddy. There are some typical applications, such as greeting conversations, question responses, interest recommendations, and risk management . Huge information is need at the backend of these services. However, the traditional search engine will be affected by the combination of information, resulting in an increase in search volume and a decrease in accuracy. Knowledge with unique meaning and with the goal of solving practical problems can avoid this problem well.
The "Robot Brian" is taken the same as the brain for humans, which stores and infers knowledge to support other behaviors. Knowledge base is usually used to work as the brain of an intelligent robot. Unlike general applications that implicitly encode information in programs, it can explicitly express the corresponding knowledge of actual problems. Providing continuous knowledge support for robots through the knowledge base is equivalent to injecting "thought" into the robots to realize real intelligence true intelligence. Knowledge base construction is a core configuration for intelligent robots. Without a knowledge base, a robot cannot answer any questions. The richer the knowledge base, the more intelligent the robot will have when interacting with users.
A large amount of research work in knowledge representing, web data mining, natural language processing and other fields are dedicated to acquiring large-scale knowledge (Jia et al., 2021), providing rich knowledge bases for building the intelligent brain of robots. In order to facilitate computer processing and understanding, we express the knowledge base in a more formal and concise way, that is, a highly structured knowledge graph composed of triples (e h , r k , e t ). The Knowledge Graphs (KG) not only provides robots with a more human-like representation of the world, but also provides a better way to organize, manage and utilize massive amounts of information.
Although the large-scale knowledge graphs already contain a large amount of entity and relation information, they are still incomplete compared with existing knowledge and newly added knowledge (Zhao et al., 2020). Through knowledge graphs and knowledge self-learning, problems in the knowledge system can be found, and knowledge can be supplemented and enhanced so that the robot's knowledge base can be continuously improved and evolved. There is no end to the optimization and completion of the knowledge base, just as there is no end to human learning, this is research work that needs continuous improvement and development.
In order to alleviate the above problems, researchers have proposed a knowledge graph embedding method, which predicts missing links based on existing facts so as to expand the knowledge base. Its purpose is to learn low-dimensional vector representations of all entities and relationships, so as to simplify operations while the original structured information of the knowledge graph is retained. These knowledge graph embedding methods are widely divided into translation models (Bordes et al., 2013;Ji et al., 2015;Lin et al., 2015), semantic matching model (Nickel et al., 2011(Nickel et al., , 2016Yang et al., 2015;Trouillon et al., 2016), and neural network models (Dettmers et al., 2018;Shang et al., 2019;Vashishth et al., 2020). The related work will be introduced in detail in section 2.
Compared with neural network models, the other types of models are all shallow models, which leads to problems with poor expressiveness. Therefore, more and more complex and deeper models, which have better expressive performance and have achieved competitive success in modeling knowledge graphs, have been proposed in recent years. But these existing models, such as Dettmers et al. (2018), Nguyen et al. (2018), Shang et al. (2019), andVashishth et al. (2020), are more focused on entity representation learning, and the importance of relation representation learning are ignored, let alone the cross-interaction between entities and relations. The interaction between entities and relations plays an important role in knowledge graph representation learning that the entities and relations in the knowledge graph will influence each other and influence the prediction of new triples as they do in the real world.
In this paper, a method of knowledge reasoning and completion based on neural networks on the knowledge graph is designed for robots to simulate the reaction and learning process of human brains. Our model adopts the encoder-decoder model. The encoder model improved the KBGAT model with a gate mechanism to control the attention mechanism and use entity embeddings to update relation embeddings. The decoder model uses Conv-TransE and Conv-TransR to achieve state-of-the-art efforts. This method can enable the robot to quickly search for information, predict answers, and complete knowledge from the knowledge base, to better understand user intent and interact with users more intelligently.

RELATED WORK
In this section, we mainly introduce the work related to our Large-scale Knowledge Graph reasoning and completion methods for robots. As one of the research hotspots, Largescale Knowledge Graph reasoning and completion has attracted extensive attention from academia and industry. Thus, many different types of methods are born, such as the translation model, the bilinear model, the hyperbolic geometry model, the neural network model, the rotate model, and so on. Among these different kinds of methods, the knowledge graph Embedding method is the closest to human expression, which can be regarded as languages for computers and machines like robots. The knowledge graph Embedding method generally includes the following types of models: (i) translation models; (ii) models based on semantic matching; (iii) models based on neural networks; (iv) models with additional information. We will mainly introduce the work related to translation models and neural network based models related to our work in the following.
The translation model represented by TransE (Bordes et al., 2013) uses a simple vector form to represent the entities and relations in the knowledge graph. TransE (Bordes et al., 2013) regards relation as the conversion from the head entity e h to the tail entity e t , and uses e h + r k = e t to determine whether the given triplet is correct. In order to make up for the defect that TransE can only handle the 1-1 relation, TransH (Wang et al., 2014), TransD (Ji et al., 2015), TransR (Lin et al., 2015), and other models have increased the ability to handle multiple relations and semantics and enhanced the knowledge embedding model. It shows that entities and relations can also be embedded in other spaces besides real number space. TransG (Xiao et al., 2015) introduces Gaussian distribution to solve the problem of multi-relational semantics to capture the uncertainty of entities and relations. TorusE (Ebisu and Ichise, 2018) is the first model to embed objects outside the space of real or complex numbers and select a torus (compact Lie group) as the embedding space.
Neural network-based embedding models have received extensive attention in recent years. These methods include embedding models based on convolutional neural networks (CNN) and graph convolution networks (GCN). For example, convE (Dettmers et al., 2018), ConvKB (Nguyen et al., 2018), and InteractE (Vashishth et al., 2020) are both relational prediction models based on convolutional neural networks. ConvE (Dettmers et al., 2018) stacks the embeddings of head entity and relation into a 2-dimensional matrix, and performs convolution operation to extract features with fewer parameters and faster calculations. IntercatE (Vashishth et al., 2020) increases the expressive power of ConvE and expands the interaction between entities and relations through three key ideas-feature permutation, feature reshaping, and circular convolution. ConvKB (Nguyen et al., 2018) represents each triple as a 3-column matrix where each column vector represents a triple element and feeds this matrix to a 1D convolution layer to generalize transitional characteristics in transition-based embedding models.
Graph convolution networks have made great progress in improving the efficiency of node representation in the graph, and it is also applied in the knowledge graph by researchers. Graph convolution network (GCN) (Kipf and Welling, 2017) gathers information for node(entity) from its neighbors with equal importance. Velickovic et al. (2018) introduce a graph attention network (GAT) to learn to assign varying levels of importance to node(entity) in every neighbor. However, these models are unsuitable for KGs, since they ignore that edges (triples) play different roles depending on the relation they are associated with in KGs. SACN (Shang et al., 2019) extends the classic GCN to a weighted graph convolutional network (WGCN) as an encoder, and uses a convolution model Conv-TransE as a decoder to construct an end-toend model. WGCN weighs the different types of relations differently when aggregating multiple single-relation graphs into a multi-relation graph and the weights are adaptively learned during the training of the network. But WGCN inherits GCN's shortcomings in that it treats the same relation type for different entities of the same weight. Nathani et al. (2019) extends classic GAT to KBGAT by incorporating relation and neighboring node features in the attention mechanism and uses KBGAT as encoder and ConvKB (Nguyen et al., 2018) as a decoder.
The above-mentioned models have achieved good performance in knowledge graph embedding for knowledge graph reasoning and completion. However, as far as we know, few works consider the cross-interaction of entities and relations when designing models. Our proposed model uses a variant of the graph attention network (GAT) as the encoder and uses variants of ConvE [Conv-TransE (Shang et al., 2019), Conv-TransR] as decoder, to achieve the simultaneous capture of entity-to-relation and relation-to-entitycross-interaction.

MODEL
This section begins by introducing some notations and definitions used in the rest of this article. This is followed by an introduction of our encoder model GI-KBGAT, an improved Graph Attention Network for KG, which considers gate mechanism on multi-head attention and interaction between entities and relations to generate embeddings. Finally, we describe our decoder network based on Conv-TransE (Conv-TransR). The architecture of our model is as shown in Figure 1.

Notations and Definitions
The knowledge graph is defined as G = (E, R, T ), where E = {e 1 , e 2 , ...e N } and R = {r 1 , r 2 , ...r K } represent the set of entities (nodes) and relations. N is number of entities and K is number of relations. T denotes the triples (edges) of the form t htk = (e h , r k , e t ) ∈ E × R × E, where e h is head entity, e t is tail entity, and r k is the relation between head and tail entity. In particular, entity e h and e t in this paper refer to the head entity and the tail entity, respectively, while other entities with subscripts, such as entity e i are not specified. Table 1 explains the notations that will be used in the rest of this article.

Encoder: GI-KBGAT
As shown in section 2, most existing models ignore the crossinteraction of entities and relations. They only use relations to updating entities, but ignore the effects of entities on relations. We improve the KBGAT (Nathani et al., 2019) by modifying the update process of embeddings of entities and relations to consider the interaction between entities and relations in the update process, and adding a gate mechanism to the attention mechanism for control.
Our model uses the initial embeddings of entities and relations as input, and the following layers use the embeddings obtained from its previous layer as input. The same as Nathani et al.
's GAT model, in order to learn the embeddings of entity e i , we aggregate features of triples associated with it. The triple's embedding is learned by performing a linear transformation over the concatenation of entity and relation vectors corresponding to it, as shown in Equation (1).
Where t htk , e h , e t , and r k represent the embeddings of triple t htk , entity e h , e t , and relation r k , respectively, W t denotes the linear transformation matrix. To measure the importance of each triple t htk for entity e h , the LeakyRelu non-linearity activation function φ is used to get the absolute attention value, the activate vector is defined as b, and the softmax function is used to obtain the relative attention values α htk , as shown in Equation (2).
Then, the new embedding of entity e h is obtained by aggregating the features of the triples associated with e h through weighted by their attention values. As shown in Equation (3), attention values are used to calculate the linear combination of triples(neighbor) features, and the embedding is obtained with a activate function σ .
To stabilize the learning process and encapsulate more information, our encoder also uses a gated multi-head attention mechanism inspired by Vaswani et al. (2017) For the relation update, we propose an update mechanism that uses the same projection operation as the TransR model (Lin et al., 2015). Similar to GAT, TransR model holds that an entity is a complex of various attributes, and different relations focus on different attributes of the entity. TransR uses the projection matrix M r to project the head entity e h and tail entity e t into the corresponding relation space, and defines the score function as f r (e h , e t ) = e h M r + r − e t M r 2 2 . Inspired by TransR, we project the head entity e h and the tail entity e t of a triple into the relation space with a projection matrix W r , and update their relation r k as Equation (5).
In order not to lose the initial embeddings information during training, our model design a gated mechanism to aggregate the initial embeddings and the updated embeddings with learnable gate values. The equation is as shown in Equation (6).
Where W te and W tr are the linearly transform matrices for the initial embeddings of entity E initial and relation R initial , g ei , and g ri are the memory gate for the initial embeddings of entity E initial and relation R initial , g eu , and g ru are the update gate for the updated embeddings of entity E update and relation R update obtained by Equations (4) and (5), respectively. The scoring function for the GI-KBGAT method is defined as follows: Where S and S ′ denotes the set of valid triples [t htk = (e h , r k , e t )] and invalid triples [t ′ htk = (e ′ h , r ′ k , e ′ t )], respectively, · 1 means L1-norm dissimilarity.

Decoder
The convolutional structure is used as the base model of our decoder, which transforms the embedding vector to another space and possesses powerful feature extraction ability and good parameter efficiency. The decoder takes the embeddings of entity and relation trained from the encoder as input. We test both Conv-TransE (Shang et al., 2019), and Conv-TransR, which keeps the translational property of TransR ( e h W + r k ≈ e t W) with 1D convolution inspired by Conv-TransE to be consistent with encoder, as decoder as shown in Figure 1.
The only difference between Conv-TransR and Conv-TransE is that the Conv-TransR model has one more project matrix for entities than Conv-TransE. Following shows the model of Conv-TransR: Conv-TransR uses matrix W to project the entity e h into its corresponding relation r k 's space, the result is e h W. This result is then stacked with its corresponding relation embedding r k to get [ e h W, r k ] as the input of convolutional network. The convolutional network uses different filters(kernels) ω ∈ R 2×F (F ∈ {1, 2, 3...}) to generate different feature maps as Conv-TransE. The scoring function for the Conv-TransR method is defined as below: Where ⊛ represents a 1D convolution operation, [ ] denotes vector concatenation which concatenates features output from convolution with different filters ω, W c is a learnable weight matrix for linear transformation to projected the concatenation embedding into the tail entity e t space, ψ is chosen to be a ReLU non-linear function, then the calculated embedding is matched to tail entity e t by an appropriate distance metric, and the logistic sigmoid function τ is used for scoring finally.

Datasets
Through continuous learning, the data scale of intelligent robots will only increase. Therefore, when evaluating our proposed method, we ignore the small datasets and chose two large datasets WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova et al., 2015) as the benchmark datasets. WN18RR and FB15k-237 are improved versions of two common datasets WN18 and FB15k (Bordes et al., 2013) derived from WordNet and freebase, respectively, in which all inverse relations have been deleted to prevent direct inference of test triples by reversing training triples. Table 2 provides statistics of them.

Training Settings
We follow a two-step training procedure that, we first train our GI-KBGAT to encode information about the graph entities and relations and then train decoder model Conv-TransR to perform the link prediction task. For encoder training, we use the margin ranking loss, use Adam to optimize all the parameters with the initial learning rate set at 0.001, set the entity and relation embedding dimension of the last layer to 200, and set the other hyper-parameters for each dataset to be the same as KBGAT (Nathani et al., 2019). For decoder training, we use the standard binary cross-entropy loss with label smoothing, set the size and number of the kernel to 9 and 200, respectively, and set the other hyper-parameters for each dataset to be the same as InteractE (Vashishth et al., 2020).

Evaluation Protocol
Following the previous work, we use the filtered setting (Bordes et al., 2013) that all valid triples are filtered out from the candidate set while evaluating test triples. The performance is reported on the standard evaluation metrics: Mean Reciprocal Rank (MRR) and the proportion of correct entities ranked in the top 1, 3, and 10 (Hits@1, Hits@3, Hits@10). In which all values are presented in percentage. Among these baseline methods, the methods in the first box, namely TransE (Bordes et al., 2013), ConvE (Dettmers et al., 2018), ConvKB (Nguyen et al., 2018), Conv-TransE (Shang et al., 2019), and InteractE (Vashishth et al., 2020), have their results taken from the original paper and can be resumed to acceptable results. We compared our methods with these methods in Table 3 to label the best score in bold. All values are in percentage and the best scores of our model is in bold regardless of the second box (SACN and KBGAT).

Results and Analysis
Frontiers in Neurorobotics | www.frontiersin.org FIGURE 2 | The convergence study of InteractE and our encoder model with InteractE as decoder (represented by "+InteractE") and with Conv-TransE as decoder (represented by "+Conv-TransE") in FB15k-237 using the validation set. Here we only report the results of loss, MRR and Hit@10.  Since our method is inspired by methods SACN (Shang et al., 2019) and KBGAT (Nathani et al., 2019) that we present their results in the second box. The results of the SACN model are obtained from its corresponding paper, but this model requires a large GPU to train and these results can not be reproduced with the authors' code. KBGAT has test data leakage in its original implementation that the results in its paper are not credible. In our experiment results table, we fix the problem and show the correct results of the model.
We first compare our model use Conv-TransE as the decoder with the Conv-TransE model. Our model performs better than Conv-TransE on both datasets. Especially in the FB15K-237 dataset, our model improves upon Conv-TransE's MRR by a margin of 7.6%, Hits@1 of 9.6%, Hits@3 of 5.7%, and Hits@10 of 5.5%. In the WN18RR dataset, our model improves upon Conv-TransE's Hits@10 by a margin of 1.0%. Under the same accuracy, our model achieves the same performance on the other metrics compared with Conv-TransE.
Second, we compare our model use InteractE as the decoder with the InteractE model to better prove the effectiveness of our encoder. As shown in the Table 3, compared with the original model, most metrics of InteractE have been improved after our encoder model is added. For example, our model with InteractE improves upon InteractE's MRR by a margin of 0.3% and Hits@10 of 1.1% in the FB15K-237 dataset.
Third, we compare our model with the other baseline models. In the FB15K-237 dataset, our model with Conv-TransE as decoder achieves the best performance in Hit@3 and Hit@10, and tied for the best in Hit@1. In the WN18RR dataset, our model with Conv-TransR as decoder achieves the best performance in all metrics. Meanwhile, these two models both can achieve the top three effects on the other datasets. In conclusion, our model can achieve the best results on both datasets FB15K-237 and WN18RR. Figure 2 shows the convergence of the three models: InteractE, our encoder model with InteractE as the decoder, and Conv-TransE as the decoder. Because Conv-TransE uses a different loss function, we do not put its loss result for comparison. We can see that our models (the red line and green line) are always better than InteractE (the blue line) under MRR and Hit@10. And our models converge faster than InteractE.

Ablation Experiment
In order to prove the validity of our model, we do some ablation experiments on WN18RR dataset to show the influences of different parts of our model. The results of the ablation experiments are shown in Table 4. Compared with the KBGAT model, our improved encoder with the KBGAT decoder (convKB) can achieve better performance as shown in the first box. After changing the decoder methods, our method performs more superior as in the second box. Combined with Table 3, the decoders we use are both better than the ConvKB which is used by KBGAT, which shows the effectiveness of our decoder chosen.
To better show the influences of our innovation in KBGAT, we also test the influence of different parts in our encoder. We separately remove the gate mechanism, the interaction mechanism, and both of them to see the impact on results. The results are shown in the second box of Table 4. The gate mechanism and the interaction mechanism both perform a similar influence on the encoder model. And the best result can be achieved by combining them in the KBGAT model as the encoder model.
For these results, we conclude that our encoder-decoder model can better the expressive performance of entity embeddings and relation embeddings, and can achieve competitive success in modeling knowledge graphs. Since the hyper-parameters of our model for each dataset are set to the same as the existing methods and no parameter tuning is performed to obtain the best performance, we believe that the performance of our model can still be improved with parameter tuning.

CONCLUSION
In this paper, we propose a novel approach for knowledge graph relation prediction, which can be used in intent understanding in human-robot interaction and in robots' knowledge graph completion. Our methods can work well in large-scale knowledge graphs and can be extended to learn embeddings for various applications of robots, such as dialog generation and question answering.
In the future, we intend to extend our method to work as an end-to-end work and consider the attribute information and temporal information into our model to improve the ability to handle complex knowledge graphs. And we also intend to test our work in a real robot's "brain" to test the ability of our model in actual work.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found at: two datasets used: WN18RR and FB15k-237. Download from https://github.com/TimDettmers/ConvE.

AUTHOR CONTRIBUTIONS
YS, KC, and CL contributed to the conception and design of the study. KC wrote the sections of the code. CL wrote the parts of the first draft of the manuscript. YS organized the experiments and the manuscript. AL and HT directed the whole work and revised the manuscript. All authors approved the submitted version.