HetInf: Social Influence Prediction With Heterogeneous Graph Neural Network

With the continuous enrichment of social network applications such as TikTok, Weibo, and Twitter, social media have become an indispensable part of our lives. Web users can participate in their favorite events or follow people they like. If the “heterogeneous” influence between events and users can be modeled effectively, users’ potential future behaviors can be predicted, which facilitates applications such as recommendation and online advertising. For example, when a user’s favorite live streaming host (user) recommends certain products (event), can we predict whether the user will buy these products in the future? The majority of studies model the influence between users with a homogeneous graph neural network. However, these studies ignore the impact of events on users in reality. For instance, when users purchase commodities through live streaming channels, the commodity, in addition to the host, is also a key factor that influences user behavior. This study designs HetInf, an influence prediction model based on a heterogeneous neural network. Specifically, we first construct the heterogeneous social influence network according to the relationship between event nodes and user nodes, then sample the heterogeneous subgraph of each user, extract the relevant node features, and finally predict the probability of user behavior through the heterogeneous neural network model. We conducted comprehensive experiments on two large social network datasets. The experimental results show that HetInf is significantly superior to previous homogeneous neural network methods.


INTRODUCTION
Nowadays, social networks are everywhere in our daily lives. Social influence occurs when we get information from social networks: network events (such as network news, trending topics, published papers, or other events) and network users we are interested in constantly influence us through social media, and both can induce us to engage in social actions (including retweeting, commenting, liking, publishing, and purchasing). For example, live commerce is very popular nowadays, and we choose our favorite live streaming host to buy necessary commodities. In this setting, both the live streaming host (user) and the commodities (event) have a substantial impact on the target user's behavior. Similar to the definition of "event" in the study mentioned in reference 1, a social event can be regarded as a complete semantic unit that network users participate in and understand. How to model the influence relationship in order to predict the behavior of network users on events is one of the key computational problems in user-level social influence prediction. This problem is applied in many fields, including but not limited to elections [2], network marketing [3,4], recommendation [5], rumor detection [6], and information dissemination [7,8].
There are a large number of research studies on the role of node heterogeneity in social influence in social networks [9][10][11][12]. This kind of study mainly focuses on user nodes' interest in event nodes and predicts user behavior by capturing the topic-level influence of event nodes. The study in reference 10 improves the traditional cascade propagation model by applying topic distribution methods to the independent cascade model and the linear threshold model. The study in reference 9 uses a graph generation method to predict user behavior through the relationship between event topics and network users. These methods use traditional machine learning models to predict users' social behavior from manually designed node feature representations. However, they do not consider the association between different types of network nodes in heterogeneous social networks, such as the dual impact of users and events on target users, which limits their ability to capture the incentives that really affect user behavior.
Owing to the progress of graph neural networks [13], network nodes can be given stronger representations. Many studies use graph neural networks to model the problem of social influence prediction and have made considerable progress. The study in reference 14 (DeepInf) uses the user's local network as the input of the graph neural network to learn the user's latent social representation and uses both network structures and user-specific features in graph convolutional and attention networks. Based on DeepInf, the study mentioned in reference 15 applies a multiview influence prediction network to solve the social influence prediction problem. However, these methods assume that users are only affected by other users and therefore model only the relationship between users (a homogeneous network), which lacks an analysis of the influence between heterogeneous nodes. Real social networks (such as Twitter, Digg, and citation networks) are heterogeneous and contain different types of entities [16], for example, user nodes and event nodes (stories, tweets, papers, and other objects), which inevitably interact with each other. For example, in Figure 1A, Bob may forward the concert event because he is affected by the user Jerry (Bob himself is not interested in music), whereas Tom may forward the concert event because he is affected by the event itself (he likes music).
FIGURE 1 | Problem illustration of mining user-event influence in heterogeneous networks and predicting user behaviors. Figure 1A shows an instantiated prediction case; the goal is to predict whether Smith (ego-user) will forward the concert blog (whether the red line will occur in the future). Figure 1B is obtained by abstracting Figure 1A. Given 1) the relationship between users and events within the observable time (the connections in B include three relationships: u-u, u-e, and e-e); 2) the embedding representation of the attributes of different nodes in the observable time (the rectangle next to each node in B); 3) the activation state of neighbor nodes (active or inactive); and 4) the embedding representation of each node in the network, we predict whether the ego-user will participate in the target event.
Frontiers in Physics | www.frontiersin.org January 2022 | Volume 9 | Article 787185
To tackle these challenges, we focused on user-level behavior prediction in a heterogeneous network. This network contains two types of nodes: user and event. We aim to construct a heterogeneous influence network of event nodes and user nodes based on attributes, and we use graph neural networks to model the influence between these nodes so as to better mine the inducement of user behavior. According to the latest research on heterogeneous neural networks [13], the local modeling of heterogeneous networks can capture both structure and content heterogeneity and provide more reliable heterogeneous node representations for downstream tasks. Therefore, we combine the benefits of heterogeneous neural networks and semantic representation methods to model the influence network of local neighbor nodes based on a heterogeneous network graph. For example, in Figure 1A, we input the local heterogeneous neighbor graph with Smith as the ego-user in order to learn the influence of different types of nodes on him through historical semantic information and influence relationships, and thereby predict his future behavior (whether he will participate in the discussion of concert events). Specifically, we proposed HetInf, a heterogeneous network influence prediction model based on two types of nodes. First, based on the influence relationship of network nodes, we constructed a heterogeneous relationship graph composed of them, with the aim of building a more accurate influence model. Second, we sampled ego-user neighbor subgraphs. Specifically, an innovative heterogeneous network sampling strategy, based on random walk with restart (RWR) [17], is used to sample the topology features and the semantic features (including event topics and user interests) of the heterogeneous nodes in ego-user neighbor subgraphs. Subsequently, an end-to-end heterogeneous neural network influence model is built: the semantics-based historical topic features of events and historical interest features of users are embedded using Word2Vec [18], the node representation is aggregated from the node semantics, and the heterogeneous graph neural network model is used to learn the node relationships of the event-user heterogeneous network.
Finally, we learned the influence of different neighbor nodes on ego-user node through the graph attention networks [19], so as to predict whether users will have the social behavior of participating in events in the future.
In summary, our contributions are as follows: (1) We applied the heterogeneous graph neural network method to predict the influence of users at the micro-level in social networks. Specifically, we extend the deep learning method of homogeneous social influence networks and analyze the dynamic propagation mode of heterogeneous networks to infer a more accurate influence network. (2) With respect to heterogeneous networks, we designed a local sampling method in line with the time sequence process, established the influence relationship between events and users, and applied an innovative end-to-end heterogeneous graph neural network model to predict users' social behavior more accurately. (3) We tested HetInf on two real large-scale social network datasets: Digg and Weibo. The experimental results show that HetInf exhibits significantly improved accuracy when constructing a heterogeneous network compared with several state-of-the-art baselines.
The rest of this article is organized as follows: Section 2 formulates social influence locality problem. Section 3 introduces the proposed framework in detail. Section 4 describes extensive experiments with two datasets, Section

PROBLEM FORMULATION
In this section, we introduce several related definitions and then formally formulate the problem of heterogeneous social network influence locality.

Definition 1: R-Neighbors and r-Heterogeneous Neighbor Subgraph
A heterogeneous network is defined as $G = (V, E; O_V, R_E, A_V)$, with two types of nodes $V$, three types of links $E$, and node attributes $A_V$. In Figure 1B, $O_V$ includes user nodes $U$ (round) and event nodes $E$ (square), with $V = U \cup E$; $R_E$ includes the user-event relation $R_{ue}$, the event-event relation $R_{ee}$, and the user-user relation $R_{uu}$, with $E = R_{ue} \cup R_{ee} \cup R_{uu}$. $A_V$ is the attribute feature of the nodes, including the semantic attribute $A_V^S$ (rectangle) and the topology attribute $A_V^T$. For user $u$, its r-neighbors are defined to be $\Gamma_u^r = \{v : d(v, u) \le r\}$, where $d(v, u)$ is the distance (number of hops) from $v$ to $u$, and $v$ may be a node of either type in the heterogeneous graph $G$. The r-ego heterogeneous subgraph of user $u$ is the local heterogeneous subgraph induced by $\Gamma_u^r$ and denoted by $G_u^r$.
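As a minimal illustration of the r-neighbor definition, the following Python sketch collects all nodes within r hops of an ego-user by breadth-first search. The adjacency dictionary, the node names, and the function name `r_neighbors` are hypothetical and only serve the example; node types and relation types are ignored here.

```python
from collections import deque


def r_neighbors(adj, u, r):
    """Return the set of nodes whose hop distance to u is at most r.

    adj is a hypothetical adjacency dict {node: [neighbor, ...]} that mixes
    user nodes ("u:...") and event nodes ("e:...").
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue  # do not expand beyond r hops
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


# Toy chain: smith - jerry - concert - festival (illustrative data only)
adj = {
    "u:smith": ["u:jerry"],
    "u:jerry": ["u:smith", "e:concert"],
    "e:concert": ["u:jerry", "e:festival"],
    "e:festival": ["e:concert"],
}
two_hop = r_neighbors(adj, "u:smith", 2)
```

The induced subgraph on the returned node set would then play the role of the r-ego heterogeneous subgraph; note that the ego-user itself is included because its distance to itself is 0.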

Definition 2: Social Action
Social action refers to the behavior of users toward events in the social network, such as users retweeting tweets (events) or publishing papers (events). Formally, a social action can be regarded as the action of user $u$ on event $e$ at time $t$ in the heterogeneous graph $G_u$. We define social action as a binary variable: the action status $S_{ue}^t \in \{0, 1\}$. $S_{ue}^t = 1$ means that user $u$ has performed a social action on event $e$ after time $t$; $S_{ue}^t = 0$ means that no social action has occurred.

Problem 1: Heterogeneous Social Network Influence Locality
Social influence locality models the probability of social action when ego-user $u_i$ is influenced by neighbor nodes on his r-ego network $G_{u_i}^r$. Formally, given a 6-tuple $\{u, e, t, G, A, S\}$, social influence locality aims to quantify the activation probability of user $u_i$'s social action in response to event $e$ after a given time $t$ in $G$ with attribute feature $A$ and action status $S$ as follows:
$$P\left(S_{u_i}^{t+\Delta t} \mid e_j^t, G_{u_i}^r, A_G^t, S_G^t\right) \tag{1}$$
where $u_i$ represents the ego-user, $e_j^t$ represents the event $e_j$ at time $t$, $G_{u_i}^r$ represents the r-heterogeneous neighbor subgraph, $A_G^t$ represents the node attributes by time $t$, including the semantic attribute $A^S$ and the topology attribute $A^T$, and $S_G^t$ represents the action state of the subgraph nodes before timestamp $t$.
After determining the problem, we sample $N$ 6-tuple samples by preprocessing the data. Similar to the definition of DeepInf [14], we regard social influence locality as a binary classification problem and estimate the model parameters by minimizing the negative log-likelihood objective. We use the following objective with parameters $\theta$:
$$\mathcal{L}(\theta) = -\sum_{i=1}^{N} \log P_\theta\left(S_{u_i}^{t_i+\Delta t} \mid e_j^{t_i}, G_{u_i}^r, A_G^{t_i}, S_G^{t_i}\right) \tag{2}$$
Similarly, we assume that the time interval $\Delta t$ is infinite, that is, we only predict user action outside the observation time window.
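To make the negative log-likelihood objective concrete, here is a small Python sketch that evaluates it for a batch of predicted activation probabilities and binary action statuses. The function name, the clipping constant `eps`, and the toy numbers are assumptions for illustration; in the actual model the probabilities would come from the network output.

```python
import math


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Sum of -log P(observed action status) over all samples.

    probs[i]  : predicted probability that sample i is active (status 1)
    labels[i] : observed action status, 0 or 1
    """
    total = 0.0
    for p, s in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if s == 1 else math.log(1.0 - p)
    return total


# Toy predictions for three (user, event) samples (illustrative only)
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
loss = negative_log_likelihood(probs, labels)
```

Confident, correct predictions contribute little to the sum, whereas a confident wrong prediction (for example, probability 0.9 with label 0) is penalized heavily.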

MODEL FRAMEWORK
Our goal is to design a heterogeneous graph network model based on the interaction between events and users, named HetInf, which aims to learn the dynamic preferences of individuals and the influence of heterogeneous neighbors in detail. Building the HetInf model needs three steps: 1) constructing heterogeneous relational networks with attributes; 2) sampling r-heterogeneous neighbor subgraph; and 3) building heterogeneous graph neural network model. Figure 2 shows the framework of HetInf.

Constructing Heterogeneous Relational Networks
Since user actions are predicted from influence in a heterogeneous network, an intuitive way is to construct a heterogeneous influence graph [20] and obtain the influence between nodes from it. We first obtain two types of nodes: users and events. An event node can be regarded as a specific network event, such as a hashtag in a social network dataset or a "story" in the Digg dataset.
Then we establish the relationships between nodes according to the data characteristics, including user-user, user-event, and event-event relationships. Specifically, we use the follow relationship and the interaction relationship as the user-user relationship; the user-event relationship can be determined by the user's historical behavior, for example, whether the user has participated in the event; and the event-event relationship can use co-occurrence association [21] or semantic association [22]. In this way, we construct the global heterogeneous relationship network G (Definition 2.1), as shown in Figure 2A.
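The construction of the global heterogeneous network can be sketched as a typed adjacency structure. The snippet below is only an illustration under stated assumptions: the function name `build_hetero_graph`, the tuple-based node identifiers, and the choice to store every relation as an undirected edge are hypothetical and not taken from the original implementation.

```python
from collections import defaultdict


def build_hetero_graph(follows, participations, event_links):
    """Build graph[node] -> set of (neighbor, relation) pairs.

    Nodes are (type, name) tuples; relations are "uu", "ue", or "ee".
    Edges are stored in both directions (an assumption of this sketch).
    """
    graph = defaultdict(set)

    def add_edge(a, b, relation):
        graph[a].add((b, relation))
        graph[b].add((a, relation))

    for u, v in follows:                # user-user: follow or interaction
        add_edge(("user", u), ("user", v), "uu")
    for u, e in participations:         # user-event: historical participation
        add_edge(("user", u), ("event", e), "ue")
    for e1, e2 in event_links:          # event-event: co-occurrence or semantic
        add_edge(("event", e1), ("event", e2), "ee")
    return graph


# Toy data loosely following the concert example (illustrative only)
g = build_hetero_graph(
    follows=[("smith", "jerry"), ("smith", "tom")],
    participations=[("jerry", "concert"), ("tom", "concert")],
    event_links=[("concert", "music_festival")],
)
```

Keeping the relation label on every edge makes it straightforward to restrict later sampling steps to particular relation types.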

Extract Node Attributes
For different types of nodes, we select two kinds of features as the initial node representation of the heterogeneous relationship network:
Semantic features: Since the user's behavior is more influenced by the semantic information of the social event [9], the semantic feature $A_v^S$ of each node was used to indicate the "bias" at the semantic level. For the event nodes, TF-IDF [23] was used to sample K keywords $Ew_i$ ($i \in K$) in each event e and to distinguish between different events. For the user nodes, the (nonmeaningful) stopwords were first removed, then the I most frequent keywords $Uw_i$ ($i \in I$) of a user were sampled to represent the user's interest, and finally, these keywords were sorted by timestamp to represent the semantic evolution of nodes over time. Formalization is given as follows:
$$A_v^S = \begin{cases} \{Ew_1, \dots, Ew_K\}, & v \in E \\ \{Uw_1, \dots, Uw_I\}, & v \in U \end{cases} \tag{3}$$
Topology features: In addition, inspired by the study in reference 14, the DeepWalk [24] algorithm was used to obtain the topology feature $A_v^T$ of each node.
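The keyword-based semantic features of event nodes can be illustrated with a small TF-IDF sketch in plain Python. This is a simplified variant (raw term frequency times log inverse document frequency, ties broken alphabetically); the function name `top_k_keywords` and the toy token lists are hypothetical, and a real pipeline would likely rely on a library implementation and proper tokenization.

```python
import math
from collections import Counter


def top_k_keywords(docs, k):
    """docs: {event_id: list of tokens}. Returns {event_id: top-k tokens by TF-IDF}."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency of each token

    result = {}
    for doc_id, tokens in docs.items():
        if not tokens:
            result[doc_id] = []
            continue
        tf = Counter(tokens)
        scores = {
            w: (c / len(tokens)) * math.log(n_docs / df[w])
            for w, c in tf.items()
        }
        # Highest score first; alphabetical order breaks ties deterministically
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[doc_id] = ranked[:k]
    return result


# Toy event texts (illustrative only)
docs = {
    "concert": ["music", "concert", "ticket", "music", "live"],
    "election": ["vote", "election", "live", "candidate", "vote"],
}
keywords = top_k_keywords(docs, k=2)
```

In this toy example the word "live" appears in both events, so its inverse document frequency is zero and it is never selected, which matches the goal of choosing keywords that distinguish events from one another.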

Sample r-Heterogeneous Neighbor Subgraph
Generally, a graph neural network such as GAT [19] uses the feature information of a node's n-order neighbors (e.g., first-order) in the aggregation of node features. However, the breadth-first search (BFS) strategy [25] used in the graph localization process makes the weight between users and events too large for hot events. For example, in popular events, due to a large number of related events, most of the neighbor nodes sampled for a user are event nodes, so event nodes become the dominant influence and the influence between users is ignored. To solve this problem, an improved random walk strategy that complies with the law of information dissemination was used. The sampling strategy includes two steps: 1. Sample a fixed-length random walk sequence. We took each user u ∈ U as the starting point and used the RWR [17] method to sample a fixed number $N_r$ of neighbor nodes. 2. Use the meta-path method to sub-sample the sequence of step 1. We used the random walk probability and the U-E-U (a user publishes an event, which is then forwarded by other users) and U-U (users directly forward through other users) meta-paths to sample a fixed number $N < N_r$ of neighbor nodes and then used these neighbor nodes to construct a subgraph $G_u$.
This strategy satisfies the law of information dissemination and helps avoid the problem of too little sampling of some types of nodes. Similar to the definition in DeepInf [14], a positive instance of a local heterogeneous subgraph was generated if a user has social action with an event after the timestamp t, and a negative instance was generated if the user is not observed in the watch window to be associated with the event.
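The first sampling step, random walk with restart from the ego-user, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `rwr_sample`, the `max_steps` safety bound, the fixed seed, and the string node identifiers are assumptions, and the meta-path sub-sampling of step 2 is not covered.

```python
import random


def rwr_sample(adj, start, n_r, restart_prob=0.8, seed=0, max_steps=10000):
    """Collect up to n_r distinct neighbor nodes by random walk with restart.

    adj is a hypothetical adjacency dict {node: [neighbor, ...]}.
    """
    rng = random.Random(seed)
    if not adj.get(start):
        return []  # isolated ego-user: nothing to sample
    sampled, seen = [], {start}
    current = start
    for _ in range(max_steps):
        if len(sampled) >= n_r:
            break
        # Jump back to the ego-user with probability restart_prob,
        # or whenever the walk reaches a node without outgoing edges.
        if rng.random() < restart_prob or not adj.get(current):
            current = start
        current = rng.choice(sorted(adj[current]))
        if current not in seen:
            seen.add(current)
            sampled.append(current)
    return sampled


# Toy heterogeneous neighborhood (illustrative only)
adj = {
    "u:smith": ["u:jerry", "u:tom", "e:concert"],
    "u:jerry": ["u:smith", "e:concert"],
    "u:tom": ["u:smith", "e:concert"],
    "e:concert": ["u:jerry", "u:tom", "u:smith"],
}
neighbors = rwr_sample(adj, "u:smith", n_r=3)
```

A high restart probability keeps the walk close to the ego-user, so nearby users and events are sampled far more often than distant ones.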

Build Heterogeneous Graph Neural Network Model
In this way, the 6-tuples (2.3) were used as the set of examples, and a deep learning model was designed to predict the action state $S_u^{t+\Delta t}$ of the ego-user. Our neural network model consists of three parts: a semantic feature aggregation module (shown in Figure 2c-1), a topological feature aggregation module (shown in Figure 2c-2), and a heterogeneous multi-attribute hidden layer aggregation module (shown in Figure 2c-3). In this section, the different modules are introduced to describe the process of model building.

Semantic Feature Aggregation Module
A neural network module was designed to learn the deep association between user and event semantics. The module uses the node semantic attribute $A_v^S$ (obtained in 3.1.2) and the local heterogeneous subgraph $G_u$ as input and realizes the aggregation function of semantic features through a neural network $H'_S(v)$. Specifically, we denoted the semantic feature as $W_v^i$, indicating the ith semantic feature of node v, and utilized Word2Vec [18] to pre-train $W_v^i$ as $x_i$. Inspired by the study in reference 13, the neural network structure of Bi-LSTM was used to capture the association between semantic features of nodes at a deeper level, and the average value of all hidden states was used to represent the general aggregation embedding as follows:
$$H_S(v) = \frac{1}{|W_v|} \sum_{i=1}^{|W_v|} \left[\overrightarrow{\mathrm{LSTM}}_\theta(x_i) \oplus \overleftarrow{\mathrm{LSTM}}_\theta(x_i)\right] \tag{4}$$
where $H_S(v) \in \mathbb{R}^{d \times 1}$ ($d$ is the semantic feature embedding dimension), $v$ represents a node in subgraph $G_u$, $|W_v|$ is the number of semantic features of node $v$, and $x_i$ represents the ith feature word of node $v$ (refer to [13]). $\overrightarrow{\mathrm{LSTM}}$ represents the forward LSTM network, $\overleftarrow{\mathrm{LSTM}}$ represents the backward LSTM network, $\theta$ represents the neural network parameters, and the operator $\oplus$ represents concatenation. Bi-LSTM can learn the potential evolution process of node semantics, leading to a strong expression capability [26].
Then the GCN [27] framework was used to aggregate the semantic embeddings $H_S$ of the nodes to learn the influence relationship between different nodes, which is formally expressed as follows:
$$H'_S = g\left(D^{-1/2} A D^{-1/2} H_S W^{\top} + b\right) \tag{5}$$
where $W \in \mathbb{R}^{d \times d}$ and $b \in \mathbb{R}^{d}$ are model parameters, $g$ is a non-linear activation function, $A$ is the adjacency matrix of $G_u$, and $D$ is the diagonal degree matrix of $A$. Since the number of subgraph nodes is fixed, $A(G_u)$ can be calculated efficiently.
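A single GCN aggregation step with symmetric degree normalization can be written out in plain Python as a sketch. The function name `gcn_layer`, the use of nested lists instead of a tensor library, the ReLU activation, and the self-loops in the toy adjacency matrix are all assumptions made for illustration.

```python
import math


def gcn_layer(adj, feats, weight, bias):
    """One GCN layer: ReLU(D^{-1/2} A D^{-1/2} H W^T + b) on nested lists.

    adj    : n x n adjacency matrix (self-loops included by the caller)
    feats  : n x d_in node feature matrix H
    weight : d_out x d_in parameter matrix W
    bias   : length d_out parameter vector b
    """
    n = len(adj)
    d_in = len(feats[0])
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # Symmetrically normalized adjacency matrix
    norm = [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]
    # Neighbor aggregation: norm @ feats
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(d_in)]
           for i in range(n)]
    # Linear map (agg @ W^T + b) followed by ReLU
    return [
        [max(0.0, sum(w * x for w, x in zip(w_row, row)) + b)
         for w_row, b in zip(weight, bias)]
        for row in agg
    ]


# Toy 3-node subgraph with self-loops and identity weights (illustrative only)
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
hidden = gcn_layer(adj, feats, weight, bias)
```

With identity weights, the first output row is simply the degree-normalized mix of node 0's own features and those of its single neighbor, which makes the effect of the normalization easy to inspect.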

Topological Feature Aggregation Module
A topological feature can represent the importance of nodes in the network [28]. To aggregate the topological feature embeddings of heterogeneous neighbors for each node, one GCN layer was used; in particular, following the study in reference 14, the input vector consists of the topological features $A_v^T$ and the node state $S_v^t$. The concatenated vector was then input into the GCN layer to generate the hidden-layer vector of node topological features $H'_T(v)$. The formalization is given as follows:
$$H'_T = g\left(D^{-1/2} A D^{-1/2} \left[A_V^T \oplus S_V^t\right] W_T^{\top} + b_T\right) \tag{6}$$
Eq. 6 is the same as Eq. 5, except for the input and the model parameters $W_T$ and $b_T$. We used the GCN to aggregate the topological embeddings of all heterogeneous neighbors, since GCN is known to perform well at relation aggregation [27].

Heterogeneous Multi-Attribute Hidden Layer Aggregation Module
We can obtain the semantic feature embedding $H'_S(v)$ and the structural feature embedding $H'_T(v)$ of each node in the heterogeneous subgraph $G_u$. To combine these features for each node v of subgraph $G_u$, the graph attention network [19] was employed. The advantage is that, since different nodes contribute differently to the result, the multihead GAT learns the influence between different attributes of heterogeneous nodes.
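First, we concatenated the hidden layer results of the previous steps into a representation $h_v$ for each node $v$; then, following the study in reference 14, we used multihead GAT to calculate the normalized attention coefficients $\alpha_{iv}$ and the fused representation $H'_f(v)$ as follows:
$$\alpha_{iv} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_v \,\Vert\, W h_i\right]\right)\right)}{\sum_{j \in \mathcal{N}(v)} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_v \,\Vert\, W h_j\right]\right)\right)}, \qquad H'_f(v) = \Big\Vert_{k=1}^{K} g\Big(\sum_{i \in \mathcal{N}(v)} \alpha_{iv}^{k} W^{k} h_i\Big) \tag{7}$$
where $a \in \mathbb{R}^{2d}$ is the attention parameter, $W \in \mathbb{R}^{d \times d}$ is the model parameter, $\cdot^{\top}$ represents transposition, $\Vert$ is the concatenation operation, $\mathcal{N}(v)$ is the set of neighbors of node $v$ in $G_u$, $\alpha_{iv}$ indicates the importance of node $i$ to node $v$, and $K$ represents the number of heads.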
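The attention mechanism can be illustrated for a single head with a short Python sketch. For brevity the linear map W is taken to be the identity, and the function names, the LeakyReLU slope of 0.2, and the toy vectors are assumptions; a multihead version would repeat the same computation with separate parameters and concatenate the results.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_coefficients(h_target, h_neighbors, a):
    """Softmax over neighbors of LeakyReLU(a . [h_target || h_i]).

    The linear map W is taken as the identity in this sketch.
    """
    scores = [
        leaky_relu(sum(w * x for w, x in zip(a, h_target + h_i)))
        for h_i in h_neighbors
    ]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def aggregate(h_neighbors, alphas):
    """Attention-weighted sum of neighbor representations."""
    dim = len(h_neighbors[0])
    return [sum(al * h[c] for al, h in zip(alphas, h_neighbors)) for c in range(dim)]


# Toy ego node with three neighbors (illustrative values only)
h_v = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5, 1.0, -1.0]
alphas = attention_coefficients(h_v, neighbors, a)
fused = aggregate(neighbors, alphas)
```

In this toy setting the first and third neighbors have identical features and therefore receive identical coefficients, while the second neighbor gets a lower weight because its raw score is negative.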

Output Layer and Loss Function
As shown in Figure 2c-3, a fully connected layer (FC layer [27]) was used to output a two-dimensional representation of each node; the output for the current ego-user was then extracted and compared with the ground truth, and formula 2 was optimized as the loss function in our study.

EXPERIMENTS
In this section, extensive experiments were conducted with the aim of answering the following research questions:
• (RQ1) How does HetInf perform vs. state-of-the-art baselines for influence prediction tasks?
• (RQ2) How do different components, for example, the heterogeneous multi-attribute hidden layer aggregation module or the semantic feature aggregation module, affect the model performance? How much performance gain do these modules add?
• (RQ3) How do various hyper-parameters, for example, the embedding dimension of keywords or the size of the sampled heterogeneous neighbor set, impact the model performance?

Datasets
Following the previous studies [14], experiments on two public datasets were conducted to quantitatively evaluate our proposed model. The detailed statistics are presented in Table 1.
Digg [29]: The Digg dataset is a collection of stories that were promoted to the Digg home page within 1 month in 2009. For each story, the dataset collects the list of all users who voted, the voters' friendship links, and the timestamp of each vote. This dataset comes from the study in reference 30. In our experiment, we took the story as the "event" node and "voting" as the user action to build a heterogeneous graph. Due to the lack of text data in the dataset, the deep framework (Figure 2c-2) of semantic information was not used for this dataset.
Weibo [31]: Weibo is the most popular social networking platform in China. This dataset contains 3,000,000 original tweets and retweets and comments of the original tweets from September 28, 2012 to October 29, 2012. The dataset also contains the follow relationships between users who participate in these tweets. The dataset comes from the study in reference 32. In our experiment, we extracted hashtags as events and built the heterogeneous graph of users and events; the behavior of users participating in events (comment or retweet) is regarded as the user action.

Data Preparation
In view of the imbalance in the number of active neighbors observed by DeepInf, we set a threshold n > 5 (n is the sum of the number of active users and active events). Less active observation samples are therefore removed, so the sample characteristics involved in training are significantly related to social influence [31]. To address data skew, down-sampling was used to keep the ratio of positive to negative samples at 1:3.
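The two preparation steps, filtering by the number of active neighbors and down-sampling negatives to a 1:3 ratio, can be sketched as follows. The dictionary field names, the function name `prepare_samples`, and the fixed random seed are hypothetical; only the threshold of 5 and the 1:3 ratio come from the text.

```python
import random


def prepare_samples(samples, min_active=5, neg_per_pos=3, seed=0):
    """Keep samples with more than min_active active neighbors, then
    down-sample negatives so that positives:negatives is at most 1:neg_per_pos.
    """
    rng = random.Random(seed)
    kept = [
        s for s in samples
        if s["active_users"] + s["active_events"] > min_active
    ]
    pos = [s for s in kept if s["label"] == 1]
    neg = [s for s in kept if s["label"] == 0]
    n_neg = min(len(neg), neg_per_pos * len(pos))
    return pos + rng.sample(neg, n_neg)


# Toy samples (illustrative only): 2 active positives, 10 active negatives,
# and 5 positives with too few active neighbors
samples = (
    [{"active_users": 4, "active_events": 3, "label": 1} for _ in range(2)]
    + [{"active_users": 5, "active_events": 2, "label": 0} for _ in range(10)]
    + [{"active_users": 1, "active_events": 1, "label": 1} for _ in range(5)]
)
balanced = prepare_samples(samples)
```

Here the five weakly active positives are discarded first, so only two positives remain and six of the ten negatives are retained.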
Compared with the previous study, the preparation has the following differences: (1) For the choice of events, because events with few participants lack significance, we excluded them by keeping only the events whose number of participants falls in the top 30% of the distribution, which yields the total number of events in Table 1.
(2) In the Weibo dataset, we established the event-event relationship (edge) through the semantic correlation of events. Specifically, the historical text of each event was collected, the tweet text collection was sampled in the time window t, and the semantic vector representation of the event was obtained by the par2vec [33] method. We then calculated the cosine similarity between events; if the similarity exceeds the threshold of 0.7, the two events are considered semantically strongly correlated, and we established a relation (edge) between them.
(3) For the extraction of node semantic features (Figure 2c-2), we fixed the number of keywords of each node (for example, n = 20). If the keyword samples of a node are insufficient (n < 20), we padded the vector with zeros to ensure consistent input to the neural network.
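The event-event edge rule and the zero padding of keyword inputs can be illustrated with a short Python sketch. The event vectors below are made-up stand-ins for the learned semantic representations, and the function names and the integer keyword ids are hypothetical; only the 0.7 threshold and the fixed keyword count follow the text.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def event_edges(vectors, threshold=0.7):
    """Connect two events when their cosine similarity exceeds the threshold."""
    return [
        (e1, e2)
        for e1, e2 in combinations(sorted(vectors), 2)
        if cosine(vectors[e1], vectors[e2]) > threshold
    ]


def pad_keywords(keyword_ids, n=20, pad=0):
    """Truncate or zero-pad a keyword id list to exactly n entries."""
    return (keyword_ids + [pad] * n)[:n]


# Toy semantic vectors for three events (illustrative only)
vectors = {
    "concert": [1.0, 0.2, 0.0],
    "festival": [0.9, 0.3, 0.1],
    "election": [0.0, 0.1, 1.0],
}
edges = event_edges(vectors)
padded = pad_keywords([7, 3, 9], n=5)
```

Only the concert and festival vectors point in nearly the same direction, so they are the only pair that receives an event-event edge.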

Baselines
Support Vector Machine (SVM) [34]: An SVM with a linear kernel was used as the classification model. Specifically, the concatenation of three features (semantic features, topology features, and action status) was used as the input vector, and the problem was treated as binary classification.
DeepInf [14]: Our framework was compared with this influence network model based on the graph neural network, which constructs a homogeneous subgraph based on user relationships and predicts the future action of the user node.
MvInf [15]: Our framework was compared with the state-of-the-art graph neural network model MvInf, which introduces a multiview structure based on DeepInf and uses the complementarity and consistency between different views to enhance learning performance. The difference is that our proposed model is based on the common influence of events and users.
HetInf and Its Variants: In the heterogeneous multi-attribute hidden layer aggregation module, different graph neural network frameworks were used, which gives two variants: HetInf-GCN and HetInf-GAT. Specifically, HetInf-GAT uses GAT [19] to fuse node features, mainly using the attention mechanism to obtain the importance between nodes, while HetInf-GCN uses the GCN [27] framework to aggregate the node features and calculates the node influence by learning the node relationships of the subgraph.

Hyper-Parameter
In our proposed method, we used DeepWalk [24] to embed the node topology features; the restart probability of this method is 0.8, and the output vector length is 64 dimensions. We used ReLU [35] as the activation function g (Eqs. 5 and 6) and the Adam [36] optimizer for training, with a learning rate of 0.005, and we set the dropout rate to 0.5. We used 50%, 25%, and 25% of the instances for training, validation, and testing, respectively; the batch size for all datasets was set to 256. In order to accommodate more nodes, we set the total number of nodes in the subgraph to 100 (including two different types of nodes).
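For reference, the stated settings can be gathered into a small configuration together with a 50/25/25 split routine, as in the sketch below. The dictionary keys, the function name `split_instances`, and the decision to shuffle with a fixed seed before splitting are assumptions of this illustration; the text does not say how instances are assigned to the three sets.

```python
import random

# Values taken from the text; key names are hypothetical
HPARAMS = {
    "restart_prob": 0.8,
    "learning_rate": 0.005,
    "dropout": 0.5,
    "batch_size": 256,
    "subgraph_size": 100,
}


def split_instances(instances, seed=0):
    """Shuffle and split into 50% training, 25% validation, 25% testing."""
    items = list(instances)
    random.Random(seed).shuffle(items)
    n_train = len(items) // 2
    n_valid = len(items) // 4
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    held_out = items[n_train + n_valid:]
    return train, valid, held_out


train_set, valid_set, test_set = split_instances(range(1000))
```

Fixing the seed keeps the three sets identical across runs, which makes comparisons between model variants easier to reproduce.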

(RQ1) Performance Analysis
How does HetInf perform vs. state-of-the-art baselines for influence prediction? Will users take actions on the events in the future? What are the advantages of the proposed model compared with the baselines? To answer RQ1, we used four indicators to compare with the previous state-of-the-art model (the same evaluation metrics as MvInf).
It should be noted that there are the following differences from the baseline methods. In the semantic feature aggregation module, we used Word2Vec to embed each feature word into a vector with a dimension of 32, and the number of keywords for each node is 20. The output dimension of the Bi-LSTM hidden layer is 128, and this output was used as the GCN input (as shown in Figure 2c-2). The final output dimension of the GCN module is 128. In the topological feature aggregation module, the output dimension of DeepWalk is 128 and the state feature dimension is 2 (including action state and ego-state), so the GCN's input dimension is 130, and the output dimension of this module is 128 (similar to the DeepInf method). In the multi-attribute hidden layer aggregation module, for HetInf-GCN, we used two layers of GCN as the aggregation function of the module, in which the input dimension of the first layer is 256 and the output dimension of the second layer is 128. For HetInf-GAT, we used the GAT method, with an input dimension of 256 and an output dimension of 128. The performance of all models is reported in Table 2 and Table 3, in which the best results are highlighted in bold.
(1) The results show that in most cases our proposed model is better than the baselines, especially in precision and F1 on the Weibo (microblog) dataset (F1: 17.9%, Prec.: 35.7%), which shows that introducing heterogeneous networks and establishing event influence relations yields a gain and verifies the effectiveness of the proposed framework. (2) The results on the Digg dataset show that the heterogeneous graph neural network model with two types of nodes can also bring a performance gain (F1: 0.6%, Prec.: 0.3%) even though the semantic information of heterogeneous nodes is lacking, which indicates that our proposed model can improve the prediction of user behavior through the heterogeneous social network structure alone.
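For readers who want to reproduce the reported precision and F1 figures on their own predictions, these metrics can be computed as in the following sketch. The function name and the toy label vectors are hypothetical, and AUC is omitted because it requires ranking scores rather than hard predictions.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (active) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels and predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because negatives outnumber positives after down-sampling, precision and F1 on the positive class are more informative than plain accuracy in this setting.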

(RQ2) Ablation Analysis
HetInf is a deep learning model combining different modules, which calculates the influence between different nodes and predicts user behavior by aggregating the embeddings of different types of node attributes. To answer RQ2, we used AUC and F1 as the evaluation metrics and conducted ablation studies on several model variants: 1) No-NN-1, which removes the LSTM method (NN-1) and simply concatenates the vectors to embed the semantic features, to verify the impact of the semantic feature aggregation module on the results; 2) Only-Topology, which uses only the heterogeneous topology encoding (C-2) to represent each node embedding (removing the C-1 module); 3) Only-Semantic, which uses only the heterogeneous semantic encoding (C-1) to represent each node embedding (removing the C-2 module); and 4) No-NN-3, which uses an FC layer to combine the embeddings of different neighbor representations (replacing NN-3). It should be noted that the Digg dataset lacks semantic information, so we only tested variant 4) on it to verify the effectiveness of the heterogeneous multi-attribute hidden layer aggregation module. The AUC and F1 results are shown in Table 4 and Figure 3.
(1) The performance of Only-Topology is better than that of Only-Semantic, indicating that the position of key nodes in the network (such as opinion leaders) is more influential. (2) The performance of Only-Semantic is better than that of No-NN-1, indicating that Bi-LSTM-based content encoding is superior to "shallow" encoding like FC for capturing "deep" content feature interactions. (3) HetInf (including GCN and GAT) is better than No-NN-3, which shows that the graph convolution network plays a role in capturing node type influence. (4) HetInf-GAT is superior to HetInf-GCN, indicating that the graph attention network can capture more influential potential relationships than the GCN method.

(RQ3) Hyper-Parameter Analysis
To answer question RQ3, we investigated how hyper-parameters affect the predictive performance of the model. We conducted a parameter analysis on the Weibo dataset and used the F1 value as the evaluation indicator. Specifically, we tested the impact of three key parameters: 1) the semantic attribute embedding dimension; 2) the number of heads for multi-head attention; and 3) the number of keywords. The experimental results are shown in Figure 4.
(1) Semantic Attribute Embedding Dimension: As shown in Figure 4A, as the semantic attribute dimension d varies from 16 to 256, the evaluation indicator first increases because more dimensions carry more information. However, once the dimension reaches 128, the performance begins to decline, which is likely the result of overfitting. (2) Number of Heads for Multi-head Attention: Like DeepInf, we examine the number of GAT heads in the heterogeneous multi-attribute hidden layer. As shown in Figure 4B, increasing the number of heads improves performance, but beyond 8 heads the performance remains stable while efficiency suffers. (3) Number of Keywords: The feature words of network nodes represent the semantic bias of the nodes, which directly affects the prediction results. As shown in Figure 4C, as the number of keywords increases, the amount of semantic information about the network nodes grows and the evaluation improves accordingly. However, when the number of keywords exceeds a certain value, performance drops, which is likely due to the noise introduced by sampling too many non-significant keywords. As can be seen from the figure, it is best to keep the number of feature words between 20 and 40.
Frontiers in Physics | www.frontiersin.org January 2022 | Volume 9 | Article 787185

Social Influence Prediction
Social influence prediction is a fundamental problem in social network analysis that supports downstream tasks. From the micro-perspective [37,38], this problem is mainly modeled by analyzing user relationships. There are many research directions, such as user interaction influence analysis [39,40], network structure diversity analysis [41,42], topic influence analysis [10], and influence maximization [43]. Specifically, the study in reference 44 proved the existence of social influence through a quantitative analysis of mutual influence. The study in reference 31 proposes the concept of the social local network, using user interaction and network structure to predict user behavior. The study in reference 45 uses topic-level influence to model user influence. The study in reference 46 introduces a topic-level influence propagation and aggregation algorithm to derive the indirect influence between nodes. In recent years, with the continuous progress of deep learning, many studies have introduced deep learning into social influence prediction to improve prediction performance. A popular deep learning method is that of [14], which provides an end-to-end framework to predict social influence by learning the latent features of users. The study in reference 15 improves on the study in reference 14, enhancing feature representation and accuracy with a multi-view model. The study in reference 47 proposes NNMInf, a social influence prediction model based on neural network multilabel classification. The study in reference 48 introduces a deep neural network framework which simulates social influence and predicts human behavior. Compared with traditional methods, these deep learning models show better learning performance.

Heterogeneous Graph Neural Network
In recent years, the graph neural network has developed rapidly within deep learning [27,49], as exemplified by the state-of-the-art model GAT [19]. As a deep learning-based method for graph representation, the graph neural network (GNN) works as follows: the first step is to calculate the feature representations of neighbor nodes, and the second step is to aggregate the neighbors through a message passing mechanism to obtain the feature representation of each node [50].
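The two steps described above can be illustrated with a deliberately simplified sketch. It assumes uniform mean aggregation with no learned weights or nonlinearity, so it is not the layer of any cited model; the function name `mean_aggregate` and the toy graph are hypothetical.

```python
def mean_aggregate(features, neighbors):
    """One round of message passing with mean aggregation.

    features:  dict mapping node -> feature vector (list of floats)
    neighbors: dict mapping node -> list of neighbor nodes
    Returns a new dict in which each node's vector is the mean of its
    own vector and its neighbors' vectors.
    """
    updated = {}
    for node, own in features.items():
        # Step 1: collect the feature representations of the neighbors (plus self).
        messages = [own] + [features[n] for n in neighbors.get(node, [])]
        # Step 2: aggregate the messages into the node's new representation.
        dim = len(own)
        updated[node] = [
            sum(m[i] for m in messages) / len(messages) for i in range(dim)
        ]
    return updated


# Toy path graph a - b - c with 2-dimensional features (illustrative only).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
hidden = mean_aggregate(features, neighbors)
```

Attention-based models such as GAT replace the uniform average in step 2 with learned, per-neighbor attention weights, so that more relevant neighbors contribute more to the aggregated representation.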
Recently, the heterogeneous graph neural network has become a major branch of GNN. Its main task is to learn representations of heterogeneous nodes with a graph neural network, so as to support downstream tasks based on heterogeneous networks. The study in reference 13 realizes node representation in heterogeneous networks by aggregating the features of different types of nodes in stages. The study in reference 51 proposes a heterogeneous graph neural network based on hierarchical attention, including node-level attention and semantic-level attention. Node-level attention aims to learn the importance between nodes and their meta-path-based neighbors, while semantic-level attention learns the importance of different meta-paths. The study in reference 52 proposes a heterogeneous graph neural network method for subgraphs, which trains a classifier on neighbor-averaged features computed over randomly sampled subgraphs of the relational "metagraph." The MAGNN [53] model contains three components: node content transformation to encapsulate input node attributes, intra-meta-path aggregation to incorporate intermediate semantic nodes, and inter-meta-path aggregation to combine messages from multiple meta-paths. GTN [54] generates new graph structures by identifying useful connections between unconnected nodes on the original graph, and can learn effective node embeddings on the new graphs in an end-to-end fashion. HGNN-AC [55], building on reference 53, proposes a general framework for heterogeneous graph neural networks via attribute completion, including pre-learning of topological embedding and attribute completion with an attention mechanism. These heterogeneous graph neural network representation methods enhance the representation ability of nodes and provide more practical ideas for downstream tasks.
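The common idea behind these methods, aggregating within each node type (or meta-path) first and then weighting the results against each other, can be sketched as follows. This is a hedged illustration rather than the algorithm of any cited model: the function `aggregate_by_type` is hypothetical, the first stage uses a plain mean, and the `type_scores` are fixed constants standing in for importance scores that a real model would learn.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def aggregate_by_type(neighbor_features, type_scores):
    """Two-stage aggregation for a single target node.

    neighbor_features: dict mapping node type -> list of neighbor vectors
    type_scores:       dict mapping node type -> unnormalized importance
                       (assumed constants here; learned in a real model)
    Returns the combined embedding and the normalized per-type weights.
    """
    types = sorted(neighbor_features)
    # Stage 1: aggregate neighbors of the same type into one vector per type.
    per_type = {t: mean_vector(neighbor_features[t]) for t in types}
    # Stage 2: weight the per-type vectors with softmax-normalized scores.
    weights = softmax([type_scores[t] for t in types])
    dim = len(per_type[types[0]])
    combined = [
        sum(w * per_type[t][i] for w, t in zip(weights, types))
        for i in range(dim)
    ]
    return combined, dict(zip(types, weights))


# Toy neighborhood with two user neighbors and one event neighbor.
neighbor_features = {
    "user": [[1.0, 0.0], [0.0, 1.0]],
    "event": [[2.0, 2.0]],
}
type_scores = {"user": 0.0, "event": 0.0}  # equal importance, for illustration
embedding, weights = aggregate_by_type(neighbor_features, type_scores)
```

With equal scores, both node types receive a weight of 0.5; raising the score of one type shifts the combined embedding toward that type's aggregated vector.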

CONCLUSION
In this study, we studied the problem of influence prediction based on a heterogeneous neural network. We introduced a novel model, HetInf, that combines three neural network modules to jointly infer the interaction between events and users in heterogeneous networks and to predict the future behavior of network users. We also improved the local sampling method for heterogeneous networks to capture the law of information dissemination, so as to obtain a more realistic user influence subgraph. Experimental results show that the influence prediction model benefits from the heterogeneous network as well as from jointly learning the embeddings of users and events. In general, the empirical studies verify the effectiveness of our proposed model compared to the baseline methods.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
LG contributed the core idea of the experiment design and analyzed the results under the guidance of BZ. HW assisted with the experiment code and experiment analysis. HZ and ZZ analyzed the comparative experiment. BZ supervised the research and provided financial support and experimental equipment. BZ is the corresponding author. All authors discussed the results and contributed to the final manuscript.