Construction of the Social Network Information Dissemination Index System Based on CNNs

The information dissemination index system is an effective way to measure the dissemination of public opinion events in social networks. Due to the complexity, variability, and asymmetry of information, the construction of traditional information dissemination index systems relies excessively on manual intervention, produces large deviations, and applies only to a limited range of cases. Such systems cannot meet the requirements of an objective, comprehensive, and highly credible index system. Therefore, we propose a method of constructing a multilevel and multigranular information dissemination index system with complex perspectives. In addition, we use a convolutional neural network, a deep learning method, to extract the rich convolution features of public opinion events in the information dissemination process. We then train the network and derive from it the corresponding weights of the information dissemination index system. The experimental results show that the proposed method outperforms the compared methods on a data set from a specific field.


INTRODUCTION
With the increasing convenience of Internet social networks, the cost and time of information flow are rapidly decreasing. How to accurately and quantitatively evaluate the dissemination of online public opinion from the complex and massive amount of online public opinion information is a problem that needs to be solved urgently [1]. However, the evolution mechanism of online public opinion is still not understood deeply enough, and many communication laws remain to be explored and summarized. The communication of online information involves a wide range of factors, and its constituent elements both promote and restrict one another. The main actors and roles of the factors change from stage to stage, making it difficult for managers to systematically assess the situation of information dissemination in an all-round way [2]. Therefore, establishing a reasonable and easy-to-use information dissemination evaluation index system, quantifying the information dissemination per unit time, and evaluating and calculating the situation to help managers make accurate decisions has become a key research topic.

RELATED WORK
The research on information dissemination situation assessment mainly includes two aspects: qualitative public opinion situation assessment and quantitative public opinion situation assessment.

Qualitative Evaluation
In terms of qualitative evaluation, the current research mainly focuses on the evaluation of online public opinion, scope of the event, clustering effect, and tendency of netizens to the event.
Warshaw developed a multilevel regression and poststratification model combining survey and population data so as to detect specific online public opinion at lower levels of geographic aggregation and evaluate the spread of public opinion [3]. Gil-Garcia developed two specific dynamic clustering algorithms from the framework proposed by previous scholars for clustering public opinion texts, which can automatically correct the clustering without re-clustering when the massive document data change [4]. Matsumura proposed an impact diffusion model to discover the network opinion leaders on the designated topics on the forum and the spread of network events [5]. Some studies proposed network topology and infectious disease strategies to explore the characteristics of information dissemination in social networks [6,7]. Cha made a detailed comparison of three indicators for measuring user influence based on Twitter's massive data. Then, based on these three indicators, he also explored the dynamic performance of influence in terms of time and themes and gave a report based on the evaluation method of spreading public opinion of users [8].
Bermingham collected large-scale data on possible radical topics from YouTube and used social network analysis and sentiment analysis tools to dig out topics and their emotional tendencies and portray the characteristics of radicals, yielding an analysis method of user emotional orientation in public opinion events [9].
In recent years, topic discovery technology and communication range evaluation technology for new Internet media, especially Twitter short texts, have also received extensive attention from scholars [10]. Li proposed a detection algorithm based on sparse self-encoding to study the mutual influence between nodes in social networks and analyze the information dissemination situation from the perspective of network topology [11]. Han proposed a topic representation model based on user behavior analysis and evaluated the importance of vocabulary in the process of information dissemination [12]. Zhang proposed a method based on vocabulary decomposition to rate the sentiment of Twitter short texts and used machine learning methods to automatically classify the sentiment of new texts to support the qualitative evaluation of user sentiments and responses of specific online public opinion events [13].

Quantitative Evaluation
In terms of quantitative evaluation of the online public opinion situation, the research on online public opinion decision-making can be summarized from three aspects. The first is to design an index system for online public opinion situation decision-making. Based on the analysis of the spatial structure and evolution of online public opinion, corresponding indicators can be designed to represent the situation of online public opinion from different aspects. Designing a systematic, comprehensive, and scientific indicator system is the primary task of making decisions on the online public opinion situation.
Identifying key and representative indicators can not only reduce the complexity of the indicator system but also improve the accuracy of decision-making [14].
Zeng took public opinion risk as the core research object and, starting from the source of risk and the internal and external performance of risk, constructed an early warning indicator system for network public opinion emergencies and introduced the expert evaluation and AHP methods to determine the index weight [15]. Dai designed a three-tier security state decision-making index system from four dimensions (event sensitivity, communication, netizen attitude, and attention to public opinion) and explained the statistical methods of each quantitative indicator [16]. Zhu proposed a mechanism to study the role of personal reputation and strategy in social network interactions [17] and also studied the information dissemination phenomenon of social networks in complex networks [18]. Shang regarded the public opinion index system as an investment method and, under the guidance of portfolio theory, integrated three sets of network public opinion index systems with their own characteristics into a dynamic integrated index system and used radar chart analysis tools to evaluate network public opinion. The comprehensive scores can also be used to analyze deep-seated reasons through online public opinion evaluation [19].
Zhang abstracted the key indicator system and logical relationship of the situation evaluation of network public opinion into a Bayesian network, then added some subjective evaluations, and gave a public opinion situation evaluation method based on the Bayesian network model [20]. On the basis of constructing a public opinion index system with three dimensions, Rong proposed a gray prediction method for network public opinion based on the variable-length mechanism of a data sequence. The prediction effect can be iteratively tested to find the data sequence length that gives the best forecasting effect [21]. He chose the causal loop diagram to represent the mutual influence between the network public opinion indicators and constructed a network public opinion system dynamics model with 38 variables and four core subsystems targeting public opinion popularity [22]. Zhang used the principal component analysis method to refine the primary index system into a few uncorrelated comprehensive indicators and established an online public opinion early warning model based on the SVM machine learning method [23]. Nie measured and judged the influence of different users on the information dissemination situation in social networks from the perspective of user identity [24]. Huang used the PLSA model to extract the feature word space with no sentimentality, constructed the sentiment word space through word segmentation and the TF-IDF function, and then integrated the inclination of all sentiment words based on the HowNet similarity algorithm to support the judgment of the sentiment trend of network public opinion [25].
Most of the abovementioned scholars use subjective methods to construct indicator system models and use the analytic hierarchy process, subjective assignment, and other weight distribution methods to comprehensively evaluate the situation of social network information dissemination. These methods have been verified in certain practical applications, but their focus is rather limited. For more complex social network communication states, as well as for the situational assessment of different types of events, their portability is inadequate. In these earlier methods of index system construction and evaluation calculation, the weight distribution of each fundamental index depends on manual operation, the values are relatively vague, and the deviation is sometimes large. Considering the real-time nature of social network information dissemination data, a real-time, self-adaptive nonlinear system is required. The system should include a dynamic index weight learning mechanism and iterate with the constantly updated social network information dissemination situation data to form an absolute index system that can effectively evaluate the information dissemination situation.
The convolutional neural network (CNN), which is based on deep learning, is currently one of the key research directions in the field of computer vision. It performs well in applications such as image classification and segmentation, and its powerful feature learning and feature expression capabilities are increasingly valued by researchers. The CNN is a multilayer feedforward neural network model. The network structure is shown in Figure 1.
Each layer uses its own set of convolution kernels, which helps extract useful features from locally related data points. In the training process, the CNN learns through the backpropagation algorithm. The objective function optimized by this backpropagation algorithm uses a response-based learning mechanism resembling that of the human brain. The CNN imitates the biological neural network and adopts a weight-sharing network structure, so the capacity of the network model can be adjusted by changing the depth and width of the neural network.

Convolutional Layer
The convolutional neural network model makes strong assumptions about images, namely, statistical smoothness and local connection. These assumptions effectively reduce the learning complexity of the deep neural network model: the network has fewer connections and weight parameters, which makes it easier to train than a fully connected network of the same scale. The convolution kernel slides over the image and, through a series of matrix operations, processes the gray values of all image pixels.
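To make the saving from weight sharing concrete, the following sketch counts trainable parameters for a small convolutional layer and for a fully connected layer that produces the same number of outputs. The layer sizes used here (a 5 × 5 single-channel input, a 3 × 3 kernel, and 8 output feature maps) are illustrative assumptions, not values reported in this paper.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Trainable parameters of a 2D convolutional layer:
    one kernel per (input channel, output channel) pair plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Trainable parameters of a fully connected layer:
    one weight per (input, output) pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


# Assumed sizes: 5 x 5 single-channel input, 8 feature maps of the same spatial size.
n_inputs = 5 * 5 * 1
n_outputs = 5 * 5 * 8

conv_count = conv_params(3, 3, in_channels=1, out_channels=8)
dense_count = dense_params(n_inputs, n_outputs)

print("convolutional layer:", conv_count)    # 80
print("fully connected layer:", dense_count)  # 5200
```

Under these assumptions the convolutional layer needs 80 parameters where the fully connected layer needs 5,200, because the same kernel is reused at every position of the input.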

Pooling Layer
The pooling layer lowers the dimensionality of the data by imitating the human visual system and using higher-level features to represent the image. It can very effectively reduce the size of the matrix; that is, it performs aggregate statistical operations on the features at different positions in a local area of the image, thereby alleviating the excessive sensitivity of the convolutional layer to image position and reducing the number of parameters in the final fully connected layer, which speeds up the calculation. The most commonly used pooling methods in practice are max pooling and average pooling. In addition to reducing model calculations and information redundancy, they also improve the scale invariance and rotation invariance of the model to varying degrees, which helps prevent overfitting. Improvements to the various pooling methods also help better realize feature compression and feature extraction, which greatly reduces the time required for model training.
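The two pooling methods named above can be sketched in a few lines of plain Python. The window size, the stride, and the sample feature map below are assumptions chosen for illustration; the paper does not state them in this section.

```python
def pool2d(matrix, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D list of numbers.

    Windows that would extend past the border are skipped (no padding).
    """
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(window))
            elif mode == "average":
                pooled_row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'average'")
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 6],
]

print(pool2d(feature_map, mode="max"))      # [[6, 2], [2, 8]]
print(pool2d(feature_map, mode="average"))  # [[3.5, 1.0], [1.0, 6.5]]
```

Both variants reduce the 4 × 4 map to 2 × 2; max pooling keeps the strongest response in each window, while average pooling keeps its mean.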

Fully Connected Layer
The fully connected part of the CNN consists of several hidden layers and usually appears as the last few layers of the network. Each layer contains multiple neurons, and each neuron is fully connected to the neurons of the next layer, which is used to compare the extracted feature structure with the structure designed in the previous section. After the layer-by-layer computation of the preceding layers, the features obtained from the feature maps are used as the input of the fully connected layer. In essence, the fully connected layer is a linear transformation from one feature space to another. In addition, at the end of the CNN, we use different classification functions to calculate the results.
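The linear transformation performed by a fully connected layer can be sketched as follows. The feature maps, weights, and biases are made-up numbers for illustration only; in a trained network they would be learned.

```python
def flatten(feature_maps):
    """Concatenate a list of 2D feature maps into one flat feature vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(features, weights, bias):
    """Fully connected layer: y[j] = sum_i weights[j][i] * features[i] + bias[j]."""
    return [
        sum(w * x for w, x in zip(weight_row, features)) + b
        for weight_row, b in zip(weights, bias)
    ]


# One hypothetical 2 x 2 feature map, mapped to a 2-dimensional feature space.
feature_maps = [[[1.0, 2.0], [3.0, 4.0]]]
weights = [
    [0.5, 0.0, 0.0, 0.5],
    [1.0, -1.0, 1.0, -1.0],
]
bias = [0.0, 1.0]

flat = flatten(feature_maps)
print(flat)                       # [1.0, 2.0, 3.0, 4.0]
print(dense(flat, weights, bias))  # [2.5, -1.0]
```

A nonlinear activation is normally applied to the result before it is passed to the next layer; it is left out here to keep the linear step visible.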

Dataset and Environment
We collected the relevant data of a total of 15 events from different social platforms, including Weibo, Twitter, and WeChat official accounts; from news websites such as Yahoo, Sina, Tencent, and World Wide Web; and from various other online information dissemination platforms such as forums and blogs. The data of each event are divided on an hourly basis according to the time window, forming a total of 6,354 h of data, as shown in Table 1. We randomly select 10 percent of them as the test set and use the remainder as the training set. Each set of data is scored by 10 experts and 100 ordinary users of social networks in accordance with the evaluation criteria, and these scores are used as labels for the data set. We performed this research in the following environment: CentOS 7.5, Intel(R) Xeon(R) Silver 4210, and Intel(R) Core(TM) i7-8750H CPU.
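A random 10 percent hold-out over the hourly windows could be drawn as in the sketch below. The function name, the fixed seed, and the use of integer window indices are assumptions for illustration; the paper does not describe how the random selection was implemented.

```python
import random


def split_windows(n_windows, test_fraction=0.1, seed=0):
    """Randomly split window indices 0..n_windows-1 into training and test sets."""
    indices = list(range(n_windows))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_test = round(n_windows * test_fraction)
    test_indices = sorted(indices[:n_test])
    train_indices = sorted(indices[n_test:])
    return train_indices, test_indices


# 6,354 hourly windows, as reported above.
train_idx, test_idx = split_windows(6354)
print(len(train_idx), len(test_idx))  # 5719 635
```

Splitting by individual hourly window means that windows from the same event can appear in both sets; splitting by event instead would be a stricter alternative.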

Construction of the Index System
The work of this module focuses on the diversified characteristics of the factors involved in the information dissemination situation; each factor has a different impact on the information dissemination situation at different levels and different granularities. We analyze and mine the influencing information in depth from various perspectives, such as public opinion events, communication media, and public opinion audiences. Based on the characteristics of the attributes of the various factors in the communication situation and the law of their influence on the information dissemination situation, we construct a multilevel, multigranular, and multidimensional information dissemination situation indicator system. The three-tier indicators of the information dissemination trend index system are determined through research and the Delphi method. The principal component analysis method is then used to determine the main factors affecting the information dissemination trend by analyzing the correlation between different indicators. Finally, the information dissemination trend evaluation index is established, as shown in Table 2.
Among them, the first-level index of the public opinion event is analyzed from the perspective of the public opinion event itself, including the characteristics of evolution, post content, information dimension, and network structure. It is a class of indicators that describes the state of the public opinion event in the process of dissemination and is mainly used to judge the communication stage of public opinion so as to analyze the communication trend. The dissemination situation of the post is measured from the character level.
The public opinion audience is analyzed from the perspective of the users participating in a certain public opinion event, including characteristics such as emotional tendency and identity. Audience tendency analysis is an indispensable part of public opinion analysis; it also reflects the size, structure, and psychological condition of the audience from another perspective and is an important component of public opinion dissemination. The secondary indicators of audience sentiment tendency include the word frequency of positive, neutral, and negative sentiment words, the average sentiment intensity of each event, and the proportion of positive, neutral, and negative sentiment posts, so that the spread of posts is measured from the sentiment analysis level. The secondary indicators of user identity characteristics include user information such as the gender ratio, age distribution, education level distribution, and political affiliation distribution, as well as account information such as the account registration time, whether the identity is authenticated, and whether the account has a user name and avatar. Opinion leaders play an important intermediary or filtering role in the formation of mass communication effects: they pass information on to audiences, forming a two-level dissemination of information. In two-level communication, an important role is played by the people in the crowd who are exposed to mass media information first or more often and who disseminate to others the information they have reprocessed. With the ability to influence the attitudes of others, they intervene in mass communication, speeding it up and expanding its influence.
The media aspect is analyzed from the perspective of the media of public opinion events, including indicators such as media participation, communication popularity, and regional distribution. It is an important standard for measuring the spread of public opinion events. The secondary indicators of media participation include the proportion of news reports in posts involving the public opinion event, number of reporting news media, total number of news reports, total number of likes on the news, total number of news retweets, and total number of followers of the news media. The secondary indicators of dissemination heat include information such as the number of posts, retweets, comments, likes, participating platforms, and users of the public opinion event. These indicators directly reflect the dissemination situation of the public opinion event in social networks.

Input
We calculated the 25 three-level indicators separately and standardized the original statistical data as follows: the average value and standard deviation of the original data are calculated first, and each value is then standardized by subtracting the average and dividing by the standard deviation. The data processed in this way have zero mean and unit variance and serve as the neural network input data. We combine the indicators of each event into a 5 × 5 two-dimensional matrix as the input of the CNN. The neural network training process is as follows.
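The standardization and reshaping steps can be sketched as below. Two details are assumptions, since the text does not specify them: the population standard deviation is used (dividing by n rather than n - 1), and the 25 indicators are laid out row by row in the matrix.

```python
import math


def standardize(values):
    """Z-score standardization of one indicator across all samples."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)  # population variance
    std = math.sqrt(variance)
    if std == 0:
        return [0.0] * len(values)  # constant indicator: no spread to scale by
    return [(v - mean) / std for v in values]


def to_matrix(indicators, side=5):
    """Arrange side*side indicator values row by row into a side x side matrix."""
    if len(indicators) != side * side:
        raise ValueError("expected %d indicators" % (side * side))
    return [indicators[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical raw values of a single indicator over eight hourly samples.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardize(raw))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# Hypothetical standardized indicator vector of one sample, reshaped to 5 x 5.
sample = [float(i) for i in range(25)]
for row in to_matrix(sample):
    print(row)
```

Each indicator is standardized over the samples of the data set, and the 25 standardized values of one sample then form one 5 × 5 input matrix.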

Convolution
We used a two-dimensional convolution operation to filter and extract features. The input vector is combined into a 5 × 5 two-dimensional single-channel grayscale image, and the calculation of the feature on the convolutional layer is as follows:
$c = f(W * p)$
where c represents the output features of the layer, W represents the weight of the kernel, p represents the feature map of the layer, $*$ denotes the convolution operation, and f is the ReLU activation function.
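As a minimal sketch of this step, the following plain-Python function slides one kernel over a 5 × 5 input and applies ReLU to each response. The kernel values, the absence of padding, and the optional bias term are assumptions for illustration; like most deep learning libraries, the sketch does not flip the kernel.

```python
def relu(x):
    """Rectified linear unit: negative responses are set to zero."""
    return max(0.0, x)


def conv2d_relu(image, kernel, bias=0.0):
    """Slide a square kernel over a square image (stride 1, no padding), then apply ReLU."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for r in range(out_size):
        out_row = []
        for c in range(out_size):
            response = bias
            for i in range(k):
                for j in range(k):
                    response += kernel[i][j] * image[r + i][c + j]
            out_row.append(relu(response))
        output.append(out_row)
    return output


# Hypothetical 5 x 5 input whose entry at (r, c) is 5*r + c.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]

# A kernel that copies the centre pixel, and one whose response is always negative here.
centre_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
negative_kernel = [[-1, -1, -1], [-1, -1, -1], [-1, -1, -1]]

print(conv2d_relu(image, centre_kernel))
# [[6.0, 7.0, 8.0], [11.0, 12.0, 13.0], [16.0, 17.0, 18.0]]
print(conv2d_relu(image, negative_kernel))
# [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

Without padding, a 3 × 3 kernel reduces the 5 × 5 input to a 3 × 3 feature map, which is one reason small inputs usually need padding when several convolutional layers are stacked.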

Pooling
In addition to reducing model calculations and information redundancy, the pooling operation improves the scale invariance and rotation invariance of the model to varying degrees, which helps prevent overfitting. Improvements to the various pooling methods also better realize feature compression and feature extraction, which greatly reduces the time required for model training. The calculation of features is as follows:

Output
The convolutional neural network filters the input matrix through the aforementioned convolutional and pooling layers for feature screening, and the resulting vector is fed to the fully connected layers. After two fully connected layers, the output is passed through the Softmax function to obtain the prediction result, and the offset (bias) and weight values are adjusted accordingly.
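The Softmax step at the end of the network can be sketched as follows. The logits are made-up values standing in for the output of the last fully connected layer, and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1.

    The maximum is subtracted first so that exp() cannot overflow.
    """
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the index of the class with the highest probability."""
    probabilities = softmax(logits)
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical outputs of the last fully connected layer for three classes.
logits = [0.1, 2.0, -1.0]
print(softmax(logits))
print(predict(logits))  # 1
```

The predicted class is the one with the largest logit, because Softmax preserves the ordering of its inputs.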

Calculation
In the model, we use the ReLU function as the activation function of each layer and cross-entropy as the loss function.
In addition, the Adam algorithm is used for backpropagation optimization. Finally, the model is trained, and the Captum interpretability tool is used to obtain the third-level indicator weights. The upper-level index weight is obtained by weighted summation of the lower-level index weights. In this way, we calculated the first-level, second-level, and third-level index weights of the index system. When we use the index system, we input the calculated values of the third-level indices of an event to be detected. We can then obtain the corresponding second-level and first-level index values using Eq. 5 and obtain the propagation situation value of the event using model prediction.
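One plausible reading of this bottom-up aggregation is sketched below. It is an assumption-laden illustration, not the paper's Eq. 5: the attribution scores are made-up numbers standing in for the output of an interpretability tool, the third-level weights are taken as normalized absolute attributions, an upper-level weight is the sum of its children's weights, and an upper-level value is the weighted average of its children's values. All indicator names are hypothetical.

```python
def normalize_attributions(attributions):
    """Turn raw attribution scores into non-negative weights that sum to 1."""
    total = sum(abs(score) for score in attributions.values())
    if total == 0:
        raise ValueError("all attribution scores are zero")
    return {name: abs(score) / total for name, score in attributions.items()}


def aggregate_weights(child_weights, parent_of):
    """Upper-level weight = sum of the weights of its lower-level indicators."""
    parent_weights = {}
    for child, weight in child_weights.items():
        parent = parent_of[child]
        parent_weights[parent] = parent_weights.get(parent, 0.0) + weight
    return parent_weights


def aggregate_values(child_values, child_weights, parent_of):
    """Upper-level value = weighted average of the values of its lower-level indicators."""
    weighted_sum, weight_sum = {}, {}
    for child, value in child_values.items():
        parent = parent_of[child]
        weight = child_weights[child]
        weighted_sum[parent] = weighted_sum.get(parent, 0.0) + weight * value
        weight_sum[parent] = weight_sum.get(parent, 0.0) + weight
    return {
        parent: (weighted_sum[parent] / weight_sum[parent]) if weight_sum[parent] else 0.0
        for parent in weighted_sum
    }


# Hypothetical third-level indicators, attribution scores, and hierarchy.
attributions = {"posts": 0.4, "retweets": -0.2, "positive_words": 0.3, "negative_words": 0.1}
parent_of = {
    "posts": "dissemination_heat",
    "retweets": "dissemination_heat",
    "positive_words": "sentiment_tendency",
    "negative_words": "sentiment_tendency",
}
values = {"posts": 1.0, "retweets": 0.5, "positive_words": 0.2, "negative_words": -0.4}

third_level = normalize_attributions(attributions)
second_level = aggregate_weights(third_level, parent_of)
second_values = aggregate_values(values, third_level, parent_of)

print(second_level)   # approximately {'dissemination_heat': 0.6, 'sentiment_tendency': 0.4}
print(second_values)  # approximately {'dissemination_heat': 0.833, 'sentiment_tendency': 0.05}
```

Applying `aggregate_weights` once more with a mapping from second-level to first-level indicators would give the first-level weights in the same way.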

Setting of Model Parameters
Convolutional neural networks can extract features at multiple levels and granularities through convolution and pooling operations and finally learn the features that really distinguish different dissemination situations. To meet the need to evaluate the dissemination situation at different levels and granularities, we tested how convolutional neural networks of different depths and different combinations of data input methods affect the performance of the algorithm, and thereby how well convolutional neural networks support the quantitative evaluation of dissemination situations. Based on these experiments and tests, we obtained a network structure with excellent experimental results: two layers of convolution and pooling are followed by two fully connected layers, and the Softmax function is used for output, as shown in Table 3.
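Because the input is only 5 × 5, the kernel sizes, strides, and padding of the two convolution and pooling stages must be chosen so that the feature map does not shrink to nothing. The sketch below traces the spatial size through one such configuration using the standard output-size formula. The specific settings are hypothetical and are not the values of Table 3.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [input_size]
    size = input_size
    for kernel, stride, padding in layers:
        size = output_size(size, kernel, stride, padding)
        if size < 1:
            raise ValueError("feature map vanished; use more padding or smaller kernels")
        sizes.append(size)
    return sizes


# Hypothetical configuration: (kernel, stride, padding) for each layer.
layers = [
    (3, 1, 1),  # convolution 1: 3 x 3 kernel with padding keeps 5 x 5
    (2, 1, 0),  # pooling 1: 2 x 2 window, stride 1
    (3, 1, 1),  # convolution 2: 3 x 3 kernel with padding
    (2, 2, 0),  # pooling 2: 2 x 2 window, stride 2
]

print(trace_sizes(5, layers))  # [5, 5, 4, 4, 2]
```

With these assumed settings, each feature map entering the first fully connected layer is 2 × 2.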

Experiment Results
We preprocessed the data and input them into the constructed convolutional neural network. After training, the network fits the expert calibration well and converges well, as shown in Figure 2. We also trained a BP neural network, which performs well too, as shown in Figure 3.
We also applied linear regression, SVM, and other methods and compared their results with those of the convolutional neural network. As shown in Table 4, we used accuracy, recall, precision, and F1-score to measure the performance of the models. The bold values represent the best results. We found that the convolutional neural network achieves better results than the other methods.
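For reference, the four measures can be computed from predicted and true labels as in the sketch below. It covers the binary case only, and the label vectors are made-up examples rather than results from this study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)  # every measure equals 0.75 for this example
```

For a task with more than two classes, these measures are usually computed per class and then averaged.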

CONCLUSION
In summary, the existing methods have several shortcomings, which are mainly concentrated in the following aspects: 1) the calculation of the information dissemination index system relies too much on manual evaluation; 2) the determined index system is relatively limited and one-sided; and 3) the weight values of the index system fluctuate greatly and sometimes cannot accurately reflect the dissemination trend of information dissemination events. In response to these shortcomings, this study puts forward a method for constructing an information dissemination index system based on the CNN. We use the convolutional layer for multidimensional and multigranular feature extraction and apply the pooling layer to quickly reduce the size of the information dissemination network and highlight the main features. Through the deep network structure of the CNN with several hidden layers, the model simulates expert experience in assessing all-round indicators. In addition, it has adaptive features such as self-learning. In the experimental comparison, the proposed model performs better than the other models mentioned. However, the parameters of this CNN model have not been shown to be optimal, and the model lacks a more targeted index system construction for specific information dissemination events on different topics. These are the directions of our future improvement and research.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
WH and LX contributed equally to this study and share first authorship. SL and LX contributed to the conception and design of the study and performed the experiments. XW and ZW grabbed and analyzed the dataset. SL and LX wrote the first draft of the manuscript.