Simulation of Information Spreading on Twitter Concerning Radiation After the Fukushima Nuclear Power Plant Accident

How information spreads on social media is a crucial issue for building a safe society. In particular, during emergencies, misinformation and uncertain information can lead to social disruption and cause significant damage to our lives. Here we built a retweet network from 24 million radiation-related tweets by 1.3 million accounts in the immediate aftermath of the Fukushima nuclear power plant accident in 2011. Then we simulated information spreading on the network to explore ways to spread scientifically accurate information. Our simulation replicated the reality in which the number of scientific evidence-based tweets gradually declined while the number of emotional tweets increased. We also showed, in hypothetical simulations, that increasing new direct retweets from the influencers could effectively spread scientific evidence-based information.


INTRODUCTION
Throughout history, all sorts of uncertain information have appeared during and after disasters, whether natural or human-made. It has been documented since Roman times that rumors have been used as a kind of weapon. During World War II in the 1940s, people were at the mercy of much uncertain information, and innocent people were harmed even far from the battlefield [1]. Uncertain information also spread in many countries during the COVID-19 pandemic in 2020. The new term "infodemic" (a blend of "information" and "pandemic") was coined, and experts warned of the risks [2,3]. During the infodemic in 2020, because people stayed at home under lockdown, uncertain information circulated mainly on social media.
The same thing happened in 2011 when the Great East Japan Earthquake struck Japan. The damage caused by the earthquake itself, the tsunami, and the collapse of buildings was enormous. Moreover, the accident at the Fukushima Daiichi nuclear power plant (1F), located 220 km from the Tokyo metropolitan area, had extensive and long-lasting effects on many people. Due to this accident, many people in certain areas of Fukushima Prefecture were forced to evacuate. At the peak, 165 thousand people were evacuated, and 30 thousand people still live away from their hometowns. During this decade, some people have died of illness or committed suicide due to the stress of being forced to suddenly leave their homes and live in new communities in temporary housing. These are called disaster-related deaths.
The 1F accident caused confusion [4,5]. After the accident, people shared information about radiation. Because radiation cannot be seen directly and scientific knowledge is necessary to understand accurate information about it, many people looked for details regarding radiation on social media. This information was widely shared, especially on Japanese Twitter in 2011.
Although this is a crucial case, there has been little in-depth social media analysis of Fukushima and the radiation aftermath of the disaster. To tackle the issue, Tsubokura et al. collected and analyzed data from the Japanese Twitter space, using technical terminology to comprehensively describe the information exchanged about the 1F accident [6]. They showed that retweets (RTs) accounted for more than half of the information exchanged about the 1F accident. Furthermore, RTs from influential sources, known as "influencers," accounted for 80.3% of the total RTs, even though influencers make up only 2% of Twitter accounts.
These influencers could then be broadly divided into three groups by applying document vector analysis. The first group tweeted rationally, describing the effects of radiation based on scientific facts. The second group tweeted emotionally and criticized the government and Tokyo Electric Power Company. The third group consisted of news agencies and journalists related to mass media [6].
Hereafter we refer to the second group of people who tweeted with an emotional expression as Group B and the remaining two groups as Group A.
To empirically identify each group's impact and explore ways to efficiently spread scientifically accurate information about radiation, it is crucial to build on the study by Tsubokura et al. [6] and examine information spreading under various scenarios. However, information spreading, which is also known as complex contagion [11], is difficult to understand, unlike a physical phenomenon whose motion is deterministically governed by the laws of kinetics. Therefore, we developed a simulation using a modified voter model on a real RT network. Specifically, using the same data set as in Ref. [6], we simulated the information spreading that originated from the real influencers on a real RT network immediately after the 1F accident. As a result, we showed that Group A was influential during the first week after the accident. However, Group A lost its influence after a month, and tweets from Group B came to cover the majority of the discussion about radiation in the Japanese Twitter space. These simulation results replicated the fact that tweets from Group B spread more widely than those from Group A in the real data analysis [6]. We then performed simulations under hypothetical scenarios to explore how to respond accurately and quickly to social crises, especially in the age of the infodemic.

Network Data
We used the same Japanese Twitter data as in Ref. [6], related to the 1F accident and/or radiation, from March 2 to September 15 in 2011 (i.e., the first six months after the Great East Japan Earthquake). The total number of tweets and retweets during the period was 24,287,299 from 1,397,941 accounts. From these data, we built a weighted directed network in which the nodes are Twitter accounts and the links are RT relations. If there was an object in retweeted_status in the original JSON data, we treated the tweet as an RT. Therefore, we only used direct retweets. The code for this pre-processing is available at the following URL (https://github.com/likr/twitter-analysis2018/edit/master/scripts/). The link direction is from the RT origin (the retweeted account) to the RT destination (the author of the retweet), representing the information flow (Figure 1A). Thus, the number of outgoing links indicates how often an account was retweeted, while the number of incoming links indicates how often it retweeted others.
We built the network in this way because we assume that if an account i retweets another account j, i agrees with and supports j's opinion. If i retweets j more than once, then i strongly agrees with j. Of course, i may also retweet to refute j's opinion, especially in quoted tweets. However, we made this assumption because, as of 2011, only about three years had passed since Twitter became widespread in Japan, and most people used RTs to show their agreement.
Over the whole study period, the numbers of nodes and links were 813,876 and 7,528,370, respectively, with the largest connected component accounting for 99% of the total. When we look at the RT network weekly, the size of the network decreases over time as the number of tweets about the 1F accident and/or radiation decreases (Figure 2A). The degree distribution for the whole period is highly skewed for both incoming and outgoing links. When compared with the lognormal and power-law distributions [27], the incoming-link distribution was significantly (p < 0.01) closer to the lognormal distribution. For outgoing links, there was no significant difference between the two distributions, and the exponent of the fitted power-law distribution is 1.87 (Figure 2B). This skewed distribution of the number of RTs is similar to the result of an earlier work showing that the retweet and retweeted frequencies follow a power-law function with an exponent of 1 [7]. A simple Barabási-Albert (BA) model, a process of growth and preferential attachment, analytically yields a power-law exponent of three [8]. Our degree distribution and power-law exponent are not exactly the same as in these previous studies. This may be because our data are limited to tweets about the 1F accident and/or radiation. We do not go into the details of the mechanism in this paper but will discuss RT dynamics elsewhere. Most nodes have few links: the percentage of nodes with fewer than three links is 63.3% for incoming and 58.8% for outgoing links (Figure 2C).
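The network construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pre-processing script: the field names user.id_str and retweeted_status.user.id_str follow the common Twitter JSON layout and are assumptions here, as are the function name build_rt_network and the toy records.

```python
from collections import Counter


def build_rt_network(tweets):
    """Count direct retweets as weighted links (origin -> retweeter)."""
    weights = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if not original:
            continue  # no retweeted_status object: not a direct retweet
        origin = original["user"]["id_str"]   # account being retweeted (assumed field)
        retweeter = tweet["user"]["id_str"]   # account that retweets (assumed field)
        weights[(origin, retweeter)] += 1     # repeated retweets raise the weight
    return weights


# Toy records (hypothetical): b retweets a twice, c retweets a once,
# and the last record is an ordinary tweet that is skipped.
toy_tweets = [
    {"user": {"id_str": "b"}, "retweeted_status": {"user": {"id_str": "a"}}},
    {"user": {"id_str": "b"}, "retweeted_status": {"user": {"id_str": "a"}}},
    {"user": {"id_str": "c"}, "retweeted_status": {"user": {"id_str": "a"}}},
    {"user": {"id_str": "a"}},
]
rt_weights = build_rt_network(toy_tweets)
print(dict(rt_weights))
```

Keying the counter by (origin, retweeter) keeps the link direction aligned with the information flow, so the weight of a link is simply the number of times the retweeter retweeted the origin.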
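As a rough illustration of the last statistic, the share of nodes with fewer than three incoming or outgoing links can be computed from a list of links. The sketch assumes that nodes with no link of a given direction count as zero, which the text does not state; the helper names and the toy links are hypothetical.

```python
from collections import Counter


def link_counts(links):
    """links: iterable of distinct (origin, retweeter) pairs."""
    in_deg, out_deg, nodes = Counter(), Counter(), set()
    for origin, retweeter in links:
        out_deg[origin] += 1      # origin was retweeted by one more account
        in_deg[retweeter] += 1    # retweeter retweeted one more account
        nodes.update((origin, retweeter))
    return nodes, in_deg, out_deg


def share_below(nodes, degree, k=3):
    """Fraction of nodes with fewer than k links (missing nodes count as 0)."""
    return sum(1 for n in nodes if degree[n] < k) / len(nodes)


# Toy links (hypothetical): a is retweeted by b, c and d; b is retweeted by c.
toy_links = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
toy_nodes, toy_in, toy_out = link_counts(toy_links)
in_share = share_below(toy_nodes, toy_in)
out_share = share_below(toy_nodes, toy_out)
print(in_share, out_share)
```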

Model
There are a variety of opinion dynamics models [9-12]. One of the most representative and longest established is the voter model. The voter model is also called the Ising model in the physics literature [13,14]. A simple voter model in which nodes have binary opinions with externalities is also well known in economics [15]. In a voter model, each node chooses one of its neighbors and mimics that neighbor's opinion at each time step. The dynamics of a voter model is not direct voting by all neighbors, but the majority has a probabilistic advantage. A natural extension of the binary voter model, in which a node chooses its opinion (i.e., state) from multiple opinions, is called the Potts model [16]. Another model has binary opinions as in the voter model, but all neighbors directly influence each node [17,18]. In these models, nodes have a threshold and make decisions according to that threshold. Using both analytical solutions and numerical simulations, Watts confirmed that the threshold and the average number of neighbors determine whether a global cascade occurs [17]. Watts' cascade model was initially studied in networks without link directions and weights. Since then, the model has been extended to include link directions and weights [19,20] as well as degree correlations [21]. Furthermore, Watts and colleagues confirmed the "social influence" assumed in the model in an experiment using an artificial music market [22].
Running simulations on real data gives us a new perspective. Karimi et al. ran Watts' cascade model on six real networks and found that the temporal network structure increases the cascade size [23]. Real Twitter data are also being combined with simulations to analyze information spreading in what-if scenarios [10,24,25]. Takayasu et al. used an SIR-like model and showed that the false rumor cascade would have been smaller if the anti-rumor had been transmitted earlier than it actually was [24]. Tripathy et al. ran two models on a real network with about 50,000 nodes to propose an anti-rumor strategy on Twitter [25]. However, these simulations do not fully take into account the link directions and weights that the real network contains.
We built our opinion model based on Watts' cascade model, extended to three opinions and to the RT dynamics. The model incorporates the strength of each node's own opinion in its update, because we assume that nodes are influenced by both their neighbors and their own beliefs when they update their opinions. This own opinion corresponds to the threshold in Watts' cascade model.
Assume that each node i has an internal state s_i(t) ∈ {−1, 0, 1} at time t. Here, we assign s_i(t) = +1 to Group A and s_i(t) = −1 to Group B. The state s_i(t) = 0 corresponds to a neutral state that does not belong to either of these groups. Initially, all nodes are set to the neutral state s_i(t) = 0, except for the influencers. At time t, node i receives an input from each neighbor node j that it has retweeted. Here we consider the direction and the weight w_ij(t) of the link. When node i does not have an incoming link from node j, w_ij(t) = 0. Node i determines its next internal state s_i(t + 1) based on the value m_i(t), which is the sum of the inputs from all its neighbors (Figure 1B; Eq. 1). Here n_i is the number of links directed to node i (i.e., the number of incoming links of node i), and a_i is the parameter that describes the strength of node i's own opinion. We assume that a_i is normalized as 0 ≤ a_i < 1 and that it does not change with time.
When a_i ∼ 1, node i refers to its own state, and no update occurs when s_i(t) ≠ 0. On the other hand, when a_i = 0, s_i(t + 1) is determined entirely by the inputs from i's neighbors; that is, the node updates its opinion depending fully on its neighbors. Therefore, when a_i is small, a node is more likely to change its opinion. Node i sets its next opinion s_i(t + 1) by applying a threshold to m_i(t) (Eq. 2).
Due to this rule, when a_i > 0.5, once node i has an opinion s_i(t) ≠ 0, it is unlikely to change that opinion. Only when w_ij(t) is large can the sign of m_i(t) change and s_i(t + 1) be updated. We count the number of nodes in each internal state s_i(t): V_A(t) is the number of nodes with s_i(t) = +1, and V_B(t) is the number of nodes with s_i(t) = −1. Finally, we evaluate R(t) = V_B(t)/V_A(t) as the ratio of Group B to Group A.
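To make the update rule concrete, the following Python sketch implements one possible reading of it. Eqs 1 and 2 are not reproduced in this text, so the sketch assumes m_i(t) = a_i s_i(t) + (1 − a_i) (1/n_i) Σ_j w_ij(t) s_j(t) and s_i(t + 1) = sign(m_i(t)), with all nodes updated synchronously. This form matches the verbal description (own opinion weighted by a_i, neighbor input averaged over n_i incoming links, opinion flips for a_i > 0.5 only when weights are large), but it is an assumption and not necessarily the authors' exact equations. The function names and the toy network are hypothetical.

```python
def step(states, in_links, a):
    """One synchronous update under the ASSUMED form of Eqs 1 and 2.

    states:   {node: -1, 0 or +1}
    in_links: {node i: {neighbor j: weight w_ij}} (accounts that i retweeted)
    a:        {node: strength of own opinion, 0 <= a_i < 1}
    """
    new_states = {}
    for i, s_i in states.items():
        neighbors = in_links.get(i, {})
        n_i = len(neighbors)  # number of incoming links of node i
        social = sum(w * states[j] for j, w in neighbors.items()) / n_i if n_i else 0.0
        m_i = a[i] * s_i + (1 - a[i]) * social   # assumed Eq. 1
        new_states[i] = (m_i > 0) - (m_i < 0)    # assumed Eq. 2: sign of m_i
    return new_states


def ratio(states):
    """R = V_B / V_A, the ratio of Group B nodes to Group A nodes."""
    v_a = sum(1 for s in states.values() if s == 1)
    v_b = sum(1 for s in states.values() if s == -1)
    return v_b / v_a if v_a else float("inf")


# Toy network (hypothetical): u retweeted influencer A twice and B once,
# v retweeted influencer B once.
toy_states = {"A": 1, "B": -1, "u": 0, "v": 0}
toy_in_links = {"u": {"A": 2, "B": 1}, "v": {"B": 1}}
toy_a = {node: 0.31 for node in toy_states}
next_states = step(toy_states, toy_in_links, toy_a)
print(next_states, ratio(next_states))
```

In this toy case, u adopts the opinion of the account it retweeted more often, which is how the link weights enter the outcome.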

Network Data Assimilation Simulation
We simulate the proposed model on the real RT network as follows:

(1) Set the target period
We extract the real network data for the target period t, where t denotes the week starting t days after the date of the earthquake. To see gradual fluctuations in R(t), we use overlapping time windows: t = 1 covers March 12-19 in 2011, and t = 2 covers March 13-20 in 2011. Finally, t = 181 = T covers September 8-15 in 2011.

(2) Set the initial conditions and influencers
For the initial condition, based on the previous study [6], we selected the top nine influencers in each group, i.e., those who were retweeted most during the whole period. We set s_i(t) = +1 for the influencers in Group A and s_i(t) = −1 for the influencers in Group B. All remaining nodes were set to s_i(t) = 0.

(3) Update opinion with the model
We repeatedly update the states with our model. We run 50 steps, after which the number of nodes in each state no longer changes (Figure 3). After 50 steps, we count the number of nodes with s_i(t) = +1 as V_A(t) and the number of nodes with s_i(t) = −1 as V_B(t), and calculate the ratio R(t).

(4) Shift the target period and repeat the simulation
We change the target period from t to t + 1 and repeat the simulation.
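The overlapping target periods in steps (1) and (4) can be generated as follows. The sketch takes March 11, 2011 as the date of the earthquake and treats each period as a start date plus seven days, which reproduces the three periods quoted in step (1); whether the end date is inclusive is not stated in the text, and the names used here are illustrative.

```python
from datetime import date, timedelta

EARTHQUAKE = date(2011, 3, 11)  # date of the Great East Japan Earthquake
T = 181                         # index of the last target period


def target_period(t):
    """Return (start, end) of the week that starts t days after the earthquake."""
    start = EARTHQUAKE + timedelta(days=t)
    return start, start + timedelta(days=7)


# Shifting t by one day at a time gives the overlapping weekly windows.
periods = [target_period(t) for t in range(1, T + 1)]
print(periods[0], periods[1], periods[-1])
```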

Real Data Case
In the previous study [6], the numbers of RTs from influencers belonging to Groups A and B were identified. Therefore, we used these values as a benchmark to replicate the influence of Groups A and B. These values indicate that the number of retweets from Group B, containing emotional expressions about radiation, exceeded that from Group A, containing more scientific descriptions. In fact, around t = 10 (i.e., March in 2011), Group A predominated because many scientists and mass media repeatedly tweeted evidence-based facts and people retweeted them. However, the number of tweets from influencers in Group A then declined, while the tweets from influencers in Group B increased.
Frontiers in Physics | www.frontiersin.org | June 2021 | Volume 9 | Article 640733
To check the validity of our simulation, we compared R(t), the ratio of the number of nodes in Group B to that in Group A. We compared the real values R_real(t) from the data with the simulation results R_sim(t) for the same target period t. Because the value of R(t) fluctuates widely with t, we compared the 7-day moving average of R(t), taken within t ± 3 days, to check the rough variation. Note that the real data R_real(t) and the simulation data R_sim(t) are not directly comparable: one is based on the number of RTs, and the other on the number of nodes that belong to each group. Here we assumed that the nodes belonging to each group have a certain probability of retweeting their own group. Because we do not have data on which group each account belongs to, we use this approach to capture the gradual trend of each group. This method is reasonable to a certain extent because the previous study [6] considered the number of RTs as the influence of each group. Figure 4A shows the simulation results with a fixed a_i value for every node. Since we cannot know a_i for each node, we fixed a_i to a common value a for simplicity. First, when a < 0.2, the sum of squared errors E = Σ_{t=1}^{T} (R_real(t) − R_sim(t))^2 between R_real(t) and R_sim(t) is very large. On the other hand, E decreases sharply around a = 0.2 and is smallest at a = 0.31 (Figure 4B). This result suggests that each node is influenced by its neighbors and also has a certain amount of intention of its own. We now discuss the obtained value of a = 0.31 in more detail. If a is greater than 0.5, nodes do not change their opinion once they are in either s_i(t) = +1 or s_i(t) = −1, by the definition in Eqs 1, 2. On the other hand, if a is close to zero, nodes easily change their opinion depending on their neighbors. In the original Watts' cascade model, global cascades are known not to occur when the threshold exceeds about 0.25, even for a small average degree [17].
In our simulation with real network data, in which most nodes have only a small number of neighbors (Figure 2C), the global cascade does not occur when a exceeds 0.3. Furthermore, when a is large, nodes hardly change their opinions once they are in s_i(t) = ±1. As a result, the value of R_sim(t) is less likely to change when a > 0.3.
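The comparison between R_real(t) and R_sim(t) described in this section can be sketched as below. The centered 7-day window and the sum of squared errors E follow the text; shortening the window at both ends of the series, and the helper names moving_average, squared_error and best_a, are assumptions made for this illustration. The text also does not say whether E is computed on raw or smoothed series, so the functions are kept separate.

```python
def moving_average(values, half_width=3):
    """Centered moving average over t-3..t+3 (window shortened at the edges)."""
    smoothed = []
    for t in range(len(values)):
        lo = max(0, t - half_width)
        hi = min(len(values), t + half_width + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def squared_error(r_real, r_sim):
    """E = sum over t of (R_real(t) - R_sim(t))^2."""
    return sum((x - y) ** 2 for x, y in zip(r_real, r_sim))


def best_a(r_real, sims_by_a):
    """Return the value of a whose simulated series gives the smallest E."""
    return min(sims_by_a, key=lambda a: squared_error(r_real, sims_by_a[a]))


# Toy series (hypothetical numbers, not the paper's data).
toy_real = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_sims = {0.1: [5.0] * 7, 0.3: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]}
print(moving_average(toy_real), best_a(toy_real, toy_sims))
```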

Optimization of Fixed-Parameter a i
The above results for a fixed parameter a_i showed that a = 0.31 yields the simulation results most similar to the real results. However, fixing a_i to the same value for all nodes is not realistic for simulating real social phenomena. Therefore, we add noise to a_i so that its mean is 0.31. Here we draw each a_i from a uniform random distribution on [0, 0.62]. Figure 5 shows the mean and standard deviation of the values over ten iterations of the simulation with a mean a_i of 0.31, with a_i varying from node to node. We can confirm that the parameter a_i with noise also replicates the real data.
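A minimal sketch of the noisy parameter setting: each a_i is drawn uniformly from [0, 0.62], so that its mean is 0.31, and the per-period mean and standard deviation are taken across repeated runs. The function names, the seed and the toy runs are illustrative assumptions.

```python
import random
from statistics import mean, stdev


def draw_a(nodes, mean_a=0.31, rng=random):
    """Draw a_i uniformly from [0, 2 * mean_a], i.e. [0, 0.62] by default."""
    return {node: rng.uniform(0.0, 2 * mean_a) for node in nodes}


def summarize(runs):
    """runs: list of equally long R_sim series; returns (mean, std) per period."""
    return [(mean(column), stdev(column)) for column in zip(*runs)]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
a_values = draw_a(range(10000), rng=rng)
print(round(mean(a_values.values()), 3))

# Two toy runs (hypothetical numbers) summarized period by period.
toy_summary = summarize([[1.0, 2.0], [3.0, 4.0]])
print(toy_summary)
```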

Application of Simulation with Hypothetical Scenarios
Finally, we present the simulation results under hypothetical scenarios based on the simulation with a_i = 0.31. Comparing hypothetical scenarios will help us consider new strategies to convey scientific evidence-based tweets instead of emotional tweets. Here we apply three scenarios (Figure 6). We compared these three scenarios with a 10% increase in RTs for the nine influencers in each group. Since increasing RTs corresponds to adding new links to the network, in the actual simulation we randomly added and changed the link connections and compared the results on average over ten iterations. Applying each of the three scenarios to Group A or to Group B, we simulated six scenarios in total. In Figure 7, we compared the mean 〈R′(t)〉, which is the hypothetical simulation result, and the mean 〈R(t)〉, which is the original simulation result, over the entire period for the six scenarios (1A)-(3B). When we apply scenarios (1)-(3) only to the influencers in Group A, R′(t)/R(t) < 1; that is, we can reduce the influence of Group B. In particular, scenario (1A), which adds new nodes that retweet the influencers in Group A directly, has the largest impact in reducing the influence of Group B. On the other hand, when we apply the scenarios only to the influencers in Group B, R′(t)/R(t) > 1. When we compared the absolute value |R′(t)/R(t) − 1|, the most impactful of these scenarios is (1A). Our result suggests that, to increase the influence of Group A with science-based and less emotional tweets, increasing the number of nodes that retweet the key influencers directly is effective.
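As an illustration of scenario (1) only, the sketch below adds new nodes that each retweet an influencer once, so that the influencer's RTs grow by about 10%. The exact procedure is defined in Figure 6 and involves random rewiring averaged over ten iterations, which is not reproduced here; the deterministic construction, the function name and the node labels are assumptions.

```python
def add_direct_retweeters(weights, influencers, increase=0.10):
    """Return a copy of the link weights with new direct retweeters added.

    weights: {(origin, retweeter): RT count}. Each new node retweets one
    influencer once (an assumed reading of scenario (1)).
    """
    new_weights = dict(weights)
    for influencer in influencers:
        total = sum(w for (origin, _), w in weights.items() if origin == influencer)
        extra = round(total * increase)  # number of additional direct RTs
        for k in range(extra):
            new_weights[(influencer, f"new_{influencer}_{k}")] = 1
    return new_weights


# Toy network (hypothetical): A is retweeted 30 times, B 5 times.
toy_weights = {("A", "u"): 20, ("A", "v"): 10, ("B", "v"): 5}
boosted = add_direct_retweeters(toy_weights, ["A"])
print(len(boosted) - len(toy_weights))
```

Running the opinion model on the original and the modified network then gives R(t) and R′(t), whose ratio is the quantity compared in Figure 7.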

DISCUSSION
Although we can see neither radiation nor information directly, both continue to affect our lives. Here we have analyzed and simulated radiation-related tweets after the 1F accident in 2011. Over the first six months after the accident, scientifically based tweets decreased. Instead, tweets containing more emotional expressions began to spread among the radiation-related tweets in the Japanese Twitter space.
To explore ways to efficiently spread scientifically accurate information about radiation, we first built a weighted directed network from the tweet data. Next, we introduced a model of opinion dynamics in which each node has its own intentions but is also influenced by its RT neighbors. The model best reproduces the real data when the strength of each node's opinion a_i is 0.31. This suggests that each node is influenced by its neighbors while also taking its own opinion into account. We then ran this model on the constructed RT network with real influencers under various hypothetical scenarios. The hypothetical simulations allow us to quantify what kind of RTs can increase a particular group's influence. Although our simulation setup is simple, it can provide suggestions on how to make information more widely available, even on complicated RT networks.
Limitations remain in both the network construction and the simulation model. For the network, we set the direction of each link from the RT source to the tweet author, assuming that RTs express endorsement. Because we used Twitter data from 2011, when Twitter was relatively new in Japan, we believe that this assumption was appropriate to some extent, since RTs played a major role in information spreading. However, now that RTs are often used to argue for opposing views, the way of constructing the network will need to be considered more carefully. For the simulation model, we assumed that each node takes one of three discrete states (opinions), determined deterministically by majority vote. However, real opinions clearly have shades and are not limited to three discrete states. Moreover, people do not decide by simple majority vote. For example, the bounded-confidence model [26], which has continuous opinion values and incorporates the opinions of people whose opinions are similar to one's own, is a strong candidate for modeling information spreading on radiation-related issues such as this one.
Our analysis shows that the role of influencers is crucial from the viewpoint of the network. In the future, it will also be important to analyze the nodes that are directly connected to the influencers. Also, more in-depth knowledge about networks, such as time-varying networks and analysis using multilayer networks, can provide a more accurate picture of information spreading. The spreading of uncertain information has been observed not only for radiation but also for political and vaccine information [28,29]. Behind these phenomena, it has been pointed out that information that spreads easily has novelty and attractive narratives. Although we did not analyze the text of these tweets in depth, we hope to work with psychologists to analyze the tweets in more detail in the future and reflect the results in our simulations.
In addition to network analysis, it is essential to develop the simulations further. By varying the parameter a_i over time or fixing the a_i of particular influencers, we can expect more realistic simulations. For example, setting a_i proportional to the number of RTs (outgoing links of node i) is a strong candidate because it reflects the fact that influencers often act as opinion leaders and are unlikely to change their opinions. Also, in the hypothetical scenarios, we only performed simulations with increased RTs from influencers. Measuring the impact of decreasing RTs from influencers in Group B would also give a new perspective on spreading information from Group A. Another important direction is to examine the timing of information transmission, for example, how much RTs from Group B could be reduced if information from Group A had spread earlier than it actually did. Our simulation could be widely applied to information spreading on social media, not only for radiation-related information but also, e.g., for delivering correct vaccine information.

DATA AVAILABILITY STATEMENT
Twitter data used in this study is available for purchase through NTT DATA and Twitter, Inc. To comply with Twitter terms of service, Twitter data cannot be publicly shared. Other raw data supporting the conclusion of this article (simulation codes and result data) will be made available by authors, without undue reservation.

AUTHOR CONTRIBUTIONS
All authors contributed to the conception of the paper. YS, HT, YO, and KU designed the whole structure of the research. YS and YO constructed the network data. YS and HT developed the model. YS performed the simulations. YS and HT wrote sections of the paper.