COVID-19 case prediction using emotion trends via Twitter emoji analysis: A case study in Japan

Introduction: The worldwide COVID-19 pandemic, which began in December 2019 and has lasted for almost 3 years now, has undergone many changes and has changed public perceptions and attitudes. Various systems for predicting the progression of the pandemic have been developed to help assess the risk of COVID-19 spreading. In a case study in Japan, we attempt to determine whether the trend of emotions toward COVID-19 expressed on social media, specifically Twitter, can be used to enhance COVID-19 case prediction system performance. Methods: We use emoji as a proxy to shallowly capture the trend in emotion expression on Twitter. Two aspects of emoji are studied: the surface trend in emoji usage, measured by the tweet count, and the structural interaction of emoji, measured by an anomalous score. Results: Our experimental results show that utilizing emoji improved system performance in the majority of evaluations.


1. Introduction
Almost 3 years have passed since the beginning of the worldwide COVID-19 pandemic at the end of 2019. The pandemic has been causing severe global problems in many aspects of life. During a pandemic, information availability is critical to helping people get through the hardships. Social media in particular has been a prevalent source of COVID-19 related information (1,2). A questionnaire survey of 1,003 US-based adults by Neely et al. (1) showed that 76% of the respondents relied at least somewhat on social media for COVID-19 related information, that 59% read COVID-19 related information on social media at least once per week, and that 63.6% were unlikely to check facts with a healthcare professional. A cross-sectional study among university students in Germany by Dadaczynski et al. (2) showed that 37.6% (5,302/14,092) of the respondents used social media occasionally or frequently to search for information on COVID-19 and related issues.
Social media has been shown to reflect social mental states. An analysis of Facebook posts by Settanni and Marengo (3) revealed that, overall, the expression of negative emotions positively correlated with anxiety, depression, and stress symptoms and that negative emoji usage positively correlated with anxiety symptoms. Park et al. (4) found that the use of words related to negative emotions and anger significantly increased among Twitter users with major depressive symptoms compared with users without such symptoms. Wald et al. (5) showed that the traits in the Big 5 Personality Index (6) (agreeableness, conscientiousness, extroversion, neuroticism, and openness) and those in the Dark Triad (7) (psychopathy, Machiavellianism, and narcissism) could be predicted for social media users from their Twitter posts ("tweets") with reasonably good accuracy (area under the ROC curve of 0.736).

[...] frequency without explicit consideration of emotion analysis. Chew et al. (26) mentioned the use of Twitter data as a source of emotional responses toward COVID-19, but they did not perform emotion analysis on the data. Tran and Matsui (27) considered the tweet count of emoji-using tweets but did not perform a breakdown analysis of each emoji.

2. Materials and methods
In this section, we describe in detail our approach to building and evaluating our framework for predicting COVID-19 cases from past cases and COVID-19 related Twitter data. Our framework, illustrated in Figure 1, consists of three major processes: Data Collection (Section 2.1), Anomaly Detection (Section 2.3), and Prediction System Construction (Section 2.4). In the "Data Collection" process, we collect COVID-19 related tweets containing emoji to obtain the social media emotion trends via the tweet count; in the "Anomaly Detection" process, we detect anomalies in those trends by analyzing the full text of the tweets. After preparing the necessary input data, including the social media data and COVID-19 case data from official sources, we build our COVID-19 case prediction system using LSTM, a deep neural network for time-series modeling, and an "ensemble of the best," as described in the "Prediction System Construction" process. The framework is developed with and evaluated on almost 3 years' worth of data, from January 2020 to October 2022.

2.1. Data collection
We chose Twitter as the social media platform for this study for two main reasons: 1) Twitter is one of the most popular social media platforms in Japan, and 2) Twitter promotes public engagement. According to Statista, Japan ranked 2nd in the number of Twitter users in 2022, after the US. According to BigBeat, Twitter, with more than 50 million users, ranks 2nd after LINE; whereas LINE focuses on private engagement, Twitter promotes public engagement in which strangers can easily participate. In addition, Twitter provides an API that gives researchers access to full historical data.
For the case study in Japan, the data consisted of COVID-19 infection data and COVID-19 related tweets in Japanese. The COVID-19 infection data were publicly provided by the Japanese Ministry of Health, Labour and Welfare. The tweet data were collected using the Twitter API (version 2) with academic research access by matching a set of predefined keywords and emoji, most of which are facial expressions showing various emotions such as happiness, sadness, fear, and anger (see Supplementary material); the top 3 emoji are the crying, sweating (which may also be seen as a raindrop), and smiley emoji. We chose the keywords based on observing Twitter trending phrases related to COVID-19 in Japan, in four categories: general posts about COVID-19, posts about [...] (Table 1). These trending keywords are associated with reactions to COVID-19 news, COVID-19 case reporting, COVID-19 vaccination, and strong government restriction policies. Because location tagging is off by default and most users do not turn it on, we selected tweets in Japanese using the "lang" parameter of the Twitter API with the language code "ja." In the period from 2020/01/01 to 2022/09/30, we collected more than 20 million tweets in total. Besides the COVID-19 related tweets, we counted all tweets that matched a predefined set of emoji; the total count was 8 billion for the same period. Figure 2 reveals a repetitive phenomenon: the reactions on Twitter form a wave shape corresponding to each wave of COVID-19. Tran and Matsui (27) hypothesized such a phenomenon on the basis of behavioral changes observed in Apple mobility trends reports. This phenomenon is also potentially due to excessive exposure to negative information (13,14) and heightened risk perception among social media users.

We performed a preliminary cross-correlation analysis between the COVID-19 related tweet count for each emoji and the reported number of COVID-19 cases over consecutive overlapping 28-day windows from 2020/01/01 to 2022/12/31, and examined the highest correlation and its corresponding lag value for each window. We found a correlation (mean: 0.5920, standard deviation: 0.2518) and a considerable variance of lags, with a mean of −0.44 days and a standard deviation of 6.06 days. Over all the windows, 25.64% had positive lag values, 37.31% had zero lag values, and 37.05% had negative lag values. Details of the lag values for each emoji are shown in the Supplementary material. The existence of a correlation with a considerable variance of lag, even for the most popular emoji, suggests that capturing the long-term relationship between the social media emotion reaction and the COVID-19 epidemic could be difficult using linear models, whereas a method capable of capturing non-linear long-term relationships could be promising. For this reason, we adopt LSTM, which is introduced in Section 2.4.
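The windowed cross-correlation analysis described above can be sketched as follows. This is a minimal pure-Python sketch under our own assumptions: windows slide one day at a time, the lag search range of ±14 days (`max_lag`) is illustrative, and the sign convention (a positive lag means the tweet count leads the case count) is our choice rather than a setting reported in the study. All function names are hypothetical.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def best_lag(tweets, cases, start, window=28, max_lag=14):
    """Return (highest correlation, lag) for the case window [start, start + window).

    The tweet window is shifted back by `lag` days, so a positive lag means
    the tweet count leads the case count (assumed convention)."""
    target = cases[start:start + window]
    best = (float("-inf"), 0)
    for lag in range(-max_lag, max_lag + 1):
        s = start - lag
        if s < 0 or s + window > len(tweets):
            continue  # shifted window would fall outside the series
        r = pearson(tweets[s:s + window], target)
        if r > best[0]:
            best = (r, lag)
    return best


def windowed_lags(tweets, cases, window=28, max_lag=14):
    """Best (correlation, lag) for every overlapping window, sliding one day at a time."""
    return [best_lag(tweets, cases, start, window, max_lag)
            for start in range(len(cases) - window + 1)]


def summarize(results):
    """Mean best correlation and the shares of positive, zero, and negative lags."""
    lags = [lag for _, lag in results]
    n = len(lags)
    return {
        "mean_corr": mean(r for r, _ in results),
        "positive": sum(lag > 0 for lag in lags) / n,
        "zero": sum(lag == 0 for lag in lags) / n,
        "negative": sum(lag < 0 for lag in lags) / n,
    }
```

Because the best lag is chosen independently per window, a spread of positive, zero, and negative lags like the one reported above shows up directly in `summarize`.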
For comparison with countries other than Japan, we collected corresponding data from South Korea, Thailand, Indonesia, India, and Germany. The COVID-19 case data are from the WHO. Counts of COVID-19 related tweets containing emoji were also collected using the Twitter API. For these countries, instead of using self-designed keywords, we selected COVID-19 related tweets annotated with the "COVID-19" domain entity by the "context annotation" feature of the Twitter API. Twitter discloses only limited details of this feature, which do not include the keywords used for the annotation. We also collected the tweet count for Japan using the "context annotation" feature for comparison. Due to privacy issues [...]

2.2. Estimation of the long-term tendency of social media engagement on Twitter
To identify the differences in social media reaction between COVID-19 waves, we calculated the total tweet count for a period of 84 days (12 weeks): 7 weeks before, 1 week during, and 4 weeks after each peak.
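We also compared the total tweet count of the COVID-19 related tweet collection with that of the "all tweet" collection and calculated the ratio:

$$\rho_w = \frac{C^{\mathrm{COVID}}_w}{C^{\mathrm{all}}_w},$$

where $C^{\mathrm{COVID}}_w$ and $C^{\mathrm{all}}_w$ denote the total tweet counts of the COVID-19 related and "all tweet" collections over the 84-day period of wave $w$.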
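The 12-week totals and the attention proportion can be computed as in the following sketch. We assume, for illustration only, that the "during" week begins on the peak day, so the window covers days [peak − 49, peak + 35); the exact alignment of the peak week is not stated here, and the function names are hypothetical.

```python
def wave_total(daily_counts, peak, before_weeks=7, during_weeks=1, after_weeks=4):
    """Total count over the 12-week (84-day) period around a wave peak.

    `peak` is a day index into `daily_counts`; the peak week is assumed to
    start on the peak day (illustrative alignment)."""
    start = peak - 7 * before_weeks
    end = peak + 7 * (during_weeks + after_weeks)
    if start < 0 or end > len(daily_counts):
        raise ValueError("84-day window falls outside the series")
    return sum(daily_counts[start:end])


def attention_proportion(covid_counts, all_counts, peak):
    """Ratio of COVID-19 related tweets to all tweets over the same period."""
    return wave_total(covid_counts, peak) / wave_total(all_counts, peak)


def relative_to_first(values):
    """Express per-wave values relative to the 1st wave (first element)."""
    return [v / values[0] for v in values]
```

For example, if COVID-19 related tweets in a wave total 37% of their 1st-wave level while all tweets total 167% of theirs, the attention proportion relative to the 1st wave is 0.37 / 1.67 ≈ 0.22.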

2.3. Social media graph and anomaly detection
At certain moments in time, unexpected events occur that catch the attention of social media users. They become viral and spread rapidly over social media, leading to anomalous behavioral changes. Anomaly detection in social media has attracted attention from the research community (28), and several studies have reported findings on the characteristics of social media evolution. Several anomaly types in social media have been studied, including anomalous nodes, anomalous edges, anomalous sub-graphs, and anomalous events. Intuitively, the state of reactions on social media can be represented as a graph that connects social media objects, including users, posts, entities, and topics. The graph can then be used to analyze behavioral evolution. The graph continues to evolve as new users join, new posts are shared, new topics are discussed, and new entities are mentioned.

https://help.twitter.com/en/safety-and-security/tweet-locationsettings
"lang" codes: de (German, Germany), hi (Hindi, India), id (Indonesian, Indonesia), ja (Japanese, Japan), ko (Korean, South Korea), and th (Thai, Thailand).
https://www.statista.com/statistics/ /number-of-active-twitterusers-in-selected-countries/
In the work of Rossi et al. (29), large time-evolving graphs were analyzed for anomaly detection. They found that it is possible to identify interesting patterns and detect unusual structural transitions. In a large Twitter relationships network, they observed seasonality among the transitions. In particular, they found that users generally behave much differently over the weekends, as evidenced by an increase in the anomalous scores on those days. They speculated that the manner of tweeting differs between weekends and workdays. Motivated by their findings, we adapted their method for use in analyzing our Twitter graph, looking for clues to the factors that trigger anomalous behavioral changes on Twitter.
We performed the analysis using an evolving dynamic Twitter graph. We focused on capturing the temporal behavioral changes in interconnected social media objects (users, emoji, hashtags, domains, and entities) on the platform as it evolved during the COVID-19 epidemic in Japan. The social media activity was represented as a heterogeneous graph connecting these objects.
In this section, we introduce our approach utilizing a graph to represent Twitter data and our method for identifying the anomalous temporal behavioral changes in the Twitter objects. We expected that the social media network structural changes identified with our approach would complement the use of the tweet count to enhance our COVID-19 case prediction system.

2.3.1. Twitter graph
To construct our Twitter graph, we considered five Twitter objects: user, emoji, hashtag, entity, and domain. These objects, linked through the event of a tweet being posted, commented upon, or retweeted (shared), comprise the nodes of the graph.
• Hashtag: a way for users to include their tweets in a (trending) broad topic of conversation.
• Entity: a named entity, for instance, a person, organization, location, or time, which is automatically annotated by Twitter's named-entity recognition system from the full text of the user's tweet.
• Domain: the domain context of a tweet as defined by Twitter.
The graph also has five types of edges (relations):
• User → User: A user mentions another user in a tweet, either replying to the mentioned user's tweet or tagging an additional user into the current conversation.
• User → Hashtag: A user posts or shares a tweet containing a hashtag. The user wants to include their post in a certain broader topic of conversation.
• User → Emoji: A user posts or shares a tweet containing an emoji. The user wants to express a certain emotion.
• User → Domain: A user posts or shares a tweet belonging to a domain.
• User → Entity: A user posts or shares a tweet mentioning an entity.

FIGURE 2
Chart of daily tweet count vs. reported number of COVID-19 infections in Japan (values were smoothed using a 7-day moving average). Data suggest that the number of COVID-19 related tweets has been correlated to some degree with the progression of the epidemic in Japan since its beginning. An abnormal surge of social media reaction was observed on two days in October close to Halloween. Otherwise, the social media reaction exhibits waves corresponding to the seven waves of the COVID-19 epidemic in Japan.
Formally, we define a graph $G = (N, E)$, where $N$ is the set of nodes and $E$ is the set of edges in $G$. At time slice $t$, snapshot $G_t = (N_t, E_t)$ is a subgraph of $G$ with active edges $E_t$ connecting active nodes $N_t$. To capture graph state transitions smoothly instead of capturing the graph at separate, short time points, we use moving, overlapping time slices to take snapshots of $G$. For instance, if the time slice is 7 days, snapshot $G_t$ is constructed from the active nodes and edges over days $[t-6, t]$.
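The typed edges and sliding-window snapshots can be sketched as follows. The `Tweet` record and its fields are hypothetical stand-ins for data extracted from Twitter API responses, and weighting each edge by how often it occurs within the time slice is our assumption about the edge weight.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record (not a Twitter API type)."""
    day: int  # day index since the start of collection
    user: str
    mentions: list = field(default_factory=list)
    hashtags: list = field(default_factory=list)
    emoji: list = field(default_factory=list)
    domains: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def tweet_edges(tw):
    """Yield typed, directed (source, target) edges for one tweet; sources are always users."""
    src = ("user", tw.user)
    for kind, items in (("user", tw.mentions), ("hashtag", tw.hashtags),
                        ("emoji", tw.emoji), ("domain", tw.domains),
                        ("entity", tw.entities)):
        for item in items:
            yield src, (kind, item)


def snapshot(tweets, t, slice_days=7):
    """Active nodes and weighted edges of G_t over days [t - slice_days + 1, t]."""
    weights = Counter()
    for tw in tweets:
        if t - slice_days + 1 <= tw.day <= t:
            weights.update(tweet_edges(tw))  # edge weight = occurrences in the slice
    nodes = {n for edge in weights for n in edge}
    return nodes, weights
```

Consecutive calls with `t` and `t + 1` share six of their seven days, which is what makes the snapshot sequence change smoothly.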

2.3.2. Graph feature representation
Following the work of Rossi et al. (29), we estimate the latent features, which are called the "roles of the graph," and use them to describe the behaviors of the graph. A role transition model is used to capture the behavioral transitions of the nodes in each snapshot over time t.
Features. Two categories of features are considered: basic features and recursive features. In accordance with the definitions of Henderson et al. (30), the basic features are node degree, weight, and egonet measures, taking into account incoming and outgoing directions. The recursive features are aggregations, using sum and/or mean, of the basic features and previously discovered recursive features. We also apply feature pruning using logarithmic binning (30). Formally, we denote by $V = \{V_t\}$ the features obtained for snapshots $\{G_t\}$.
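A simplified sketch of the basic and recursive features follows. It computes in/out degree, in/out weight, and one egonet measure (the number of edges inside a node's egonet) per node, then performs a single round of neighbor sum and mean aggregation. Henderson et al.'s full feature set and the logarithmic-binning pruning step are omitted, and neighbors are taken without regard to edge direction; these are simplifications for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def basic_features(weights):
    """weights: {(src, dst): w} for one directed, weighted snapshot.

    Returns per-node [in_degree, out_degree, in_weight, out_weight, egonet_edges]
    and the undirected neighbor sets."""
    feats = defaultdict(lambda: [0.0] * 5)
    nbrs = defaultdict(set)
    for (u, v), w in weights.items():
        feats[u][1] += 1
        feats[u][3] += w
        feats[v][0] += 1
        feats[v][2] += w
        nbrs[u].add(v)
        nbrs[v].add(u)
    for node in feats:
        ego = nbrs[node] | {node}  # the node plus its direct neighbors
        feats[node][4] = sum(1 for (u, v) in weights if u in ego and v in ego)
    return dict(feats), dict(nbrs)


def recursive_features(feats, nbrs):
    """One aggregation round: append the neighbor sum and neighbor mean of every feature."""
    out = {}
    for node, f in feats.items():
        neigh = [feats[m] for m in nbrs.get(node, ())]
        sums = [sum(g[i] for g in neigh) for i in range(len(f))]
        means = [s / len(neigh) if neigh else 0.0 for s in sums]
        out[node] = f + sums + means
    return out
```

Applying `recursive_features` again to its own output would produce the "previously discovered recursive features" aggregated once more.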
Roles. By applying latent semantic analysis using non-negative matrix factorization (NMF), we find the latent feature space, which is considered to be the role of each node. Nodes with similar role representations can be considered to form a group with a common role in the graph. Using NMF, for reasons of interpretability and efficiency (29), we estimate the role representations of the nodes of snapshot $G_t$ as a rank-$r$ matrix $R_t \in \mathbb{R}^{n \times r}$ such that $R_t F \approx V_t$, where $F$ is an $r \times f$ non-negative matrix relating roles to features. The value of $r$ is chosen such that $r < \min_t(n_t, f)$, where $n_t$ is the number of active nodes at time $t$ and $f$ is the number of discovered features. In total, we obtain $R = \{R_t\}$ as the role representations for all snapshots of $G$.
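The role extraction step can be illustrated with a small NMF routine. This sketch uses the Lee–Seung multiplicative updates for the squared Frobenius loss, which is one standard way to compute NMF and not necessarily the solver used in the study; `V` is a non-negative node-by-feature matrix for one snapshot, and the function name is hypothetical.

```python
import numpy as np


def nmf_roles(V, r, iters=3000, seed=0, eps=1e-9):
    """Factorize non-negative V (n x f) as R @ F with R (n x r) >= 0 and F (r x f) >= 0."""
    n, f = V.shape
    if not r < min(n, f):
        raise ValueError("rank r must satisfy r < min(n, f)")
    rng = np.random.default_rng(seed)
    R = rng.random((n, r)) + eps
    F = rng.random((r, f)) + eps
    for _ in range(iters):
        # Multiplicative updates keep both factors non-negative.
        F *= (R.T @ V) / (R.T @ R @ F + eps)
        R *= (V @ F.T) / (R @ F @ F.T + eps)
    return R, F
```

Each row of `R` is a node's role vector; nodes whose rows are similar can be read as playing a common role.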
Role transition model. The estimated role transition can be used to analyze how the graph evolves over time. Given the high interpretability of NMF, we used it to estimate the role transitions. We estimate a transition matrix $T$ such that $R_{t-1} T \approx R_t$.
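Estimating the transition matrix can be sketched in the same multiplicative-update style, keeping T non-negative. Stacking the consecutive snapshot pairs from a window of role matrices (for example, the K = 14 days used in the study) is our assumption about how the window is used; the rows must refer to the same nodes in every snapshot (inactive nodes as zero rows), and the function name is hypothetical.

```python
import numpy as np


def estimate_transition(roles, iters=5000, seed=0, eps=1e-9):
    """Estimate T >= 0 (r x r) with R_{k-1} @ T ~= R_k over consecutive role matrices.

    roles: list of non-negative (n x r) role matrices, oldest first."""
    A = np.vstack(roles[:-1])  # R_{k-1} for every consecutive pair
    B = np.vstack(roles[1:])   # matching R_k
    r = A.shape[1]
    T = np.random.default_rng(seed).random((r, r)) + eps
    AtA, AtB = A.T @ A, A.T @ B
    for _ in range(iters):
        # Multiplicative update for min ||A T - B||_F^2 subject to T >= 0.
        T *= AtB / (AtA @ T + eps)
    return T
```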

2.3.3. Anomaly analysis
The idea of anomaly analysis is that, if the prediction of the next role of a node diverges from the observation, the divergence value represents the anomalous score (29). The higher the divergence, the more abnormal the behavioral change of the node when interacting with the other nodes in the graph. Given a role transition matrix $T$ estimated over a period of $K$ days, we can predict the next role representation at time $t$ as

$$\hat{R}_t = R_{t-1} T.$$

The divergence of the predicted $\hat{R}_t$ from the observed $R_t$ is considered to be the anomaly and can be measured as

$$a_t = \lVert R_t - \hat{R}_t \rVert_F,$$

where $\lVert \cdot \rVert_F$ is the Frobenius norm. In this study, we set $K = 14$ days.
Before building our prediction system, we need to obtain anomalous scores for the emoji. For a given emoji identified as node $i$ of the graph, we obtain the emoji's anomalous score as the divergence of the predicted role vector $\hat{R}^{(i)}_t$ (the $i$-th row of $\hat{R}_t$) from the observed role vector $R^{(i)}_t$:

$$a^{(i)}_t = \lVert R^{(i)}_t - \hat{R}^{(i)}_t \rVert_2. \tag{5}$$

The role transition matrix $T$ captures the global transition of the graph over the period of $K$ days. The predicted role matrix $\hat{R}_t$ is, therefore, the expected next role representation under the global transition. Hence, the anomalous score computed by Equation 5 represents how much the node diverges from the global transition.
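Given role matrices and a transition matrix, the graph-level score and the per-node (per-emoji) scores follow directly; a minimal NumPy sketch is below. Because the squared Frobenius norm equals the sum of the squared row norms, the graph-level score aggregates the node-level scores. Looking up an emoji's row index is left to the caller, and the function name is hypothetical.

```python
import numpy as np


def anomaly_scores(R_prev, R_curr, T):
    """Return the graph-level score ||R_t - R_{t-1} T||_F and per-node row-wise scores."""
    R_hat = R_prev @ T  # expected roles under the global transition
    diff = R_curr - R_hat
    graph_score = np.linalg.norm(diff, "fro")
    node_scores = np.linalg.norm(diff, axis=1)  # Euclidean norm of each node's row
    return graph_score, node_scores
```

The score of an emoji node `i` at time `t` is then `node_scores[i]`, which yields a daily time series that can be fed to the prediction system alongside the tweet count.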

2.4. COVID-19 case prediction system
Studies of the effects of social media on societal events including COVID-19 have led to social media information being used for constructing COVID-19 case prediction systems. In this study, we investigated the effectiveness of using emoji, which are commonly used in social media communications, in the prediction of COVID-19 cases.
A system of multiple LSTM models. Our COVID-19 case prediction system is constructed using an ensemble of long short-term memory (LSTM) models. LSTM was proposed by Hochreiter and Schmidhuber (23) and is widely and successfully used in modeling sequential data (31). For each LSTM model in the ensemble, given an input sequence $\mathbf{x}_{t,l} = \{x_{t-l+1}, \dots, x_t\}$ of length $l$, the model is trained to output the number of COVID-19 cases $o^*_{t+\delta}$, i.e., $\delta$ time steps after the last input time step. The input contains the observed number of COVID-19 cases $o$ and an additional feature $s$, which is either the tweet count or the anomalous score of an emoji. Each LSTM model is configured with 4 layers and a hidden size of 16 per layer. The operation of an LSTM cell, the building block of the LSTM model illustrated in Figure 3, can be described by Equations (6)–(12), where the vanilla LSTM cell is extended with a linear layer that maps the high-dimensional hidden state $h_t$ to a single-value prediction $o^*_t$.
The gating mechanism, a specialized feature of LSTM, with input gate $i_t$, forget gate $f_t$, and output gate $j_t$, controls the information flow and thus helps the model learn important long-term memory captured in the cell state $c_t$, which is intuitively beneficial for learning the long-term dependency between social media reaction and the epidemic situation.
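For reference, a standard vanilla LSTM cell written with the gate names used above (input gate $i_t$, forget gate $f_t$, output gate $j_t$, cell state $c_t$), followed by a linear read-out to a single value, is sketched below. The weight symbols $W$, $U$, $b$, the candidate state $\tilde{c}_t$, and the read-out parameters $w_o$, $b_o$ are our own notation, and this is not necessarily the exact form used in the study. In a stacked 4-layer model, the input $x_t$ of each layer above the first is the hidden state of the layer below.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
j_t &= \sigma(W_j x_t + U_j h_{t-1} + b_j) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= j_t \odot \tanh(c_t) \\
o^*_t &= w_o^{\top} h_t + b_o
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication; the forget gate $f_t$ is what lets $c_t$ carry information across many time steps.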
A multi-feature ensemble. Using the historical number of cases and one additional feature as input, we construct a prediction system that is an ensemble of LSTM models over the different emoji features $\{\text{emoji}\} \times \{\text{tweet count}, \text{anomalous score}\}$. Instead of constructing a complex model whose inputs are the number of COVID-19 cases and many additional features, we construct an ensemble of simple models, each focused on modeling the relationship between the number of COVID-19 cases and one predictor, that is, one emoji feature.
A dynamic ensemble of the best models. Only the best models are selected for the ensemble at a given time step. We consider a model to be better than the others if its performance was better over the most recent time steps. Let $t = t_0 + [...]$ be the time step at which we want to make the prediction $y_t$, given the input [...]. From all the trained models $\{f\}$, the best model $f^{*,1}_{\delta}$ for time lag $\delta$ at time $t$ is selected on the basis of [...], where $\mathrm{ERR}(\cdot, \cdot)$ is the mean relative absolute error described in Equation (17). The 2nd, ..., $m$-th best models $f^{*,2}_{\delta}, \dots, f^{*,m}_{\delta}$ are also selected. The prediction at time $t$ is then given by [...]. The parameter $\bar{\delta}$, distinct from the time lag $\delta$, controls the smoothness of the prediction: the higher the value of $\bar{\delta}$, the longer the period of recent events that the system takes into account, resulting in smoother predictions; the lower the value, the more sensitive the prediction is to the most recent events. For each emoji feature, a set of $r$ models is trained with different randomly initialized parameters, which results in a total of $|\{f\}| = |\{\text{emoji}\} \times \{\text{tweet count}, \text{anomalous score}\}| \times r$ trained models. This allows several models trained for the same emoji feature to be selected for the ensemble if they perform well with different sets of weights. We employed these dynamic ensemble parameters: [...].

Inputs, including the number of cases and the additional time series (emoji usage count and emoji anomalous score), are smoothed using a 7-day moving average, so the predicted number of cases is also a 7-day moving average. We consider 7-day moving average smoothing an appropriate way to stabilize the analysis and to mitigate case fluctuations due to reasons including human error, issues in local municipalities' reporting mechanisms, and the timing of residents' testing.
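The dynamic ensemble selection described above can be sketched as follows, but only as one plausible reading, since the selection criterion and the combination rule are not reproduced in the text: rank every trained model by its MRAE over the most recent `smooth` observed steps (playing the role of the smoothing parameter), keep the best `m`, and average their predictions. The averaging rule, the parameter names, the feature labels, and the data layout are our assumptions.

```python
from itertools import product


def model_grid(emojis, r):
    """All (emoji, feature, seed) combinations: |emoji| x 2 features x r seeds."""
    return list(product(emojis, ("tweet_count", "anomalous_score"), range(r)))


def mrae(pred, obs):
    """Mean relative absolute error; assumes all observations are positive."""
    return sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(obs)


def ensemble_predict(history, observed, new_preds, m=3, smooth=7):
    """history[k]: past predictions of model k aligned with `observed`;
    new_preds[k]: model k's prediction for the target time step.
    Returns the mean prediction of the m models with the lowest recent MRAE."""
    recent = observed[-smooth:]
    ranked = sorted(history, key=lambda k: mrae(history[k][-smooth:], recent))
    best = ranked[:m]
    return sum(new_preds[k] for k in best) / len(best), best
```

Because the ranking is recomputed at every time step, a model that performed poorly in one wave can re-enter the ensemble later, which is the point of making the ensemble dynamic.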
Training. The system is trained or updated using data assimilation, so that previously trained models are updated when additional observations become available. For example, if the time lag $\delta = 1$, when the data at time $t$ are observed, the model trained up to time $t-1$ is updated or fine-tuned using the additional data observed at time $t$. Memory length $l$ is set on the basis of the corresponding time lag $\delta$ [...], where $\lfloor \cdot \rfloor$ is the floor (round-down) operator. Memory length $l$ is calculated such that the further into the future the system has to predict, the further into the past it needs to look. The data until 2020/09/07 (before the 3rd wave) were used to obtain the initial models, with a maximum of 10,000 training epochs and early stopping using 5% of the data held out as a development set, before the data assimilation stage. Each data assimilation run is carried out with 25 fine-tuning epochs.
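The data assimilation step can be pictured as appending one new training pair each time an observation arrives. The sketch below builds that pair for a given lag and memory length and shows a chronological 5% development split for early stopping. Holding out the most recent 5% is our assumption, the fine-tuning call itself (25 epochs per update in the study) is left abstract, and the function names are hypothetical.

```python
def assimilation_pair(x, o, t, l, delta):
    """Pair that becomes available once o[t] is observed.

    Input: the l steps x[t - delta - l + 1 .. t - delta]; target: o[t],
    i.e., delta steps after the last input step."""
    end = t - delta
    start = end - l + 1
    if start < 0:
        raise ValueError("not enough history for this memory length")
    return x[start:end + 1], o[t]


def train_dev_split(pairs, dev_fraction=0.05):
    """Chronological split: the most recent 5% of pairs form the development set."""
    n_dev = max(1, round(len(pairs) * dev_fraction))
    return pairs[:-n_dev], pairs[-n_dev:]
```

With `delta = 1`, the pair produced at time `t` uses inputs ending at `t - 1`, matching the example in the text of updating the model trained up to `t - 1` with the data observed at `t`.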
Evaluation. The prediction error is measured by the mean relative absolute error (MRAE):

$$\mathrm{ERR}(\mathbf{y}, \mathbf{o}) = \frac{1}{n} \sum_{k=1}^{n} \frac{\lvert y_{t+k} - o_{t+k} \rvert}{o_{t+k}}, \tag{17}$$

where $\mathbf{y}, \mathbf{o} \in \mathbb{R}^n$ are the system prediction and the ground truth, respectively, and $n$ is the size of the evaluation window $\{t+1, \dots, t+n\}$, in which the system predicts $\{y_{t+1}, \dots, y_{t+n}\}$ given the input data up to $x_t$. Because we analyze a long period over which the magnitude of COVID-19 case counts changed dramatically from wave to wave, we select MRAE as the evaluation metric: it is widely adopted in time-series forecasting studies, and, being a relative error, it remains comparable across outputs whose values fluctuate widely. Given an evaluation period $d$ and an evaluation window size $n \le d$, we compute an error value for each of the $d - n + 1$ consecutive overlapping evaluation windows using Equation (17). In the later sections, we report system performance as the mean and standard deviation of the errors over the evaluation windows, and we compare systems using "better error x% of the time," the percentage of evaluation windows in which the preferred system $S_{\mathrm{pref}}$ achieves a lower error than the reference system $S_{\mathrm{ref}}$:

$$x = \frac{100}{d - n + 1} \sum_{w=1}^{d-n+1} \mathbb{1}\!\left[\mathrm{ERR}_w(S_{\mathrm{pref}}) < \mathrm{ERR}_w(S_{\mathrm{ref}})\right], \tag{18}$$

where $\mathrm{ERR}_w(S)$ is the error of system $S$ on the $w$-th evaluation window.

From Figure 4, it can be easily seen that the social media reaction in the subsequent waves (2nd–7th) was noticeably reduced compared with that in the 1st wave, and it was particularly small in the 6th and 7th waves. Although general social media reaction in the 7th wave reached 167% of its 1st-wave level, the COVID-19 related reaction fell to 37% of its 1st-wave level, compared with 64–88% in the 2nd–5th waves and 49% in the 6th wave. As a result, the total tweet count ratio of "COVID-19 related tweets" to "all tweets" (the "COVID-19 attention proportion"), relative to the 1st wave, fell to 32% in the 6th wave and 22% in the 7th wave, compared with 56–62% in the 2nd–5th waves. Overall, social media attention on COVID-19 vs. general topics on Twitter dropped by 38–44% from the 1st wave to the 2nd–5th waves and by 68–78% from the 1st wave to the 6th and 7th waves.
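Equations (17) and (18) in the Evaluation paragraph translate directly into code. In this sketch, `forecasts[s]` holds the n-step forecast issued for window s and `observed` holds the ground truth for the whole evaluation period of length d; this data layout and the function names are illustrative assumptions.

```python
def mrae(y, o):
    """Equation (17): mean relative absolute error; assumes o > 0."""
    return sum(abs(a - b) / b for a, b in zip(y, o)) / len(o)


def window_errors(forecasts, observed, n):
    """Errors for the d - n + 1 overlapping windows of size n.

    forecasts[s] predicts observed[s], ..., observed[s + n - 1]."""
    d = len(observed)
    if len(forecasts) != d - n + 1:
        raise ValueError("expected one n-step forecast per window")
    return [mrae(forecasts[s], observed[s:s + n]) for s in range(d - n + 1)]


def better_pct(err_pref, err_ref):
    """Equation (18): % of windows where the preferred system has strictly lower error."""
    wins = sum(p < r for p, r in zip(err_pref, err_ref))
    return 100.0 * wins / len(err_ref)
```

Note that `better_pct` counts ties as losses for the preferred system, matching the strict inequality in Equation (18).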

3.2. Anomaly detection analysis
As illustrated in Figure 5, the anomalous score aligned with the tweet count most of the time. This suggests that surges in social media reaction are accompanied by structural changes in social media networks. This was particularly the case for periods leading up to a wave peak, where we observed a surge in both tweet count and anomalous score. This means that, in addition to using the tweet count to capture the surface trend of social media reaction, we can also capture the magnitude of the structural changes in the social media network during such periods.
However, there were several situations in which the social media reaction surged while the anomalous score did not follow, for instance, around 2020/03/02 and 2022/09/05. Furthermore, for the top 3 emoji (Figure 5), around 2022/02/21, the anomalous score for the crying face emoji was similar to those for the other two emoji, sweating and smiley, and did not align with the corresponding tweet count. Such rare occurrences should, however, have little effect on system performance because the prediction system is designed as an ensemble of independent models.

3.3. COVID-19 case prediction
Our experimental results show that using additional emoji features improves prediction performance in terms of MRAE. As shown in Figure 6, in the evaluation period from 2020/11/16 to 2022/08/21 (+6, +13, +20, or +27 days depending on the evaluation window), using additional emoji features achieved a better error 69.10–73.91% of the time. The improvement in terms of relative error reduction ranged from 0.1% to 94.14%, with a mean of 28.77% and a median of 23.91%. As shown in Table 2, improvement was evident most of the time during the 7 weeks before the peak week of each wave.
As also shown in Table 2, the system performed better for the 4th wave than for the 3rd wave, worse for the 5th wave, and worst for the 6th wave. From another perspective, significant situational differences were observed for the 5th and 6th waves. The 5th wave was characterized by the COVID-19 Delta variant, a significantly more fatal variant. The 6th wave was characterized by the Omicron variant, which was much more infectious but less deadly; accordingly, no state of emergency was declared for this wave.
The results of an experiment on system performance using the tweet count and the anomalous score together and separately for each wave (Table 3) show that using both improved system performance in terms of the MRAE. Out of the 20 evaluations, the combination yielded improvement ten times and equal performance once, the tweet count alone yielded better performance only eight times, and the anomalous score alone yielded better or equal performance only twice.
Our system achieved competitive performance compared with the Google Cloud AI forecasting system (32). As shown in Figure 7, in the forecast periods reported by the Google Cloud AI forecasting system, our system achieved a lower prediction error 76.92–81.87% of the time. Their system was designed with a number of features including, for instance, per capita income, hospital patient experience ratings, air quality measures, a mobility index, and governmental policies such as restaurant restrictions and school closures, but social media emotion was not considered. They published the "COVID-19 Public Forecasts" dataset containing the prediction outputs for the U.S. and Japan.

4. Results for other countries
In this section, we present the results of our method for Germany, India, Indonesia, South Korea, and Thailand. Like Japan, these countries rank among the top countries in Twitter usage, and their tweets can be conveniently collected by their primary spoken languages. Location-based filtering is difficult because many Twitter users turn off location sharing due to privacy concerns. Because the Twitter API caps the number of tweets that can be downloaded (10 million per month), we do not present the anomaly detection analysis for these countries in this paper. The tweet count data described in Section 2.1 were collected using the "context annotation" feature of the Twitter API. We also include the results for Japan for comparison.

4.1. Social media reaction on Twitter
As shown in Figure 8, the long-term tendencies of social media reaction on Twitter among the 6 countries are quite similar. The reaction surged dramatically at the beginning of the COVID-19 pandemic and dropped steeply afterward. In 2021, each country faced new waves at different times, so the social media reaction rose again. Notably, as also shown in Table 4, a much lower level of social media reaction was observed in 2022 in all countries. In general, social media reaction toward the COVID-19 pandemic surged dramatically at the beginning and gradually faded over time.

https://console.cloud.google.com/marketplace/product/bigquerypublic-datasets/covid -public-forecasts
https://developer.twitter.com/en/docs/twitter-api/tweet-caps
Frontiers in Public Health, frontiersin.org

FIGURE 4
Social media (Twitter) reaction in Japan measured by total tweet count (Section 2.2) for each COVID-19 wave over a period of 12 weeks: 7 weeks before, 1 week during, and 4 weeks after the peak of the corresponding wave. Values shown are relative to those for the 1st wave. Social media reaction in the 6th and 7th waves was much lower than that in previous waves. Additionally, the total tweet count ratio ("COVID-19 related tweets" vs. "all tweets") dropped to 32% in the 6th wave and 22% in the 7th wave compared with 56–62% in the 2nd–5th waves.
Although the tendencies are quite similar, the magnitudes of the social media reaction surges differ among the 6 countries. Relative to the beginning of the pandemic, Japan, Germany, and Thailand exhibited several surges of relatively high magnitude, whereas India exhibited a single high-magnitude surge, and Indonesia and South Korea exhibited surges of relatively low magnitude.

4.2. COVID-19 case prediction
The results of COVID-19 case prediction for Germany, India, Indonesia, South Korea, and Thailand (Table 5) demonstrate that using emoji features yields better prediction performance in most cases and worse performance in only 3 out of 192 cases. Although the 6 countries differ in the percentage of emoji usage in COVID-19 related tweets (Germany: 4.08%, India: 8.27%, Indonesia: 8.02%, Japan: 8.66%, South Korea: 2.84%, and Thailand: 4.60%) and in the ratio of COVID-19 related social media reaction to general social media reaction (Figure 8 and Table 4), COVID-19 related tweets containing emoji can provide informative features that improve our COVID-19 case prediction system.
As seen in Table 5, there are several noticeably abnormal errors larger than 100%, for instance, India, Period 6 (Jan–Mar 2022), Window = 28, with 798.1% for the bare system not using emoji features (B) and 377.2% for the system using emoji features (E). Based on Equation 17, an error of 798.1% indicates that the predicted number of cases is about 9 times the observed number of cases. Analyzing this situation, we see that, in that period, India was in an unprecedented epidemic wave in which the number of cases rose and fell dramatically within a relatively short period of time compared with previous waves (Figure 8). While the faster rise in the number of cases could be attributed to a newer variant, the decline was also faster. It took 21 days from 2022-01-26 (the peak of the wave in Period 6) for the (smoothed) number of cases to drop by a factor of about 7, whereas the same drop took 45 days from 2021-05-09 (the previous peak) and 78 days from 2022-07-23 (the next peak). Even though our system equipped with emoji features managed to cut the error to 377.2%, this is still a considerable error. Similar situations are also observed in other countries. In these situations, the prediction error grows faster as we try to predict further into the future with a larger window size. The system had difficulty adapting to a dramatic change of situation for which the past data did not contain adequate information.
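The magnitudes quoted above can be checked with a short worked calculation. The first line follows from Equation (17) for an over-prediction; the second assumes, for illustration only, that the decline after each peak was roughly exponential, and gives the average daily factor by which the smoothed case count fell over 21, 45, and 78 days.

```latex
\begin{gathered}
\frac{y - o}{o} = 7.981 \;\Rightarrow\; y = 8.981\,o \approx 9\,o, \\
7^{-1/21} \approx 0.912, \qquad 7^{-1/45} \approx 0.958, \qquad 7^{-1/78} \approx 0.975.
\end{gathered}
```

Under that assumption, the smoothed case count fell by roughly 9% per day after the 2022-01-26 peak, compared with roughly 4% and 2.5% per day after the 2021-05-09 and 2022-07-23 peaks.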

Discussion
The results of our study on using social media data to construct a COVID-19 case prediction system suggest that social media can be helpful for epidemic forecasting. In terms of prediction accuracy, our experimental results show that using both tweet count and anomalous score improved the MRAE of COVID-19 case prediction. In terms of practicality, using emoji as a means of shallow emotion analysis can easily be applied to multilingual social media platforms worldwide. Beyond tweet count and anomalous score, a future direction is to investigate higher-dimensional representations of emoji, for instance, by using EmojiNet (33), a dictionary of emoji senses, to represent emoji in a high-dimensional semantic vector space.
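The tweet-count feature captures the surface trend of emoji usage. The Python sketch below shows one plausible way to turn raw tweets into a per-day, per-emoji count matrix that could serve as model input. The emoji list, the (date, text) input format, and the once-per-tweet counting rule are all assumptions made for illustration. The anomalous score feature is not reproduced here.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical emoji vocabulary; the study's actual set of emoji may differ.
TRACKED_EMOJI = ["😷", "😱", "🙏", "😭"]


def daily_emoji_tweet_counts(tweets):
    """Count, for each day, how many tweets contain each tracked emoji.

    `tweets` is an iterable of (date, text) pairs. A tweet contributes at most
    1 to an emoji's count, however many times that emoji appears in it.
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        for emoji in TRACKED_EMOJI:
            if emoji in text:
                counts[day][emoji] += 1
    return counts


def feature_matrix(counts, days):
    """One row per day (in the given order), one column per tracked emoji; 0 if absent."""
    return [[counts.get(day, Counter())[e] for e in TRACKED_EMOJI] for day in days]


tweets = [
    (date(2021, 8, 1), "感染者急増😷😷"),
    (date(2021, 8, 1), "怖い😱"),
    (date(2021, 8, 2), "みんな無事でありますように🙏😷"),
]
counts = daily_emoji_tweet_counts(tweets)
days = [date(2021, 8, 1), date(2021, 8, 2)]
print(feature_matrix(counts, days))  # [[1, 1, 0, 0], [1, 0, 1, 0]]
```

Counting each tweet at most once per emoji keeps a single emoji-heavy tweet from dominating a day's signal. Counting every occurrence would be an equally valid alternative design.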
As the change in social media reaction over the long course of the COVID-19 epidemic in Japan shows, behavioral changes may differ remarkably from wave to wave, which challenges our system's ability to adapt and perform well. The proportion of social media attention to COVID-19 vs. general topics on Twitter in the 6th and 7th waves was only 32% and 22%, respectively, of that in the 1st wave, whereas in the 2nd to 5th waves it was 56-62% of that in the 1st wave. One of the major factors behind these differences was governmental policy: the Japanese government did not declare a state of emergency during three waves (the 2nd, 6th, and 7th). By the 2nd wave, which came immediately after the 1st, most people may have realized that the COVID-19 epidemic was not over. Therefore, although COVID-19 was still somewhat mysterious, the 2nd wave did not come as much of a surprise. As a result, although there was a social media reaction, it fell to roughly half of its 1st-wave level (Figure 4). When the 3rd, 4th, and 5th waves hit, the proportion of social media reaction was in the 30% range, similar to that in the 2nd wave. Our system performed well for the 3rd and 4th waves but not for the 5th. We attribute this to the much higher morbidity in the 5th wave, to which our system could not adapt adequately. Even after learning from the change in the 5th wave, our system performed worse in the 6th wave. That wave was characterized by an even higher level of morbidity and a marked difference in government policy: no state of emergency was declared. Our system was able to learn from that change and performed better for the 7th wave, which again brought a higher level of morbidity and no declaration of a state of emergency.
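The wave-to-wave comparison above expresses each wave's share of COVID-19 related tweets among all tweets as a percentage of the 1st wave's share. The minimal Python sketch below illustrates that normalization. It uses invented counts and assumes one aggregate (COVID-19 related, all) tweet count pair per wave. How the study aggregated weekly counts into waves is not specified here.

```python
def attention_ratio(covid_tweets: int, all_tweets: int) -> float:
    """Share of COVID-19 related tweets among all tweets in one wave."""
    return covid_tweets / all_tweets


def relative_to_first_wave(waves: list[tuple[int, int]]) -> list[float]:
    """Each wave's attention ratio as a percentage of the 1st wave's ratio.

    `waves` holds (covid_tweets, all_tweets) pairs in chronological order.
    """
    baseline = attention_ratio(*waves[0])
    return [100.0 * attention_ratio(c, a) / baseline for c, a in waves]


# Hypothetical counts (not the study's data): total activity grows over time
# while COVID-19 related tweets do not keep pace.
waves = [(500, 1_000), (290, 1_000), (400, 2_500)]
print([round(x, 1) for x in relative_to_first_wave(waves)])  # [100.0, 58.0, 32.0]
```

In this example, the third wave has fewer COVID-19 related tweets than the first, but it also has far more tweets overall. Normalizing by total activity keeps waves comparable even when overall Twitter activity grows over time.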
Epidemic forecasting systems based only on machine learning and historical data may be limited. Their performance may become unstable when an epidemic lasts long enough to span several waves with different governmental policies, public perceptions, and attitudes. Our experimental results suggest that maintaining performance in the later waves of an epidemic is a challenge. Machine-learning-based systems may need more data to learn adequately about social changes and thereby maintain performance. Future work should consider situations that are difficult to characterize from past data when constructing machine-learning-based prediction systems.
If social media attention to an epidemic starts to fade, forecasting systems may benefit from social media signals other than those directly related to the epidemic. Although social media attention to COVID-19 in Japan declined over the course of the epidemic, social media activity related to general topics grew by 280% in the 7th wave compared with the 1st wave (Figure 4), a span of slightly less than 3 years. Looking at social media signals beyond those directly related to the epidemic seems promising in terms of data volume, but it is challenging in terms of data collection and processing. This is because data containing social media reactions to all kinds of topics and problems are unrestricted and difficult to control. A restricted set of pandemic-related social media data is more focused and can directly yield valuable knowledge about public issues during a pandemic, for instance, social health problems including stress, fear, and anxiety (13-15). However, a decrease in such data makes those problems harder to analyze. Hence, when dealing with a long-lasting pandemic, expanding social media data analysis beyond pandemic-related topics while maintaining high-quality analysis is both necessary and challenging.
Our results show the potential of using emoji as a proxy for public social media emotion in predicting the progression of a pandemic, which suggests that monitoring public emotion may be useful for this task. On the one hand, for per-patient monitoring, medical big data and wearable Internet of Medical Things devices (34-36) make it possible to monitor patients' physical conditions directly and to assist them individually and privately in real time.
Table note: The values are the mean and standard deviation (in parentheses) of the MRAE over consecutive overlapping evaluation windows, starting 7 weeks before the week of the corresponding peak. Performance was generally better when both emoji feature categories were used. The error values are shown in percent, with the % sign omitted. Decoration: bold+underline, the 4th setting achieves the best result among all 4 settings; underline, the 4th setting achieves a better result than the 3rd setting; bold, best among the first 3 settings.
However, such data are valuable and may not be shared across regions without strict regulations. On the other hand, analysis of public social media data can help monitor critical aspects of the public, such as emotions, in real time and can support cross-region analyses. Emoji can be monitored easily and at low cost across all public social media platforms in all languages, which supports shallow emotion analysis. Multilingual language models can also be adapted for emotion analysis. However, access to public social media platforms is restricted by rate limits, which makes mass full-text access difficult and poses a challenge for large-scale, deep emotion analysis. Differing interpretations of emoji also make it difficult to use them to infer the true underlying emotions of social media users, both in general and toward COVID-19 in particular. Nevertheless, studies have shown that interpretations of emoji meanings are largely similar in age-based and nation-based analyses. Using a questionnaire, Gallud et al. (37) found that older people did not understand emoji less well than young people, although some of their results also indicated varying understandings. The questionnaire asked respondents for the meanings of emoji as listed in Emojipedia (http://emojipedia.org). Some emoji were answered with high accuracy (>80%) and others with low accuracy (<30%). For example, the "fear" emoji (17.1% accuracy) was also interpreted as "surprise." Kutsuzawa et al. (38) likewise found that, for both young and middle-aged groups, emoji were generally clustered similarly in Arousal-Valence space, but some emoji were interpreted differently across age groups. Schouteten et al. (39) found that emoji meanings (pleasure-arousal-dominance dimensions) are largely similar in 5 countries (Germany, Singapore, Malaysia, UK, and New Zealand).
Still, misunderstandings of emoji have also been observed (40), for instance, "praying hands" misunderstood as "a high five," "irritation, anger, and contempt" misunderstood as "pride face," and "confused" misunderstood as "frustrated and sad face." Despite this large similarity in interpreting emoji meanings, deeper interpretation of the true underlying emotion remains challenging. In a systematic review of …

FIGURE
COVID-related social media (Twitter) reaction depicted by the ratio of weekly total tweet counts of "COVID-related tweets" vs. "all tweets" since January … The reported number of infections is plotted as weekly sums.
… effectiveness of using emoji features in improving the COVID-19 case prediction system, a much deeper analysis of social media users' true underlying emotions, which significantly affect their behaviors, remains a difficult challenge. This work has the following limitations:
• Even though Twitter is a highly popular and influential social media platform, most of the population in each of the countries studied does not have a Twitter account or does not actively use the platform. Therefore, the collected data do not come from the whole population. While the data can be seen as an influential social media signal, as our results also show, they should be treated with caution when used to represent the reaction of the general population.
• Only COVID-19 related tweets are considered. As the data from the 6 countries show, COVID-19 related tweets constitute only a small portion of social media. Arik et al. (32) showed that it is necessary to study factors that affect epidemic progression, e.g., per capita income, hospital patient experience ratings, and air quality measures. It is therefore natural to expand the study to social media reactions on such topics, for example, economics and climate, together with COVID-19.
• Emoji are used as the sole proxy to capture emotional reactions on social media. Even though our study showed a positive contribution from emoji features, emoji usage is still relatively low: less than 10% in the data from the 6 countries. For broader coverage in social media emotion analysis, full-text-based analysis could be considered for platforms where large-scale full-text access is feasible, especially when considering other topics together with COVID-19.
Even though the Twitter API provides a Sample Stream endpoint that can be used to collect a 1% or 10% random sample of tweets, biases in its sampling method have been reported (42,43) and shown to be exploitable (44).
• Location-based social media data collection is difficult, and it will become even more so as privacy issues are increasingly recognized and respected. This work uses language to collect country-based data, which is applicable only to some countries. For future use of social media data, location sharing should be managed at a finer granularity. For example, users could choose a lower level of location precision and share only the city or state they are in.
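The language-based collection strategy mentioned in the last limitation can be sketched as follows. The mapping from language tags to countries is hypothetical and is not the study's configuration. The sketch assumes that each tweet record carries a language tag, such as the `lang` field in Twitter's tweet objects.

```python
from collections import defaultdict

# Hypothetical mapping from language tag to country. It only works as a proxy
# for languages used mostly within one country, and even then it is approximate.
LANG_TO_COUNTRY = {"ja": "Japan", "ko": "South Korea", "th": "Thailand"}


def group_by_country(tweets):
    """Group tweet texts by the country inferred from each tweet's language tag.

    Each tweet is a dict with "lang" and "text" keys; tweets whose language is
    missing or not in LANG_TO_COUNTRY are skipped.
    """
    grouped = defaultdict(list)
    for tweet in tweets:
        country = LANG_TO_COUNTRY.get(tweet.get("lang"))
        if country is not None:
            grouped[country].append(tweet["text"])
    return dict(grouped)


tweets = [
    {"lang": "ja", "text": "マスク必須😷"},
    {"lang": "en", "text": "Stay home"},  # English cannot be tied to one country
    {"lang": "th", "text": "ฉีดวัคซีนแล้ว"},
]
print(group_by_country(tweets))
# {'Japan': ['マスク必須😷'], 'Thailand': ['ฉีดวัคซีนแล้ว']}
```

The dropped English tweet shows why this proxy applies only to some countries. A language shared by many countries carries no reliable country signal.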

Conclusion
We have investigated the use of emoji as a proxy for estimating social media reaction in terms of emotion trends on Twitter, with the aim of constructing a system for predicting the daily number of COVID-19 cases in Japan. Our experimental results showed that using emoji features improves system performance. These results, together with our analysis of Twitter data, suggest that predicting the later waves of an epidemic could be more challenging. The difficulty may be related to changes in both the epidemic characteristics (variants and their properties) and social media reaction. Future work should consider situations that are difficult to characterize from historical data when constructing machine-learning-based prediction systems.

Data availability statement

Author contributions
VT and TM contributed to the conception and design of the study and to the data collection. VT implemented the system, performed data curation, conducted the experiments, and wrote the first draft of the manuscript. TM validated the progress and results of the study via daily discussion with VT. All authors contributed to manuscript revision and read and approved the submitted version.

Funding
This work was supported with funding from the COVID-19 Program and the Future Investment Program of the Research Organization of Information and Systems, Japan.