Sentiment analysis for measuring hope and fear from Reddit posts during the 2022 Russo-Ukrainian conflict

This article proposes a novel lexicon-based unsupervised sentiment analysis method to measure “hope” and “fear” during the 2022 Ukrainian-Russian conflict. Reddit.com is used as the main source of human reactions to daily events during nearly three months of the conflict. The top 50 “hot” posts of six subreddits about Ukraine and news (Ukraine, worldnews, Ukraina, UkrainianConflict, UkraineWarVideoReport, and UkraineWarReports), along with their comments, were scraped every day between the 10th of May and the 28th of July 2022, creating a novel data set. On this corpus, multiple analyses are performed: (1) public interest, (2) the hope/fear score, and (3) the interaction with stock prices. We use a dictionary approach that scores the hopefulness of every submitted user post. The Latent Dirichlet Allocation (LDA) topic modelling algorithm is also used to understand the main issues raised by users and the key talking points. Experimental analysis shows that hope decreases strongly after the symbolic and strategic losses of Azovstal (Mariupol) and Severodonetsk. Spikes in hope/fear, both positive and negative, are present not only after important battles but also after some non-military events, such as Eurovision and football games.


INTRODUCTION
For many years, war in Europe had been just a dark memory. When, on the 24th of February 2022, the Russian Federation launched a full-scale invasion of Ukraine, it came as a shock to most people around the world (Faiola, 2022). It was thought that the presence of NATO and the European Union would be enough to guarantee a swift return to peace. Unfortunately, this has not been the case, because neither party is a member of NATO or the EU (both are former members of the USSR), and the conflict was still ongoing in early 2023.
In war, national morale is one of the most important elements (Pope, 1941), since it is what keeps a country fighting. When a country is defending its own land, what matters is not only the morale of the two belligerents but, above all, that of the defenders. Indeed, from the start, Ukraine's chances of success were seen as tied to the support of Western countries (Galston, 2022), a need also confirmed by the Ukrainian president himself (France 24, 2022). For this reason, public sentiment in the Western countries that support Ukraine could be a decisive factor in the future of the conflict. If Western audiences perceived the conflict as a lost battle which, if dragged on, would harm their daily lives and only cause more suffering to Ukrainians, they could pressure their governments into withdrawing support. Conversely, if there is hope of winning the conflict, governments can continue to guarantee active support to Ukraine and costly sanctions against Russia.
According to the Collins dictionary, hope is an uncountable noun described as "a feeling of desire and expectation that things will go well in the future" (Collins Dictionary, 2022b). Conversely, fear is defined as "a thought that something unpleasant might happen or might have happened" (Collins Dictionary, 2022a). Although grammatically they may be uncountable nouns, the main purpose of this paper is to apply text mining and sentiment analysis techniques to measure "Hope" and its negative counterpart "Fear" using social media posts from Reddit.com, the social news aggregation, content rating, and discussion website.

BACKGROUND & RELATED WORKS
From a general point of view, "sentiment analysis" can be defined as the process of using techniques such as natural language processing, text analysis, and text mining to extract and interpret subjective, human-related information. The sources of information for sentiment analysis can be diverse, e.g. written text or voice, while the entities analysed might be events, topics, individuals, and many more (Liu, 2020). Sentiment analysis is also an umbrella term for many other tasks, such as opinion mining, sentiment mining, and emotion analysis and mining (Nasukawa and Yi, 2003; Dave et al., 2003; Liu, 2020). Text data mining can be defined as the process of extracting information from structured and/or unstructured data made mainly of text (Hearst, 1999). Text mining can be used for different purposes and with many techniques, such as topic modelling (Rehurek and Sojka, 2010) and sentiment analysis (Feldman, 2013). Text-based sentiment analysis is a versatile approach that helps to automatically extract meaningful information from written text, and it has served many different objectives: to assess and monitor psychological disorders (Zucco et al., 2017), to evaluate human behaviour during the 2014 football World Cup (Yu and Wang, 2015), to detect emotions in general (Peng et al., 2021) or to study gender differences in them (Thelwall et al., 2010), and even to make stock market predictions (Pagolu et al., 2016) and measure the heterogeneity of investors via their social media posts (Ji and Han, 2022).
As social networks keep expanding in number of users and reach audiences from nearly all levels of the community, social media has naturally become the main source of data for text mining and sentiment analysis. Sentiment analysis has been used to interpret data from many social networks, the most obvious example being Twitter (Hu et al., 2013; Yu and Wang, 2015; Giachanou and Crestani, 2016; Ji and Han, 2022). Other popular social networks have also been used as data sources for sentiment analysis, e.g. Facebook (Ortigosa et al., 2014), Reddit (Melton et al., 2021), MySpace (Thelwall et al., 2010), and even YouTube comments (Tripto and Ali, 2018).
Although social media is one of the most common sources of data, sentiment analysis has also been applied to many other text corpora, to name but a few: movie reviews (Thet et al., 2010), product reviews (Haque et al., 2018), newspaper articles (Balahur and Steinberger, 2009), and emails (Liu and Lee, 2018). Most of the analyses mentioned above focus on classifying a text as positive, negative, or neutral (Pak and Paroubek, 2010) and/or on assigning it a score under one of various scoring systems (Naldi, 2019). Similar analyses can also determine whether a text uses subjective or objective language (Liu et al., 2010), or which emotions it conveys (Yadollahi et al., 2017).
Large data sets containing many types of human emotion are not only exciting for computational data analysis research but also useful for human behavioural research. In general, there are two main theories on how emotions are formed in the human brain. The first is discrete emotion theory, which holds that emotions arise from separate neural systems (Ekman et al., 2013; Shaver et al., 1987). In these seminal studies, Ekman et al. (2013) recognise six basic emotions (anger, disgust, fear, joy, sadness, and surprise), whilst Shaver et al. (1987) recognise anger, fear, joy, love, sadness, and surprise. On the other hand, the dimensional model holds that a common and interconnected neurophysiological system causes all affective states (Plutchik and Kellerman, 2013; Lövheim, 2012). In particular, Plutchik and Kellerman (2013) recognise anger, anticipation, disgust, fear, joy, sadness, surprise, and trust, whilst Lövheim (2012) recognises anger, disgust, distress, fear, joy, interest, shame, and surprise. Statistical correlation and independence analyses are also important for providing evidence for such behavioural studies.
This paper aims to develop a novel lexicon-based unsupervised method to measure "hope" and "fear" in the Ukrainian-Russian conflict. Reddit.com is used as the main source of human reactions to daily events during nearly three months of the conflict. The structure of Reddit.com allows discussion of very specific topics (by posting in specific subreddits), without tight limits on the number of characters that can be posted. This makes it easy to mine opinions about the Ukrainian conflict, to get an idea of what people think about it, and to gauge how hopeful or fearful they are. To achieve this goal, the top 50 "hot" posts of six subreddits about Ukraine and news (Ukraine, worldnews, Ukraina, UkrainianConflict, UkraineWarVideoReport, UkraineWarReports) and their comments are scraped to create a data set, on which multiple analyses are performed. We use a dictionary approach that scores the hopefulness of every submitted user post. The Latent Dirichlet Allocation (LDA) topic modelling algorithm is also used to understand the main issues raised by users and the key talking points. This research aims to fill the gap in the opinion mining literature regarding hope specifically. The main analysis consists of mapping hope as measured by the newly proposed method. First, the trend of hope over time is monitored; it is then compared with some of the most important events of the study period. This shows how such events influenced the public perception of the conflict and provides evidence for the validity of the proposed hope measure. Fear is measured and mapped over the same period. To measure both hope and fear, a dictionary approach is employed that uses the National Research Council (NRC) Word-Emotion Association Lexicon as a starting point. Furthermore, the topics extracted through topic modelling are studied to determine whether they correlate with "hope" and, if so, what kind of relationship they present. Sentiment analysis is also used to track the popularity of the two leaders (Putin and Zelensky) and of the Russian and Ukrainian governments. Finally, stocks such as Gazprom and indices (gas prices and Russian and Ukrainian bonds) are analysed to determine whether there is a relationship between the developed hope score and the stock market.

Reddit Data
Reddit was chosen because its structure makes it easy to group submissions about a specific topic and because, compared with other social media platforms, the success of content is less influenced by the popularity of its author. The data were gathered through the official Reddit API. Using the API requires registering as a developer on the Reddit website, authenticating, and registering the app with a statement of its purpose and functionality. Once this procedure is completed, the developer can request a token, which has to be supplied along with the client id, user agent, username, and password every time new data are requested.
Six subreddits were chosen for their relevance to the conflict: r/Ukraine, r/worldnews, r/Ukraina, r/UkrainianConflict, r/UkraineWarVideoReport, and r/UkraineWarReports. A script developed in Python crawls the top 50 "hot" posts of each subreddit and their comments. It then combines the newly gathered submissions with the previously collected ones and removes any duplicates using the submission id. For every submission, the following information was obtained:
• title (only for posts): the title of the post
The data collection process started on the 10th of May and was completed on the 28th of July. It was run daily at around 3.00 pm UK time. More than 1.2 million unique observations were gathered within this time frame.
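The daily merge-and-deduplicate step can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes each scraped submission is a dict with an "id" key, and the choice to let the newer copy of a duplicate win (so that upvote counts reflect the latest snapshot) is our assumption, since the text only says duplicates are removed by id.

```python
def merge_submissions(previous: list[dict], new: list[dict]) -> list[dict]:
    """Combine previously collected submissions with a new daily crawl.

    Duplicates are detected by submission id. Assumption: when an id occurs
    in both lists, the newer copy wins, so fields such as the upvote count
    reflect the latest snapshot.
    """
    merged: dict[str, dict] = {}
    for submission in previous:
        merged[submission["id"]] = submission
    for submission in new:
        # Re-assigning an existing key keeps its original position in the dict
        merged[submission["id"]] = submission
    return list(merged.values())


previous = [{"id": "a1", "upvotes": 10}, {"id": "b2", "upvotes": 3}]
today = [{"id": "b2", "upvotes": 7}, {"id": "c3", "upvotes": 1}]
print(merge_submissions(previous, today))
# [{'id': 'a1', 'upvotes': 10}, {'id': 'b2', 'upvotes': 7}, {'id': 'c3', 'upvotes': 1}]
```

Keying a dict by id keeps the merge linear in the number of submissions and preserves the order in which ids were first seen.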

Pre-processing Stages
The data obtained through the collection process could not be used as is; it had to be processed before being analysed and explored. First, some cleaning was needed, since not all the gathered observations were useful: some of the submissions in the r/worldnews subreddit were not about the conflict. To eliminate the irrelevant ones, only the posts with the flair "Ukraine/Russia" were kept. The only issue is that flair is assigned only to "post" type submissions, not to comments.
Fortunately, the structure of Reddit makes it possible to use the id and parent id of each comment to move upwards to the original post. Every comment is like a branch in a forest, with every post representing a single tree. Thanks to this principle, it was possible to extract the "ancestor id" of every submission and use it to assign a flair to the comments. This made it possible to identify and remove the submissions without the relevant flair from the r/worldnews subreddit.
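The ancestor lookup and flair propagation described above can be sketched as a walk up the parent links. This is a hypothetical illustration: it assumes each record has "id", "parent_id" (None for posts), "subreddit" and, for posts only, "flair" fields, and that Reddit's "t1_"/"t3_" id prefixes have already been stripped. The flair "Other" in the example is a placeholder.

```python
from typing import Optional


def find_ancestor(sub_id: str, parent_of: dict[str, Optional[str]]) -> str:
    """Follow parent ids upwards until reaching a post (a submission with no parent)."""
    current = sub_id
    while parent_of.get(current) is not None:
        current = parent_of[current]
    return current


def filter_worldnews(submissions: list[dict], flair: str = "Ukraine/Russia") -> list[dict]:
    """Give every comment the flair of its ancestor post, then drop off-topic r/worldnews items."""
    parent_of = {s["id"]: s["parent_id"] for s in submissions}
    # Only posts (parent_id is None) carry a flair
    flair_of = {s["id"]: s.get("flair") for s in submissions if s["parent_id"] is None}
    kept = []
    for s in submissions:
        ancestor = find_ancestor(s["id"], parent_of)
        enriched = {**s, "ancestor_id": ancestor, "flair": flair_of.get(ancestor)}
        # Other subreddits are kept as they are; r/worldnews needs the right flair
        if enriched["subreddit"] != "worldnews" or enriched["flair"] == flair:
            kept.append(enriched)
    return kept


data = [
    {"id": "p1", "parent_id": None, "subreddit": "worldnews", "flair": "Ukraine/Russia"},
    {"id": "c1", "parent_id": "p1", "subreddit": "worldnews"},
    {"id": "c2", "parent_id": "c1", "subreddit": "worldnews"},
    {"id": "p2", "parent_id": None, "subreddit": "worldnews", "flair": "Other"},
    {"id": "c3", "parent_id": "p2", "subreddit": "worldnews"},
    {"id": "p3", "parent_id": None, "subreddit": "ukraine"},
]
print([s["id"] for s in filter_worldnews(data)])  # ['p1', 'c1', 'c2', 'p3']
```

The stored "ancestor_id" is reused later, since counting submissions per ancestor gives the size of each post's thread.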
The next step was to convert all the words in each post to lowercase. Subsequently, a score for a specific emotion was obtained for every submission by counting the number of words related to the investigated emotion in each entry.
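The lowercasing and counting steps can be sketched as below. The tokenisation rule (runs of letters and apostrophes) is our assumption, since the text does not specify one, and the toy lexicon is hypothetical rather than taken from the NRC lexicon.

```python
import re

# Assumption: a token is a run of lowercase letters or apostrophes
TOKEN = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text, then split it into word tokens."""
    return TOKEN.findall(text.lower())


def emotion_score(text: str, lexicon: set[str]) -> int:
    """Number of tokens that belong to the emotion lexicon (repeated words count each time)."""
    return sum(1 for token in tokenize(text) if token in lexicon)


toy_lexicon = {"hope", "win", "soon"}  # hypothetical word list, not NRC entries
print(emotion_score("We HOPE they win, and win soon!", toy_lexicon))  # 4
```

Counting every occurrence (rather than distinct words) matches the description of counting the emotion-related words in each entry, but it is one of several reasonable readings.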
Another useful piece of information to extract is the polarity score. Using a different sentiment analysis approach, the "text" of a post or a comment receives a score ranging from -1 to 1 according to its sentiment: -1 indicates a very negative meaning, while 1 indicates a very positive one. The score was extracted using the sentiment.polarity property of the TextBlob Python module. The sentiment.subjectivity property from the same module was also used to determine whether the author is stating facts or voicing an opinion. Subjectivity ranges from 0, which indicates a very objective text, to 1, which indicates a very subjective one.
One of the problems with dictionary-based sentiment analysis is that it arbitrarily favours long texts: with a higher word count, there are more chances of finding the relevant words. Furthermore, length raises the maximum score a submission can reach: a one-word comment could score at most one, while a hundred-word comment could potentially score one hundred. To solve this issue, a new parameter called "w_length" was created. It stores the emotion score divided by the length of the submission, multiplied by 100.
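The length normalisation amounts to the number of emotion words per 100 words. A minimal sketch, assuming length is measured in words and that an empty text scores 0.0 (the text does not say how empty submissions were handled):

```python
def w_length(emotion_score: int, n_words: int) -> float:
    """Emotion score per 100 words of the submission; 0.0 for an empty text."""
    if n_words == 0:
        return 0.0
    return emotion_score * 100 / n_words


# Both texts now share the same maximum of 100, whatever their length
print(w_length(1, 1))      # 100.0  one-word comment, one emotion word
print(w_length(100, 100))  # 100.0  hundred-word comment, all emotion words
print(w_length(1, 100))    # 1.0
```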
Another improvement concerned the weight of individual opinions, since some opinions are more popular than others. On Reddit, it is easy to tell whether a post is popular by looking at its number of upvotes. To better capture public opinion, the hope score was weighted by the number of upvotes. However, simply multiplying the "w_length" score by the number of upvotes, while an improvement, would penalise popular comments in unpopular posts. A very successful post has a very high number of views, comments, and upvotes. A comment X, viewed by 100 people and upvoted by 10 (10%), would score higher than a comment Y, viewed by 10 people and upvoted by 5 (50% of viewers). To solve this issue, the number of upvotes was normalised by the number of comments on the post, yielding its relative (as opposed to absolute) popularity. A parameter storing the number of submissions in every post ("sub_in-post") was obtained by counting the submissions for every "ancestor id". Finally, another parameter, "w_upvotes", was created. It stores "w_length" multiplied by the number of upvotes and divided by "sub_in-post".
Hence, w_upvotes is the emotion score weighted by length, upvotes, and relative popularity. The flow diagram of the general pre-processing process is depicted in Figure 1.
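The upvote weighting can be sketched as follows (parameter names written with underscores). The comments X and Y below are hypothetical and, following the text, use the number of submissions in the post as the proxy for its audience rather than the unobservable number of views.

```python
from collections import Counter


def sub_in_post(ancestor_ids: list[str]) -> Counter:
    """Number of submissions (the post plus all its comments) per ancestor id."""
    return Counter(ancestor_ids)


def w_upvotes(w_length: float, upvotes: int, n_in_post: int) -> float:
    """Length-normalised emotion score weighted by relative popularity."""
    return w_length * upvotes / n_in_post


# Hypothetical comments X and Y with the same w_length = 5.0:
# X has 10 upvotes in a post with 100 submissions, Y has 5 upvotes in a post with 10.
counts = sub_in_post(["big"] * 100 + ["small"] * 10)
print(w_upvotes(5.0, 10, counts["big"]))   # 0.5
print(w_upvotes(5.0, 5, counts["small"]))  # 2.5
```

With this weighting, Y now outranks X, which is the behaviour the relative-popularity correction is meant to produce.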

Measuring Hope and Fear
Overall interest in the conflict was measured in two different ways: (i) the number of submissions and (ii) the popularity of the posts. For the former, data were grouped by day, and the number of daily submissions was counted. This includes both posts and comments, giving a good idea of the engagement trend. The latter is the daily average number of upvotes per post. Comments were excluded, since a popular post is likely to host many comments with just one upvote, which would significantly lower the average. Complementing the first measure with the second gives a proper idea of the general interest trend. The number of submissions could be driven by a small number of users who are particularly involved in the conflict, while the wider public might not be as interested. This can be tested by looking at the popularity of the posts: popular posts have many upvotes, and to earn them a submission needs the approval or attention of a large group of users.
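Both interest measures can be computed in a single pass over the data. A minimal sketch, assuming each submission carries a "day" (a date), a "parent_id" (None for posts) and an "upvotes" count; the records below are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_interest(submissions: list[dict]) -> dict:
    """Per day: (number of submissions, mean upvotes of posts only)."""
    counts: dict = defaultdict(int)
    post_upvotes: dict = defaultdict(list)
    for s in submissions:
        counts[s["day"]] += 1  # posts and comments both count as engagement
        if s["parent_id"] is None:  # comments are excluded from the upvote average
            post_upvotes[s["day"]].append(s["upvotes"])
    return {
        day: (counts[day], mean(post_upvotes[day]) if day in post_upvotes else None)
        for day in sorted(counts)
    }


d1, d2 = date(2022, 5, 16), date(2022, 5, 17)
subs = [
    {"day": d1, "parent_id": None, "upvotes": 100},
    {"day": d1, "parent_id": None, "upvotes": 50},
    {"day": d1, "parent_id": "p1", "upvotes": 1},
    {"day": d1, "parent_id": "p1", "upvotes": 1},
    {"day": d2, "parent_id": None, "upvotes": 20},
    {"day": d2, "parent_id": "p9", "upvotes": 3},
]
print(daily_interest(subs))  # 16 May: 4 submissions, mean post upvotes 75; 17 May: 2 and 20
```

Including the two one-upvote comments on 16 May would have dragged the average down from 75 to 38, which is the distortion the text avoids by using posts only.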
The main goal of this paper is to map hope in Western public opinion regarding the Russo-Ukrainian war. There is a gap in the literature on this specific issue: to the best of our knowledge, there is no scholarly accepted way to measure hope automatically.
There are many ways to tackle sentiment analysis, such as machine learning or dictionary-based approaches. The former would have required labelling a data set, marking what is hopeful and what is not, which requires linguistic expertise to do properly. A dictionary-based approach, on the other hand, makes it possible to build on scholarly accepted dictionaries. Hence, this paper adopts a dictionary-based approach.
Two issues had to be addressed to complete a dictionary-based analysis: a linguistic one and a technical one. At this point, we ask two important questions: what is hope, and how do we measure it? According to the Collins dictionary, "Hope is a feeling of desire and expectation that things will go well in the future". Picking apart this definition helps to identify the elements that make up hope. The keywords are "feeling", "well", and "expectation in the future". A feeling is inherently subjective to the person who feels it. "Well", in this case, indicates a positive outcome. Expectation is "something looked forward to, whether feared or hoped for" and is a synonym for anticipation.
Since, to the best of our knowledge, no "hope" dictionary exists, one had to be developed. As a starting point, the NRC sentiment and emotion lexicon was used. The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were done manually through crowdsourcing. The categories catalogued in this lexicon include "anticipation", "positive", and "joy". According to the previous definition, for something to be hopeful it needs to be a subjective anticipation of a positive outcome. Hence, the three word lists were cross-referenced to find the words associated with "anticipation" and with at least one of "positive" or "joy".
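The cross-referencing rule (anticipation AND (positive OR joy)) is a set intersection and union. The sketch assumes the lexicon is read in the word-level format "word<TAB>category<TAB>0 or 1", one association per line; the words "alpha" to "delta" are placeholders with toy annotations, not real NRC entries.

```python
from collections import defaultdict


def load_nrc(lines: list[str]) -> dict[str, set[str]]:
    """Map each NRC category to the set of words annotated with it.

    Assumption: word-level format 'word<TAB>category<TAB>0 or 1', one entry per line.
    """
    categories: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, category, flag = parts
        if flag == "1":
            categories[category].add(word)
    return categories


def build_hope_lexicon(nrc: dict[str, set[str]]) -> set[str]:
    """Hope words: associated with anticipation AND with positive OR joy."""
    return nrc.get("anticipation", set()) & (nrc.get("positive", set()) | nrc.get("joy", set()))


# Toy annotations with placeholder words, NOT real NRC entries
toy = [
    "alpha\tanticipation\t1", "alpha\tpositive\t1",
    "beta\tanticipation\t1", "beta\tjoy\t1", "beta\tpositive\t0",
    "gamma\tanticipation\t1",
    "delta\tpositive\t1", "delta\tanticipation\t0",
]
print(sorted(build_hope_lexicon(load_nrc(toy))))  # ['alpha', 'beta']
```

"gamma" is excluded because anticipation alone does not imply a positive outcome, and "delta" because it is positive without anticipation; the fear lexicon used later is simply the "fear" category of the same mapping.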
Thanks to this procedure, a "hope" dictionary was developed. The lexicon satisfies two of the three criteria: "anticipation" and "positive outcome". To satisfy the third one, subjectivity, all the Reddit submissions were analysed with TextBlob's sentiment.subjectivity, which gives a score from 0 (not subjective) to 1 (very subjective). To respect all three criteria, only submissions with a subjectivity score of at least 0.5 are analysed.
Once developed, the dictionary needed to be applied. Every submission has a "text" column, which contains the message sent by the user. The script counts how many times words in the "hope" dictionary appear in the "text". In this way, a raw hope score, denoted hope_score, is obtained, which is then refined as described in the Pre-processing Stages section. Fear was measured in the same way as hope: it is dictionary-based, and the score is obtained by counting the fear-related words in every submission. The dictionary used was the same NRC lexicon from which the "anticipation", "joy", and "positive" words were obtained. The fear_score is calculated as
$$\text{fear\_score} = \sum_{w \in \text{text}} \mathbb{1}\left[ w \in D_{\text{fear}} \right],$$
where $D_{\text{fear}}$ is the set of NRC words associated with fear and the sum runs over the words of the submission's text.
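The raw scoring step, including the subjectivity threshold, can be sketched as follows. The subjectivity value is taken as an input (in the paper's pipeline it would come from TextBlob's sentiment.subjectivity); returning None for hope below the threshold, and applying the threshold to hope only, are our reading of the text rather than a confirmed detail. The two word sets are hypothetical stand-ins for the derived lexicons.

```python
import re
from typing import Optional

TOKEN = re.compile(r"[a-z']+")


def count_words(text: str, lexicon: set[str]) -> int:
    """Occurrences of lexicon words in the lowercased text."""
    return sum(1 for token in TOKEN.findall(text.lower()) if token in lexicon)


def raw_scores(text: str, subjectivity: float, hope_lexicon: set[str],
               fear_lexicon: set[str], min_subjectivity: float = 0.5) -> tuple[Optional[int], int]:
    """Return (hope_score, fear_score) for one submission.

    Texts below the subjectivity threshold get hope_score = None, i.e. they are
    excluded from the hope analysis rather than counted as zero (our assumption).
    """
    hope = count_words(text, hope_lexicon) if subjectivity >= min_subjectivity else None
    fear = count_words(text, fear_lexicon)
    return hope, fear


hope_words = {"hope", "win"}  # hypothetical stand-ins for the derived lexicons
fear_words = {"fear", "shelling"}
text = "I hope we win, but I fear the shelling"
print(raw_scores(text, 0.7, hope_words, fear_words))  # (2, 2)
print(raw_scores(text, 0.2, hope_words, fear_words))  # (None, 2)
```

These raw counts are what the w_length and w_upvotes weighting then refines.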

Leader and Country Analysis
To obtain the leader analysis data, two new data sets were created. The first contained only the submissions mentioning the name "Zelenskyy" or its variants "Zelens'kyj" or "Zelensky". The second included only the observations mentioning the name "Putin". Unlike the other analyses, this one did not use hope and fear but focused on the sentiment polarity score, computed with the TextBlob polarity method. It gives a score ranging from -1 to 1, with -1 representing a negative opinion and 1 a positive one. After both data sets were grouped by day, the mean daily polarity score was computed.
Similarly to the Zelenskyy vs Putin analysis, two new data sets were created for the countries. The first included only submissions containing the name "Ukraine", while the second included only observations containing the name "Russia". The polarity score was then measured using the TextBlob polarity method, observations were grouped by day, and the daily average polarity score was computed.
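Both the leader and the country analyses reduce to filtering by a name pattern and averaging polarity per day. In this sketch the regular expressions are our assumptions (word boundaries mean, for instance, that "Russian" does not count as a mention of "Russia"), polarity is assumed to be precomputed per submission, and the example records are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed name patterns; word boundaries stop matches inside longer words
ZELENSKYY = re.compile(r"\bzelens(?:kyy?|'kyj)\b", re.IGNORECASE)
PUTIN = re.compile(r"\bputin\b", re.IGNORECASE)
UKRAINE = re.compile(r"\bukraine\b", re.IGNORECASE)
RUSSIA = re.compile(r"\brussia\b", re.IGNORECASE)


def daily_polarity(submissions: list[dict], pattern: re.Pattern) -> dict:
    """Mean polarity per day over the submissions whose text matches `pattern`.

    `polarity` is assumed precomputed, e.g. TextBlob(text).sentiment.polarity in [-1, 1].
    """
    by_day: dict = defaultdict(list)
    for s in submissions:
        if pattern.search(s["text"]):
            by_day[s["day"]].append(s["polarity"])
    return {day: mean(values) for day, values in sorted(by_day.items())}


subs = [
    {"day": "2022-07-14", "text": "Zelenskyy speaks to parliament", "polarity": 0.25},
    {"day": "2022-07-14", "text": "zelensky's visit", "polarity": 0.75},
    {"day": "2022-07-14", "text": "Putin's speech", "polarity": -0.5},
    {"day": "2022-07-15", "text": "Russian forces advance", "polarity": 0.0},
]
print(daily_polarity(subs, ZELENSKYY))  # {'2022-07-14': 0.5}
print(daily_polarity(subs, RUSSIA))     # {}
```

Whether adjectives such as "Russian" should count as mentions is a design choice that changes the sample size considerably; the text does not say which rule was used.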

Stock Market Analysis
After collecting historical prices for six different stocks and financial instruments (including UK oil & gas, the Russian rouble/US dollar exchange rate, the price of gas, and the price of crude oil), they were joined to the "daily" database, which contains the weighted average daily values of hope and fear.
We developed a linear regression model with the price of the ticker as the dependent variable and either the weighted average daily hope score or the weighted average daily fear score as the independent variable. We then fitted this model on each data set and calculated the corresponding parameters.
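The join and the single-regressor model can be sketched with closed-form ordinary least squares. This is an illustration under stated assumptions: dates are ISO strings, days without a price (e.g. non-trading days) are dropped by an inner join, and the series below are made up.

```python
from statistics import mean


def join_on_date(scores: dict[str, float], prices: dict[str, float]) -> tuple[list[float], list[float]]:
    """Inner join on the date key: keep only days with both a score and a price."""
    days = sorted(scores.keys() & prices.keys())
    return [scores[d] for d in days], [prices[d] for d in days]


def ols(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Simple OLS fit of y = a + b * x; returns (intercept a, slope b, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


hope = {"2022-06-01": 1.0, "2022-06-02": 2.0, "2022-06-03": 3.0, "2022-06-04": 4.0}
price = {"2022-06-01": 12.0, "2022-06-02": 14.0, "2022-06-03": 16.0}  # no price on 06-04
x, y = join_on_date(hope, price)
print(ols(x, y))  # (10.0, 2.0, 1.0)
```

In practice a statistics package would also report standard errors and p-values for the slope, which this sketch omits.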

Topic Modelling
The aim of this analysis is to understand what the gathered submissions are about through topic modelling. Topic modelling is an unsupervised machine learning technique for organising, understanding, and summarising large bodies of text. It can be described as a method for extracting meaning from textual data by identifying groups of words, or abstract topics, that best represent the information in a collection of documents. More specifically, this technique returns a probability distribution over topics of discussion, where each topic is associated with a given document by a certain likelihood score, so a document can be about several topics at the same time in different proportions.
We first created a corpus and dropped its less frequent terms. Once the text data had been processed, the optimal number of topics (K) was estimated. Using the searchK() function of the R stm package, models with K from 2 to 10 are fitted, so that the results can be interpreted to estimate the optimal number of topics for the model. To find the optimal number of topics, the candidate values of K are plotted against various goodness-of-fit measures, such as semantic coherence and exclusivity. Semantic coherence measures how often the most probable words of each topic occur together in the same document. Exclusivity, on the other hand, checks the extent to which the top words of a topic are not top words of other topics. Coherence thus measures how strongly a topic is present and identifiable in documents, whilst exclusivity measures how much topics differ from each other. The goal is to maximise both whilst keeping the likelihood high and the residuals low. The distribution of topics across documents is then examined to see whether one topic prevails over the others or whether they have similar distributions (a bad sign). Subsequently, a word cloud is created for every topic, showing its top words with sizes proportional to their relative frequency. The labelTopics() function is used to inspect the words assigned to each topic in order to read and interpret them; it generates a group of words summarising each topic and measures the associations between keywords and topics. The most representative documents for each topic are then extracted. This helps to give a more concrete idea of what each topic is about, using a real submission as an example. The relationship between metadata and topics is investigated by fitting a regression model with the estimateEffect() function. This function performs a regression with the topic proportions as the outcome variable, and its output shows the effect of the covariates on the topics. Finally, the correlation between topics is studied.
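To make the two goodness-of-fit measures concrete, the sketch below computes a common document co-occurrence formulation of semantic coherence, summing log((D(v_i, v_j) + 1) / D(v_j)) over pairs of top words, together with a deliberately simplified exclusivity proxy (the share of a topic's top words that are top words of no other topic). To our understanding stm's semanticCoherence() follows a similar co-occurrence idea and its exclusivity measure is defined differently; neither function below is stm's implementation. The documents and topics are toy data.

```python
import math


def semantic_coherence(top_words: list[str], documents: list[set[str]]) -> float:
    """Co-occurrence coherence of one topic; higher values mean a more coherent topic.

    `top_words` are ordered from most to least probable; each document is a set of words.
    Sums log((D(v_i, v_j) + 1) / D(v_j)) over all pairs with j < i, where D counts
    the documents containing the given words.
    """
    def doc_freq(*words: str) -> int:
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            # Assumes every top word occurs in at least one document (D(v_j) > 0)
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j]))
    return score


def exclusivity_proxy(topics: list[list[str]]) -> list[float]:
    """Share of each topic's top words that are top words of no other topic (simplified proxy)."""
    shares = []
    for k, words in enumerate(topics):
        others: set[str] = set()
        for j, other in enumerate(topics):
            if j != k:
                others.update(other)
        shares.append(sum(w not in others for w in words) / len(words))
    return shares


docs = [{"army", "tank", "front"}, {"army", "tank"}, {"gas", "price"}, {"gas", "price", "army"}]
print(semantic_coherence(["army", "tank"], docs))             # 0.0
print(round(semantic_coherence(["army", "front"], docs), 3))  # -0.405
print(exclusivity_proxy([["army", "tank"], ["gas", "army"]]))  # [0.5, 0.5]
```

"army" and "tank" always appear together, so that pair is more coherent than "army" and "front"; the shared word "army" halves the exclusivity of both toy topics.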

Hope-Fear Analysis
Our hope-fear analysis starts by measuring public interest in the war and users' willingness to share posts on social media, as shown in Figure 2. Overall social media interest in the conflict decreased slowly but steadily over the whole analysed time window. The average was 4335 submissions per day; there were plenty of submissions in the first days, with a peak of 6993 in a single day on the 16th of May 2022. In the last part of the period numbers are lower, with a minimum of only 1080 submissions on the 22nd of July 2022, 5913 fewer than the maximum.
Unlike the number of submissions, the daily average number of upvotes in Figure 2 shows no significant change in trend over time. The daily average itself is very volatile, but the trend remains stable. This could mean that, while users are still receptive to the conflict and supportive of Ukraine (they keep upvoting the most important posts), they are less engaged, posting and commenting less. We then calculated the daily hope score using expression (3). As the graph in Figure 3 shows, the hope score decreases during the analysed period and, judging by its running mean, reaches a nearly steady state after the first half of the observed period. After the initial big drop, the score seems to stabilise at a lower value. This seems to reflect what happened during the war: the big drop occurs around the fall of Azovstal (Mariupol) and Severodonetsk. Afterwards, it mirrors "phase two" of the Russian offensive, with a very slow and steady trend in the hope score. This is also reflected in the descriptive statistics: the central 50% of the hope score observations lie within a range of 0.054, while the total range is 0.264. Similarly, using expression (4), we calculated the fear score for the same period. Despite being fairly volatile, fear remains stable for the whole analysis after the initial couple of days. This is an interesting observation, especially when compared with hope, which decreases over the same period. Hope and fear are negatively correlated, with a Pearson correlation coefficient of -0.986. To interpret this phenomenon more clearly, we plot the running means of hope and fear on the same axes in Figure 3.
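The summary statistics used above (running mean, central-50% range, total range, and Pearson correlation) can be reproduced with the standard library. This is a sketch assuming the daily scores are plain lists; the 7-day default window and the "inclusive" quantile method are our assumptions, since the text does not state them, and the example values are toy data.

```python
import math
from statistics import mean, quantiles


def running_mean(values: list[float], window: int = 7) -> list[float]:
    """Trailing moving average, one value per complete window (window size is an assumption)."""
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def spread(values: list[float]) -> tuple[float, float]:
    """(interquartile range, total range): the 'central 50%' and the full spread of the scores."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1, max(values) - min(values)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(running_mean([1, 2, 3, 4, 5], window=3))  # [2, 3, 4]
print(spread([1, 2, 3, 4, 5]))                  # (2.0, 4)
print(pearson([1, 2, 3], [3, 2, 1]))            # -1.0
```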

Validation of Hope/Fear Scores
In order to validate and better visualise the proposed hope/fear scores, we investigated 18 important events within the experimental period. To do so, observations were grouped by day and the mean hope score was computed. The overall mean of the hope score was also calculated, and a new column was created containing the difference between each day's average and the overall mean. The important events chosen for the validation analysis are given below:
1. May 9 - Failed Russian crossing of the Siverskyi Donets River. Ukrainian sources state that 70 heavy Russian units were destroyed or lost during the crossing.
2. May 13 - American-Russian talks. Lloyd Austin (American secretary of defence) and Sergei Shoigu (Russian minister of defence) held telephone talks for the first time since the start of the invasion.
3. May 15 - Ukraine won the Eurovision Song Contest thanks to an overwhelming popular vote. "Stefania" by Kalush Orchestra won with 192 points from the juries (4th place in the jury vote) and 439 from the televote. Second place went to the United Kingdom with 466 points in total.
4. May 17 - Azovstal, the steel plant of Mariupol, is lost. It was the last stand of the Azov Battalion, a controversial group that included many of the best-trained Ukrainian soldiers. Its loss deprived Ukraine of a strategically important port and many soldiers, and allowed the Russians to unify the front.
5. May 27 - 90% of Severodonetsk is destroyed. The city is of great strategic importance, since it could allow the Russians to encircle many Ukrainian units in Donbass.
6. May 29 - First visit of Zelenskyy outside Kiev. The visit was meant to show that the president was not afraid of Russia taking him out.
7. May 30 - Russian troops enter Severodonetsk.
8. June 5 - Ukraine is eliminated from the World Cup qualifiers after losing 1-0 to Wales; the goal came from a Gareth Bale free kick that Andriy Yarmolenko deflected into his own net.
9. June 12 - Ukrainian supplies and planes destroyed.
10. June 16 - Sinking of a Russian ship: the tug Spasatel Vasily Bekh was sunk near Snake Island in the Black Sea.
11. June 17 - Putin's speech at the economic forum in St. Petersburg.
12. June 22 - Ukrainian drone strike on a Russian oil refinery.
13. June 26 - 14 missiles hit Kiev, damaging several buildings and a kindergarten.
14. July 6 - The Russian Duma prepares to move to a war economy, which would allow the government to order companies to produce war supplies and to make workers work overtime.
15. July 7 - Zelenskyy gives a speech on the effectiveness of Western artillery. Furthermore, a technical pause in the Russian offensive begins, with the aim of regrouping.
16. July 14 - Start of the volunteer mobilisation, which requires 85 federal subjects to recruit 400 men each by the end of the month.
17. July 16 - The US House of Representatives approves a bipartisan bill that would grant $100 million to train Ukrainian pilots to fly US fighter jets.
18. July 23 - Four Kalibr missiles are fired at Odessa. Of those four, two were intercepted; according to Russian sources, the other two destroyed a warship and a warehouse containing missiles.
The graph in Figure 4 shows how far above or below average the hope score was during the analysed period. Many of the spikes, both negative and positive, coincide with real-world events that had an impact on the war or on the morale of Western public opinion. Positive events include, among others, the Ukrainian victory in the Eurovision contest (3), financial help packages from the United States (17), and the sinking of Russian ships (10). Negative ones include, among others, the loss of Azovstal (4), the fall of Severodonetsk (5), and the elimination of Ukraine from the 2022 World Cup (8).
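The above/below-average series plotted for this validation can be sketched as the daily mean minus the overall mean, so that positive values mark days with above-average hope. The sketch assumes the overall mean is taken over all submissions (the text does not say whether it is the mean of submissions or of daily means) and uses toy scores.

```python
from collections import defaultdict
from statistics import mean


def deviation_from_mean(scores: list[tuple[str, float]]) -> dict[str, float]:
    """Daily mean hope score minus the overall mean (positive = above average)."""
    overall = mean(score for _, score in scores)
    by_day: dict = defaultdict(list)
    for day, score in scores:
        by_day[day].append(score)
    return {day: mean(values) - overall for day, values in sorted(by_day.items())}


scores = [("2022-05-15", 3.0), ("2022-05-15", 5.0), ("2022-05-17", 1.0)]
print(deviation_from_mean(scores))  # {'2022-05-15': 1.0, '2022-05-17': -2.0}
```

Matching the resulting spikes against the dated event list is then a simple lookup by day.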
As Figure 4 shows, most of the biggest positive spikes are concentrated in the first days, when phase two of the war had recently started. After the fall of Azovstal and Severodonetsk, a slower and more intense phase of the war begins, with the Russians advancing slowly but steadily. This is also reflected in the graph, which shows few spikes and many observations below average for the whole of June. In July there was more movement: the United States developed a plan of military and financial aid to Ukraine, and Turkey managed to broker a deal between Ukraine and Russia that would allow Ukraine to export grain, helping to avert famine in many countries (mainly in Africa). At the same time, the Russian advance continues, as shown by the negative spikes at the end of the month.

Country-Leader Analysis
In this part of the experiments, we measure public sentiment towards the countries (Ukraine and Russia) and the leaders (Zelenskyy and Putin). As previously stated, the popularity metric is the sentiment "polarity". The first and most obvious finding of the analysis presented in Figure 5 is that Zelenskyy, the president of Ukraine, receives a more positive sentiment than Putin, the president of Russia. Zelenskyy is consistently more popular than his Russian counterpart for the whole analysed period: the average polarity score for the Ukrainian president is 0.097, 2.6 times that of Putin, who scores a mere 0.037. Despite being less popular, the Russian president attracts more interest from the Reddit community than Zelenskyy: his name is mentioned 30663 times in the database, about 7.6 times as often as his Ukrainian counterpart, who is mentioned only 4055 times.
Another interesting point is that, despite being relatively volatile, the trends appear stable over the analysed period: neither leader shows an increase or a decrease in popularity. Zelenskyy's series is more volatile than Putin's, but this is likely attributable to its smaller sample size.
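The MA curves in Figure 5 smooth the daily polarity with a 7-day moving average, which makes the underlying trend easier to see through the volatility. A minimal sketch of a trailing 7-day average follows; whether the study used a trailing or a centred window is not stated, so the trailing choice is an assumption, and `moving_average` is a hypothetical helper.

```python
from typing import Optional


def moving_average(values: list[float], window: int = 7) -> list[Optional[float]]:
    """Trailing moving average; days before a full window is available get None."""
    averages: list[Optional[float]] = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value that left the window
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


print(moving_average([1, 2, 3, 4, 5, 6, 7, 8]))
# [None, None, None, None, None, None, 4.0, 5.0]
```

Keeping a running sum makes each step constant-time; a wider window would smooth the noisier series further at the cost of more lag.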
The small sample size also causes the large outliers in the Zelenskyy graph. For example, on 14 July 2022 the Ukrainian president shows a polarity score of -0.031, 0.128 below his average score. Only 49 submissions name Zelenskyy on that day. One of the first accuses the president of being a Nazi and of having violated human rights in Donbass. Many comments respond to these accusations by defending the president, for example: "this is such a massive false equivalence.periodically i bother responding to it.here is my copy-paste nobody ever wants to engage with.non-extensive list examples of ways in which i think it's possible to differentiate the two cases:* zelensky has never used chemical weapons to suppress a revolt against his rule by an ethnic minority, * the us did not execute civilians en mass in any captured town [...]" or: "this is a ludicrous comparison.whilst i don't agree with what the west did in iraq in early 2000's . . ..sadam hussein was committing genocide against the kurds, systematically slaughtering hundreds of thousands of people because of their race/religion.zelensky is not doing this, he is a democratically elected official and ukraine are a peaceful nation.so the idea that we (the west) are not allowed to comment on the russian invasion of ukraine because we've done something similar is lazy, ridiculous and without being rude to you, a tad stupid." Most of these comments argue that Zelenskyy and Ukraine did not commit the atrocities another user had alleged. However (as explained later in the limitations), many words with negative sentiment, such as "suppress", "execute", "genocide", "slaughtering", "lazy" and "stupid", are used, and their context is not interpreted. With a large sample, such context-dependent exceptions are averaged out; on this specific day, however, the sample is too small to counterbalance this single thread.
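The Zelenskyy outlier illustrates how a score that counts sentiment words without reading context can misfire. The deliberately naive scorer below is not the polarity function used in the study: the tiny lexicons, the `naive_polarity` name and the (positive minus negative) per-word normalisation are all hypothetical. It shows a sentence defending the president receiving a negative score, because the negation "never" is ignored.

```python
# Hypothetical mini-lexicons; the negative words are those quoted in the text above.
NEGATIVE = {"suppress", "execute", "genocide", "slaughtering", "lazy", "stupid"}
POSITIVE = {"peaceful", "democratically", "good"}


def naive_polarity(text: str) -> float:
    """(positive hits - negative hits) / word count, with no context or negation handling."""
    words = [w.strip('.,!?;:"()') for w in text.lower().split()]
    if not words:
        return 0.0
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return (positive - negative) / len(words)


defence = "zelensky has never used chemical weapons to suppress a revolt"
print(naive_polarity(defence))  # -0.1: "suppress" counts, "never" is ignored
```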
Another interesting insight is that there is practically no correlation between the popularity of Zelenskyy and that of Putin: the Pearson correlation coefficient is -0.03. One could have hypothesised a negative correlation between the two, connected to the tides of the war; for example, if Russia were making gains, Putin's popularity could increase while Zelenskyy's decreased. This hypothesis is not supported by the data for the analysed period. One explanation is that Putin's popularity would not increase with a successful war, since he is mostly seen as the enemy.
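The Pearson correlation quoted above measures how linearly two daily series move together, on a scale from -1 to 1. A minimal standard-library sketch follows, assuming the two leaders' daily polarity scores have already been aligned on the same dates; `pearson` is a hypothetical helper (Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient).

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical aligned daily polarity scores for the two leaders.
zelenskyy = [0.10, 0.12, 0.05, 0.09]
putin = [0.04, 0.03, 0.04, 0.03]
print(round(pearson(zelenskyy, putin), 3))
```

A value near 0, as reported for the two leaders, means that a good day for one is not systematically a good or bad day for the other.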
Similarly to the Putin vs Zelenskyy analysis above, Figure 5 shows that Ukraine scores clearly better than Russia: the former consistently scores higher than the latter, with an average polarity of 0.077 compared to 0.044. As in the previous analysis, Russia is cited far more frequently than Ukraine: 137419 times against 89736. This is interesting since, despite five of the six analysed subreddits being named after Ukraine, the real focus is on Russia.
The two trends look very similar; in fact, their Pearson correlation is 0.55. This might be because the two countries are very often cited in the same submission, which then contributes an identical polarity score to both series. To address this issue, two new databases were created, containing respectively the submissions that mention "Ukraine" but not "Russia", and vice versa. In this process, the 33790 observations mentioning both countries were dropped from each database, removing more than one third of the original "Ukraine" database.
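The filtering step above can be sketched as follows, assuming a case-insensitive substring test on each submission's text (the study's exact matching rule is not given here, and `split_exclusive` is a hypothetical helper). Submissions mentioning both countries are counted and left out of both subsets.

```python
def split_exclusive(texts: list[str], a: str = "ukraine", b: str = "russia"):
    """Return (texts mentioning only a, texts mentioning only b, count mentioning both)."""
    only_a, only_b, both = [], [], 0
    for text in texts:
        low = text.lower()
        has_a, has_b = a in low, b in low
        if has_a and has_b:
            both += 1  # shared submissions are dropped from both subsets
        elif has_a:
            only_a.append(text)
        elif has_b:
            only_b.append(text)
    return only_a, only_b, both


only_u, only_r, shared = split_exclusive(
    ["Ukraine wins", "Russia retreats", "Russia and Ukraine talk", "Russian tanks", "Ukrainian grain"]
)
print(len(only_u), len(only_r), shared)
```

A plain substring test treats "Russian" as a mention of Russia but does not treat "Ukrainian" as a mention of Ukraine, since "ukrainian" does not contain "ukraine"; a real pipeline has to decide how to handle such derived forms. As a consistency check on the reported figures, removing the 33790 shared submissions gives 89736 - 33790 = 55946 and 137419 - 33790 = 103629, the counts reported for the new databases.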
The new numbers highlight an even stronger focus on Russia, which is now cited almost twice as often as Ukraine (103629 against 55946). The gap in polarity between the two countries also widens: Ukraine, with an average score of 0.09, scores more than double Russia, whose polarity decreases to 0.04. As expected, the Pearson correlation also decreases significantly, to 0.26, which nevertheless remains surprisingly high.

Stock Market Analysis
Four tickers, covering four different aspects connected to the war, were chosen: (1) the United Kingdom Oil and Gas stock price, (2) the Ruble-US Dollar exchange rate, (3) the oil price and (4) the gas price. In particular, the most influential one is the gas price, since gas has been used as leverage for a good part of the conflict. Many Western countries, including but not limited to Italy and Germany, provide weapons and support to Ukraine but used to rely heavily on Russian gas for their energy needs. Russia has manoeuvred gas prices and supply (for example by closing the Nord Stream 1 pipeline) to try to weaken support for Ukraine and to have the sanctions imposed on it lifted. Furthermore, through the increase in gas prices, Russia secured record earnings and export levels. As always in the stock market, prices reflect not only current demand and supply but also projected future demand and supply. For all these reasons, we found it interesting to explore whether a relationship exists between hope and fear towards the conflict and the price of gas.
The oil price was chosen for similar reasons. Oil is another fuel that can be used to produce electricity, and if natural gas were to become scarce, it would be one of the most likely substitutes for many uses. Furthermore, the share of the oil market controlled by Russia is not big enough to let it manipulate prices in the same way it does with gas. Considering that the energy crisis could influence how European public opinion perceives the conflict, it is also interesting to explore the relationship between oil prices and the proposed hope and fear scores.
One of the very first consequences of Western sanctions on Russia was the fall of the Ruble. There was much speculation about how this would affect the Russian economy and its ability to repay its debts. The matter became even more interesting when the Ruble started to climb back, even reaching higher values than in the pre-conflict period. Since Russia sells a significant part of its gas in Rubles, fluctuations in the Ruble's value are very important to the Russian economy and should not be underestimated. The perceived stability of the country, and hence the market's trust in its currency, could be put in jeopardy by losing this war. This is a good reason to extend the study to the exchange rate between the US Dollar and the Russian Ruble.
The United Kingdom has been one of Ukraine's most supportive countries since the beginning of the war. Unlike Italy and Germany, it is not part of the European Union, and it has rich reserves of natural gas and oil. United Kingdom Oil and Gas (UKOG) is one of the main stocks of the British energy market. It could prove insightful to understand whether there is a relationship between hope and fear towards the war in Ukraine and the stock price of a company that operates in a country involved in the conflict and is influenced by gas and oil prices, but has access to national reserves and is less dependent on Russia.
We ran a linear regression analysis between each of these market variables and the proposed hope/fear scores. Based on the p-values, we found no significant relationship between the hope/fear scores and the oil price, the Ruble-US Dollar exchange rate, or the UK Oil and Gas stock price.
Likewise, no significant relationship was found between the fear score and the gas price. For the hope score, however, a significant relationship with the gas price was found. To interpret it, a linear regression was run with the average daily hope score as the independent variable and the daily closing price as the dependent one. The regression has a p-value of 0.018, showing that the model is significant, although its R² is relatively low (0.1). Furthermore, the Pearson correlation between the two variables is -0.32. As expected, the correlation is negative: when hope goes up, gas prices tend to go down, and vice versa (see Figure 6-(Left)).
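The single-regressor model above can be sketched with closed-form least squares. This minimal version, assuming paired daily values of average hope (x) and gas closing price (y), returns the slope, intercept, R² and the slope's t statistic. Turning t into a p-value needs the t distribution with n - 2 degrees of freedom, which SciPy's `scipy.stats.linregress` provides directly, together with the slope, intercept and correlation. `simple_ols` is a hypothetical helper.

```python
from math import copysign, inf, sqrt


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float, float, float]:
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r2, t)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation of x and y
    r2 = r * r  # with a single regressor, R^2 equals r^2
    # t statistic for H0: slope = 0, with n - 2 degrees of freedom
    t = copysign(inf, r) if r2 >= 1 else r * sqrt((n - 2) / (1 - r2))
    return slope, intercept, r2, t


print(simple_ols([1, 2, 3, 4], [1, 3, 2, 4]))
```

Because R² equals r² in this one-regressor case, the reported correlation of -0.32 corresponds to an R² of about 0.10, in line with the reported value of 0.1.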
We also examined a model with all the market variables as regressors and the hope/fear score as the target. With a significance threshold of 0.05 for the p-value, only the gas and UKOG prices returned a significant relationship with the hope score, whilst the fear score has no significant relationship with any of the regressors. From the results presented in Figure 6-(Right), we conclude that there is a clear relationship between the hope score and the two-regressor model (Gas & UKOG), with an R² of 0.202 and, again, an inverse relationship. This means that public hope about the outcome of the conflict is not the primary driver of gas and UKOG prices, but that there is indeed a relationship to be explored.
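The two-regressor model (gas and UKOG prices as regressors, hope as the target) can likewise be fitted by least squares. The sketch below solves the centred normal equations in closed form and reports R²; it assumes the three daily series are aligned and that the two regressors are not perfectly collinear. `two_regressor_ols` is a hypothetical helper; in practice a statistics package would also report the coefficient p-values used above.

```python
def two_regressor_ols(x1: list[float], x2: list[float],
                      y: list[float]) -> tuple[float, float, float, float]:
    """Fit y = b0 + b1*x1 + b2*x2 by least squares; returns (b0, b1, b2, r2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("regressors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    ss_res = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    ss_tot = sum(c * c for c in dy)
    return b0, b1, b2, 1 - ss_res / ss_tot


# Hypothetical data generated exactly by y = 1 + 2*x1 - 3*x2.
print(two_regressor_ols([0, 1, 2, 3, 4], [1, 0, 2, 1, 3], [-2, 3, -1, 4, 0]))
```

Negative coefficients on both price regressors would correspond to the inverse relationship described above.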

Topic Modelling
As described in the previous sections, we now investigate the Reddit data set in terms of topic modelling. To achieve this goal, we used the R programming language and several external R packages:
• NLP: provides the basic classes and methods for natural language processing and serves as a base for the following packages.
• openNLP: "an interface to the Apache OpenNLP tools (version 1.5.3). The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text written in Java. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution" (The Apache Software Foundation, 2009).
• quanteda: "framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and ngrams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analysing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more" (Benoit et al., 2018).
• dplyr: "is a grammar of data manipulation, providing a consistent set of verbs that help to solve the most common data manipulation challenges" (Wickham et al., 2022).
• tidytext: "provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages" (Silge and Robinson, 2016).
• qdap: "automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse. The package provides parsing tools for preparing transcript data, coding tools and analysis tools for richer understanding of the data" (Rinker, 2020).
• plotly and ggplot2: packages used for creating the graphics for the analysis.
• ggthemes: a package that enables better aesthetics for graphs.
• wordcloud: a package that allows the creation of word-cloud graphs.
• stm: "The Structural Topic Model (STM) allows researchers to estimate topic models with document-level covariates. The package also includes tools for model selection, visualisation, and estimation of topic-covariate regressions" (Roberts et al., 2019). Structural Topic Modelling (STM) is a topic model method. It is a semi-automatic approach that allows us to incorporate metadata, i.e. information about each document, into the topic model. STM aims at discovering topics, estimating their relationship to document metadata and gathering information on how the topics are correlated.

Estimating the optimal number of topics
After the corpus is created, the first step is to extract the diagnostics and estimate the optimal number of topics. In doing so, our aim is to maximise two important diagnostics, exclusivity and semantic coherence, whilst keeping the likelihood high and the residuals low enough. A little more importance is given to coherence: nine topics would ensure little mixing between topics, but the data would then be very hard to interpret, and it would be difficult to extract useful information from it.
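One way to make "a little more importance is given to coherence" concrete is a weighted score over candidate numbers of topics K. The sketch below is a hypothetical heuristic, not the procedure used in the study: it min-max normalises semantic coherence and exclusivity across the candidates (higher is better for both), weights coherence at 0.6, and ranks K by the combined score. The function name, the weight and the diagnostic values are assumptions; likelihood and residuals would still be inspected separately.

```python
def rank_topic_counts(diagnostics: dict[int, tuple[float, float]],
                      coherence_weight: float = 0.6) -> list[int]:
    """Rank candidate K by a weighted sum of min-max normalised coherence and exclusivity."""

    def min_max(values: list[float]) -> list[float]:
        low, high = min(values), max(values)
        return [(v - low) / (high - low) if high > low else 0.0 for v in values]

    ks = sorted(diagnostics)
    coherence = min_max([diagnostics[k][0] for k in ks])
    exclusivity = min_max([diagnostics[k][1] for k in ks])
    score = {k: coherence_weight * c + (1 - coherence_weight) * e
             for k, c, e in zip(ks, coherence, exclusivity)}
    return sorted(ks, key=lambda k: score[k], reverse=True)


# Hypothetical (semantic coherence, exclusivity) per K; coherence is typically negative.
candidates = {5: (-60.0, 9.0), 7: (-55.0, 9.3), 9: (-70.0, 9.6)}
print(rank_topic_counts(candidates)[0])  # 7
```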
Figure 7-(a) presents the diagnostic results for selecting the number of topics. Based on the likelihood, residual and coherence-exclusivity analyses, 7 and 8 topics appear to be the best choices. We settle on 7 topics as the optimal model since it compares favourably with 8 topics in terms of coherence. Thus, two of the nine candidate topics are discarded, and the following seven are retained for this analysis:
• Topic 1: Geopolitical arguments
• Topic 2: Russia and government
• Topic 3: Morality of war
• Topic 4: War atrocities
• Topic 5: Submissions in Russian
• Topic 6: Foreign submissions
• Topic 7: Weapons
Examining Figure 7-(b), the quality of each topic is investigated in the same way as before: ideally, coherence and exclusivity would both be maximised. Topic 5 greatly outperforms all the other topics, especially in coherence, because its observations are all in Russian, which makes them very different from the rest. Topics 1 and 3 also score very well in terms of coherence, whilst Topics 2 and 7 are the worst performing overall. Topic 6, on the other hand, distinguishes itself the most in terms of exclusivity, despite a relatively low semantic coherence. The distribution of the topics is analysed in Figure 7-(c). Topic 3 is the most prominent, describing around 20% of the database, while Topics 5 and 7 are the least popular, at around 10% each. Finally, the correlation analysis in Figure 7-(d) suggests that there is no correlation between any of the topics.
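Topic proportions such as those in Figure 7 can be summarised from the document-topic matrix of a fitted model (theta in STM terminology, one row per document). A minimal sketch, assuming theta is available as a list of rows that each sum to 1: the expected proportion of a topic is the mean of its column across documents. The helper name and the matrix below are hypothetical.

```python
def expected_topic_proportions(theta: list[list[float]]) -> list[float]:
    """Column means of the document-topic matrix (each row: one document's topic mix)."""
    n_docs, n_topics = len(theta), len(theta[0])
    return [sum(row[k] for row in theta) / n_docs for k in range(n_topics)]


theta = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
]
print([round(p, 2) for p in expected_topic_proportions(theta)])  # [0.3, 0.45, 0.25]
```

Because every row sums to 1, the expected proportions also sum to 1, so a topic at "around 20%" is one whose column mean is about 0.2.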

Results

Topic 1: Geopolitical arguments
Table 2 presents the results of linear regression modelling of each topic against the hope and fear scores. It can be seen that Topic 1 is positively correlated with both hope and fear. In addition, as shown in Figure 8, Topic 1 is mostly about geopolitical argumentation. The most used words are "Ukraine", "Russia" and "will", showing speculation about the conflict. Other popular words are "NATO", "china", "Germany", "support" and "sanctions", a sign that the broader picture is also depicted in the conversation. Furthermore, "weapons", "soldiers" and "nuclear" are also present, showing attention to the fighting.
The correlation with both hope and fear could be explained by the word "will": when future possibilities are explored, they might be positive events, increasing the hope score, or frightening ones, increasing the fear score.

Topic 2: Russia and government
Topic 2 is negatively correlated with both hope and fear. It seems to contain negative opinions about the Russians and governments. Many words refer to them, such as "Putin", "Russian", "Russians", "government", "left" and "right". Other popular words are "f***", "bad", "wrong", "f***ing", "old" and "stop". The interpretation of this topic is not very clear due to its low internal coherence.

Topic 3: Morality of war
Topic 3 is negatively correlated with both hope and fear. It seems to be about the moral consequences of the war. Randomly sampled submissions show that the community discusses (1) the morality of economic dealings with a party to the war, (2) the positive consequences of globalisation, and (3) the idea of postponing internal civic debates in Ukraine while forming a common front against the common foe.
Since these are moral considerations, they are not directly related to hope and fear; it is therefore natural that they score low in both.

Topic 4: War atrocities
Topic 4 is about war atrocities and their devastating effects. Unexpectedly, this topic is positively correlated with hope and negatively correlated with fear.

Topic 5: Submissions in Russian
Topic 5 is negatively correlated with hope, but positively with fear. It is composed of the submissions in Russian. It is negatively correlated with hope since there are no Russian words in the "hope" dictionary, and it is probably positively correlated with fear because the few English words it contains are present in the fear dictionary (similar to the case in the third example).

Topic 6: Foreign Submissions
Topic 6 is negatively correlated with hope but positively correlated with fear. Similarly to Topic 5, it is mainly composed of submissions in foreign languages. Most of them score 0, since their words are present in neither dictionary. Some words shared between these languages and English potentially created the positive correlation with fear.
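The behaviour of Topics 5 and 6 follows directly from how a dictionary-based score works: words absent from the English lexicons contribute nothing. The sketch below uses tiny hypothetical hope and fear word lists and an assumed per-word normalisation (the study's actual dictionaries and formula are not reproduced here); `hope_fear_scores` is a hypothetical helper.

```python
# Hypothetical mini-dictionaries (not the study's lexicons).
HOPE = {"win", "victory", "support", "hope"}
FEAR = {"nuclear", "attack", "threat"}


def hope_fear_scores(text: str) -> tuple[float, float]:
    """Share of words found in the hope and fear dictionaries, respectively."""
    words = [w.strip('.,!?;:"()') for w in text.lower().split()]
    if not words:
        return 0.0, 0.0
    hope = sum(w in HOPE for w in words) / len(words)
    fear = sum(w in FEAR for w in words) / len(words)
    return hope, fear


print(hope_fear_scores("Ukraine will win with support"))  # (0.4, 0.0)
print(hope_fear_scores("Украина победит"))  # (0.0, 0.0): no Russian words in either list
print(hope_fear_scores("это nuclear угроза"))  # only the English word "nuclear" scores
```

A foreign-language submission therefore sits at zero hope by construction, and any English word that slips in, such as a loanword or quoted term, can tilt it towards fear.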

Topic 7: Weapons
Topic 7 is positively correlated with hope, but negatively with fear. Topic 7 is about weapons, and many of its words reflect that: "tanks", "artillery", "weapon", "missiles", "gun", "range", "modern", "expensive", "drone". Others concern the military in a broader sense, like "logistic", "training" and "equipment". Finally, "good" is the most used word in the topic. This suggests that the superior Ukrainian equipment reassures the public and increases their hope.

DISCUSSIONS
In Ukraine, many geopolitical themes are unfolding, and many interests are in conflict. Considering how high the stakes are, it is imperative for a good politician to use every tool at their disposal to steer public opinion where it is most needed.
During the current winter in the northern hemisphere, the stakes are particularly high, since demand for gas for heating and electricity peaks, and the gas price is likely to be a strong weapon for Russia. Increased gas prices have both direct and indirect effects on prices for the public. Directly, heating prices would go up significantly. Indirectly, manufactured goods would become more expensive, since higher energy bills for producers would raise the prices of their products. In the analysed period, support for Ukraine and Zelenskyy is still strong. The real test will come during the winter, when the average Western citizen might be strongly affected by the consequences of the war, sometimes not even being able to afford heating and food. At that point, the public might ask for the war to end at any cost. This might end European support in weapons and logistics, which would create huge difficulties for Ukraine.
If politicians want to maintain support for the war effort, they might employ two strategies. The first is to try to increase people's hope in a Ukrainian victory. To do so, they should talk about the superiority of the weapons provided by the West, and how good and effective they are compared to Soviet-era Russian ones. This topic was in fact positively correlated with hope and negatively correlated with fear.
The second would be to try to instil fear of Russia. To do so, politicians and the news media could start using the geopolitical arguments that highlight the dangers posed by the country. This might be a double-edged strategy, however, since it could undermine the public's faith in victory and jeopardise overall morale. If this happened and people started to see Russia as an unstoppable danger, they could ask for a quick end to the conflict, since defeat would be seen as inevitable.
Periods of excessive stagnation would also be deleterious. They might decrease interest, which, coupled with possibly severe economic consequences, might frustrate the public: people would see their lives worsen in exchange for no visible progress.

CONCLUSIONS
The results of this study can be seen as the development of a way to measure hope by exploiting social media posts from the public all over the world, and as an insightful overview of public opinion on the Russo-Ukrainian conflict, focused predominantly on hope.
The first analysis concerns interest in the conflict. A steady decline in the number of submissions is observed, while the average number of upvotes per post neither increases nor decreases. This shows a relative loss of interest due to the stagnation of the news: the analysis takes place mostly during "phase two" of the war, characterised by a slow but steady Russian advance. The constant average number of upvotes, on the other hand, demonstrates that potential interest is still present. The public is still there; it just needs something new to engage with in order to participate more actively again.
The second analysis is about hope. Following the events of the war, hope strongly decreases after the symbolic and strategic losses of Azovstal (Mariupol) and Severodonetsk. After that, it settles into a slow decline, mirroring the tides of phase two of the conflict. Spikes in hope, both positive and negative, appear after important battles, but also after some non-military events, such as Eurovision and football games. This is an interesting insight, because it shows that morale is formed not only by the objective results of the war but also by emotional events.
The third analysis concerns fear. Its trend is stable throughout the analysis, meaning that the tides of the war itself did not influence it significantly. There is only a minor negative correlation with hope; interestingly, the two are not strongly inversely correlated, which means that hope and fear can coexist in public opinion in specific instances.
The fourth analysis examines the popularity of the two countries and their leaders using a polarity score. The most obvious consideration is that Zelenskyy and Ukraine consistently outperform Putin and Russia. Despite being relatively volatile, the trend seems to remain constant. A key takeaway is that a strong opinion has formed and, without serious upheavals, it will not change.
The fifth analysis explores the relationship between fear/hope and relevant financial items. A significant negative relationship between hope and the gas price was found: as hope increases, gas prices decrease, and vice versa. One reason could be the hope that a Ukrainian victory in the war would ease the gas flow from Russia to Europe again. Since this is a basic analysis based on a limited amount of information, more studies are needed to fully explore this relationship.
The sixth analysis is the topic modelling. The English-language submissions cover five different topics: geopolitical arguments, Russia and government, morality of war, war atrocities and weapons. These are the topics that caught the public eye the most in the analysed period. Geopolitical arguments are positively correlated with both hope and fear, while morality of war and Russia and government are negatively correlated with both. Discussions about weapons are positively related to hope and negatively to fear, and, surprisingly, the same applies to war atrocities.

• text: the actual content of the submission
• upvotes
• author
• date
• id: the unique submission id
• flair: categorisation of the post by the author
• type: post or comment
• parent id
• subreddit
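The fields listed above can be mirrored in a record type when loading the scraped data. This is a minimal sketch; the class name `Submission`, the field types and the convention that top-level posts have no parent id are assumptions, not the study's actual schema.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """One scraped Reddit post or comment, mirroring the collected fields."""
    text: str                 # actual content of the submission
    upvotes: int
    author: str
    date: datetime.date
    id: str                   # unique submission id
    flair: Optional[str]      # author's categorisation of the post, if any
    type: str                 # "post" or "comment"
    parent_id: Optional[str]  # assumed None for top-level posts
    subreddit: str

    def is_comment(self) -> bool:
        return self.type == "comment"


example = Submission("example text", 120, "some_user", datetime.date(2022, 5, 10),
                     "abc123", None, "comment", "t3_xyz", "ukraine")
print(example.is_comment())  # True
```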

Figure 2. Number of submissions and daily average number of upvotes over time

Figure 3. Running means for the proposed hope and fear scores

Figure 4. Deviation from the average hope.

Figure 5. (LEFT) Polarity score for the two leaders. (RIGHT) Polarity score for the two countries. MA graphs in each panel show the 7-day moving average of the original data.

Figure 6. (Left) Scatterplot of the gas price against the hope score, with the regression line in red. (Right) 3D scatterplot of the two-regressor model fit.

Figure 7. (a - Top Left) Model selection results with four distinct diagnostics; marker sizes relate to the residual diagnostic values. (b - Top Right) Exclusivity and coherence for the individual topics. (c - Bottom Left) Topic proportions in the dataset. (d - Bottom Right) Correlation between topics.

Table 1. Descriptive statistics for the whole analysis

Table 2. Topic Modelling Analysis Results