On the use of sentiment analysis for linguistics research. Observations on sentiment polarity and the use of the progressive in Italian

This article offers a conceptual and methodological contribution to linguistics by exploring the potential value of using sentiment analysis (SA) for research in this field. Firstly, it discusses the limitations and advantages of using SA for linguistics research, including the wider epistemological implications of its application outside of its original conception as a product review analysis tool. Methodologically, it tests its applicability against an established linguistic case: the correlation between subjective attitudes such as surprise, irritation and discontent and the use of the progressive. The language example is Italian, for which this function of the progressive form has not been analyzed yet. The analysis applies FEEL-IT, a state-of-the-art transformer-based machine learning model for emotion and sentiment classification in Italian, to language samples from various sources as collected in Evalita-2014 (238,556 words). The results show statistically significant correlations between negative subjective attitudes and the use of the progressive, in line with previous accounts in other languages. The article concludes with a few additional propositions for practitioners and researchers using SA.


Introduction
This article offers a conceptual and methodological contribution to linguistics by exploring the potential value of using sentiment analysis (SA) for research in this field. SA is a computational technique that allows the analyst to annotate language material for attitudes, either sentiment or opinions (Liu, 2020). It was first developed within natural language processing (NLP) studies as a sub-field of Information Retrieval (IR). Its original application was devised for inferring general opinions from online product reviews, a need that emerged from the explosion of e-commerce and social media in the early 2000s.
As online product reviews started to overwhelm commercial companies with marketing material, previous methods such as surveys and focus groups quickly became obsolete; using SA was not only much faster but also far cheaper. Since then, SA has been increasingly used in other domains of society, such as the health sector and government agencies, and for other tasks, including making stock market predictions and analyzing citizens' opinions or concerns (Liu, 2020). The aim of this paper is to investigate the limitations and advantages of the SA method for linguistics research, its applicability to tasks outside of its original conception, and the extent to which it may be used in linguistics, where it has mostly been adopted as a research method to enhance discourse analysis studies (Maite, 2016).
As a first distinctive feature, this study focuses on functional grammar (Dik, 1978, 1989), that is, the study of grammar within its social, interactional, and cultural context (the so-called "functional paradigm"). Specifically, it tests the use of SA against an established case: the correlation between subjective attitudes such as surprise, irritation and discontent and the use of the progressive. Several accounts in various languages (e.g., English, German, and Dutch; Traugott and Dasher, 2001; Killie, 2004; Levin, 2013; Pfaff et al., 2013; Anthonissen et al., 2016, 2019) have indeed observed that non-aspectual pragmatic or subjective meanings, for example to signal politeness and/or discontent, may mark the use of progressive constructions. In this way, this article directly answers recent calls in linguistics for new analytical methods that could expand the field by devising novel and dynamic ways of exploring established topics (Rose and McKinley, 2020). However, with the exception of limited work in French (e.g., De Wit et al., 2020), studies have largely focused on Germanic languages and predominantly on English. As a second distinctive feature, the use case here is the Italian language. In this way, the study provides fresh insights into this line of enquiry by adding novel findings from a non-Germanic language that could strengthen previous observations.
As a third contribution, the article provides a conceptual analysis of SA as a research method used in domains outside of its original application and to answer research questions different from those of its original conception. It will be argued here that if, on the one hand, major technological advances, the explosion of data availability, and the increasingly mobile and multilingual world have called for new research designs, data collection techniques, and tools for analysis (e.g., Rose and McKinley, 2020), on the other, such novel resources and methods require thorough understanding of the assumptions behind them, constant updating, and critical supervision. For example, SA has been used in a variety of fields such as psychology (Salas-Zárate et al., 2017; Liu, 2022; Zhong and Ren, 2022), political science (Haselmayer and Jenny, 2017; Ansari et al., 2020; Matalon et al., 2021), social science (Karamibekr and Ghorbani, 2012; Bhat et al., 2020; Nguyen et al., 2020), digital humanities (Moreno-Ortiz, 2017; Moreno-Ortiz et al., 2020; Schmidt et al., 2021; Viola, 2022), and media studies (Burscher et al., 2016; Thelwall, 2016; Amarasekara and Grant, 2019) to analyze opinions about social and political issues, particularly on social media, that is, for tasks other than product review analysis. Hence, whilst recognizing the potential benefits of quantitative methods such as SA, the discussion of the limitations and advantages of using SA for linguistics research will contribute reflections on the wider epistemological implications of its application outside of product review analysis.
For a recent overview of sentiment analysis please see Lee and Lau ( ).

Academic discussion
The explosion of digital material over the last two decades and the subsequent need to analyze and interpret it, paired with advances in technology and statistical theory, have greatly impacted the way information is retrieved today (Viola, 2023). Disciplines across scientific areas have increasingly incorporated technology within their traditional workflows and developed sophisticated data-driven approaches to analyze ever larger and more complex datasets. As a result, computational methods such as SA are used more and more in domains outside of NLP and for tasks that are very different from their initial application (Drucker, 2020; Viola, 2022).
SA for example was originally conceived as a tool to maximize profits; through large-scale analyses of online product reviews, the goal behind the technique was to optimize marketing strategies.
Because SA is first and foremost an economic instrument, several authors have pointed out that the application of this technique in domains different from its original conception and for tasks other than product review analysis poses several challenges, both methodological and epistemological (Pang and Lee, 2008; González-Bailón and Paltoglou, 2015; Puschmann and Powell, 2018; Viola, 2022, 2023). The main criticism is that whereas SA will perform satisfactorily when rating opinions about products and services, opinions about social and political issues will likely be misclassified (Pang and Lee, 2008; Lee and Lau, 2020). This is because SA's algorithms lack sufficient background knowledge of the local social and political contexts, not to mention the much higher linguistic and cultural complexity of these types of texts compared to product reviews (e.g., sarcasm, puns, plays on words, and irony; Liu, 2020). These scholars have therefore argued that this limitation makes the use of SA for empirical social research rather controversial, particularly when the method is borrowed uncritically by other disciplines or when the technique is embedded in a range of algorithmic decision-making systems (Karppi and Crawford, 2016; Puschmann and Powell, 2018; Viola, 2022, 2023).
Despite voices of skepticism around the technique, SA has shown no signs of decline over the years. On the contrary, computer science efforts have been increasingly devoted toward improving and refining the method, for instance by moving from dictionary-based, mostly context-agnostic approaches to transformer-based machine learning models which go beyond single word predictions (see for instance Hassan and Mahmood, 2017; Shen et al., 2017; Meena et al., 2021, 2023; Ahmed et al., 2022; Li et al., 2022). Such continuing advances have certainly contributed to its growing adoption for empirical social research. More recently, for example, the application of SA to social media content has been used to analyze trends in public opinions and reactions to global concerns such as anxiety and stress in relation to diseases and health crises (e.g., Praveen et al., 2021; Jahanbin et al., 2022; Ogbuokiri et al., 2022; Meena et al., 2023).
This article discusses the prominence gained by SA within the wider context of two mutually reinforcing factors. The first is the larger incorporation of technology in all sectors of society, naturally including the creation of academic knowledge; the second is the common misconception that computational outputs are objective and reliable, which adds to the allure of the technique. It is argued here that such idealization of SA likely forms epistemological expectations that the method will inevitably disappoint. In their analysis of the perception of SA in public discourse, Puschmann and Powell (2018) for example highlight that the comforting illusion of objectivity and precision in relation to SA creates an expectation of validity and accuracy that is misaligned with the technique's original function (2), that is, to provide only an approximation of human judgement (10).
Naturally, this misconception does not solely concern SA but all computational methods more widely, as it pertains to the notion of discrete vs. continuous modeling of information (Calude and Longo, 2017; Longo, 2019; Viola, 2023). In discrete systems, information is rendered as exact and separate points (sequences of 0s and 1s) and something belongs to either one category or another. For example, a SA task is usually modeled as a classification problem, that is, a classifier processes pre-defined elements in a text (e.g., sentences) and returns a pre-set category (e.g., positive, negative, or neutral). The resulting discrete output produces an illusion of accuracy; it is argued here however that it is the very process of discretizing emotions that poses several challenges, including the assumption that it is possible not only to disambiguate subjectivity, but also to quantify attitudes and even attribute scores to them.
Over the years computer scientists have attempted to develop more sophisticated SA methods aimed at alleviating issues such as the reduction of the subjective perception of emotions to two/three (unproblematized) categories and at identifying the multiple elements that different types of sentiment may refer to in the same text. For example, through so-called "fine-grained classifiers," the algorithm extracts information to discriminate aspects and opinion targets (Schouten and Frasincar, 2015; Pontiki et al., 2016) and provides a slightly less rigid classification output (e.g., very positive, positive, neutral, negative, and very negative), thus adding a more nuanced distinction of the identified sentiment (e.g., Munikar et al., 2019). Other classifiers return a prediction of the corresponding emotion (e.g., anger, happiness, and sadness; e.g., Kawade and Oza, 2017); others yet try to quantify how much emotional content is present within the document (i.e., sentiment magnitude; e.g., Jini and Prabu, 2019). To overcome the issue of being context agnostic, attempts have also been made to incorporate external linguistic content such as historical data into the model (Xiao et al., 2022).
From a linguistic and epistemological point of view, however, it is the intrinsic conceptualization of emotions as quantifiable, discrete, fixed, and objective entities that raises doubts about the legitimacy of the technique, particularly for empirical social research. Viola (2023, p. 72) argues:
[. . .] we are told that SA is a quantitative method that provides us with a picture of opinionated trends in large amounts of material otherwise impossible to map. In reality, the reduction of something as idiosyncratic as the definition of human emotions to two/three categories is highly problematic as it hides the whole set of assumptions behind the very establishment of such categories. For example, it remains unclear what is meant by neutral, positive, or negative as these labels are typically presented as a given, as if these were unambiguous categories universally accepted (Puschmann and Powell, 2018, quoted in the original).
Indeed, the classification of linguistic categories is a well-known linguistic problem. Langacker (1983) stated that the subjective, processual, and context-bound nature of language in use prevents linguistic categories from being unequivocally defined (see also Talmy, 2000; Croft and Cruse, 2004; Dancygier and Sweetser, 2012; Gärdenfors, 2014; Paradis, 2015). In challenging tasks such as manually annotating language material, this issue becomes particularly apparent. Indeed, when several human annotators are asked to annotate language material, there is always an expectation of disagreement on some annotation decisions. This is captured by "inter-annotator agreement," a measure that calculates the degree of agreement between the annotators' decisions about a label and is meant to function as a warning to the analyst before drawing any linguistic conclusion based solely on manually annotated language material. The inter-annotator agreement measure may vary greatly as it factors several parameters into the calculation (e.g., number of annotators, number of categories, and type of text) but in general, it is never expected to be 100%. Especially when the annotation concerns highly subjective linguistic elements whose interpretation is inseparable from the annotators' culture, personal experiences, values, and beliefs, such as the perception of sentiment, this percentage has been found to remain at 60-65% at best (Bobicev and Sokolova, 2017, 2018).
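To make the notion of inter-annotator agreement concrete, the following minimal Python sketch computes raw percentage agreement and Cohen's kappa for two annotators. The choice of Cohen's kappa is an assumption made here for illustration (the studies cited above may rely on other coefficients), and the function names and the toy labels are hypothetical.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Share of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators corrected for chance."""
    n = len(labels_a)
    p_observed = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    p_chance = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical toy annotations of five sentences (not data from this study).
annotator_1 = ["neg", "neg", "pos", "neu", "pos"]
annotator_2 = ["neg", "pos", "pos", "neu", "neg"]
print(round(observed_agreement(annotator_1, annotator_2), 3))
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Raw agreement is the kind of percentage quoted above, whereas kappa discounts the agreement that would be expected by chance given each annotator's label distribution, which is why it is typically lower.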
Consequently, it should further be noted that any conclusion based on SA output will inevitably include an additional degree of inconsistency between the way the categories of positive, negative, and neutral emotion have been assigned in the model and the analyzed material to which the model is applied. For this reason, applying a SA model across different textual genres is not advisable as the sentiment will likely be misclassified.
The academic discussion has highlighted the complexities and ambiguities of SA as a method for empirical social research. Particularly when SA is used for tasks other than its original design, this article argues that the positivist hype brought about by the digital transformation of society and the increasing incorporation of computational methods into academic knowledge production should not obfuscate the promise of SA, that is, to function as an approximation of, and not a substitute for, human judgement. It is within this academic discussion that the article tests the potential value of using SA in linguistics; in doing so, it attempts to answer the following research questions: (1) to what extent is SA a suitable method for linguistics research? To answer this question, the article applies SA to an existing case in functional grammar, the subjectification of the progressive form. This leads to the second research question: (2) can SA be used to find correlations between subjective attitudes and the use of the progressive in Italian?

Subjectification and the progressive form
The study of the progressive construction has fascinated linguists for a long time; this is probably due to the various functions it performs as well as its continually changing nature. As it escapes a single-level taxonomy, over the years several mappings have been suggested across languages, mostly based on the progressive's internal characteristics of aspectuality, imperfectivity, and incompleteness. More recently, however, the progressive has started to be explored through the lens of pragmatics and discourse; these observations suggest that this chameleonic form can also function as a marker of non-aspectual pragmatic or subjective meanings, for example to signal surprise, politeness, irritation, and discontent (Traugott and Dasher, 2001; Killie, 2004; Levin, 2013; Pfaff et al., 2013; Anthonissen et al., 2016, 2019; Freund, 2016; Martínez-Vázquez, 2018). In other words, alongside other linguistic structures such as epistemic modality (e.g., He could have done better; Traugott, 1989, 1995) and English sentence adverbs (e.g., Clearly, you know what you're doing; Swan and Breivik, 1997, 2011), progressive constructions may also be examples of subjectification, defined as the "semantic-pragmatic process whereby 'meanings become increasingly based in the speaker's subjective belief state/attitude toward the proposition,' in other words, toward what the speaker is talking about" (Traugott, 1995, p. 31).
Authors have advanced various hypotheses that may explain this phenomenon. Scheffer (1975) for example suggests that in English, the progressive may be used subjectively by the speaker with reference to the temporal aspect of the verb, for example to emphasize the excessive duration of an event and therefore to manifest irritation. In addition to this aspect of temporality, Bland (1988) and Smitterberg (2005) also argue that the English progressive may equally signal subjectivity when the intention is to emphasize the intensity of an emotion or a situation (e.g., I'm loving this). Accounts in French show similar results. In their analysis of modality and aspect in the progressive, De Wit and Patard (2013) for example found that the progressive can be used to signal irritation and surprise but also to hedge in reference to an event. More recently, in a comparative study on language samples in English, Dutch and French, De Wit et al. (2020) argued that present progressive constructions are particularly liable to be used for signaling that something about the ongoing situation is unconventional, what they call "extravagant language." Although findings on the use of the progressive to mark subjective meanings have been validated independently by several studies on different datasets, the majority of these observations come from Germanic languages and, besides the few already mentioned exceptions in French, this function has not yet been explored in other Romance languages. By investigating the subjective dimension of the progressive in a typologically different language like Italian, this study aims to contribute novel findings to this area of investigation, expanding the field and contributing new details to existing theories and methods.

Methodology and dataset
This article investigates the applicability of SA as an experimental method for linguistics research. As a methodological and conceptual contribution, the article aims to explore new analytical paradigms and tools that may help linguists to meet the challenge of analyzing ever larger and more complex language material. For the analysis, the study uses FEEL-IT (Bianchi et al., 2021), a state-of-the-art transformer-based machine learning model for emotion and sentiment classification in Italian.
Another challenge of using computational methods for linguistic research is related to the predominance of models, datasets, and tools devised and developed for the English language. Even when models and resources in other languages exist, they are often proprietary and expensive (e.g., Google Cloud Platform Console) as well as not transparent, often offering only opaque documentation. Because not all languages are equally resourced digitally and computationally, researchers, teachers and curators are forced to compromise on which tasks can be performed, with which tools and through which platforms (Viola, 2023). Such Anglophone-centricity is therefore often still a barrier for researchers working with languages other than English (Viola and Fiscarelli, 2021; Viola, 2023; Viola et al., 2023). Resources like FEEL-IT are important tools to counterbalance the predominance of English in computer science.
The dataset for the analysis is the Italian language data used in the EVALITA Parsing Task project. EVALITA is a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The aim of the EVALITA project is to promote the development of language and speech technologies for the Italian language and to offer the scientific community a shared framework where different systems and approaches can be evaluated consistently. The corpus used here consists of the training and test data used for the 2014 EVALITA task. It contains language samples from various sources (e.g., legal texts and news articles). Specifically, the analysis uses the language data provided by ISDT (Italian Stanford Dependency Treebank) released for the dependency parsing shared task of Evalita-2014 (Bosco et al., 2014).
After launching the SA model on the dataset, random excerpts are analyzed qualitatively to assess whether the assigned prediction can be considered reliable.
The analysis proceeds at two levels, quantitative and qualitative. First, the quantitative analysis aims to find statistically significant correlations between sentiment polarity and the use of the progressive; second, the qualitative analysis aims to assess to what extent the sentiment and emotion classification can be considered reliable. In the first stage, the language material is tokenized into sentences and filtered for verbs in the progressive form. This allows us to work on a subset containing only the relevant verbal constructions, thus making the SA processing faster and more efficient. The SA output is then tested for significance. In the second stage, random excerpts are qualitatively analyzed against the classified emotion and polarity.
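The first stage described above (sentence tokenization followed by filtering for progressive forms) can be approximated with a minimal Python sketch. The sketch makes several assumptions that are not taken from the study: it splits sentences naively on final punctuation, and it detects only the periphrasis stare + gerund (present and imperfect forms of stare followed by a word ending in -ando or -endo) rather than using the part-of-speech annotation available in a treebank. All names are hypothetical, and the pattern will produce false positives (e.g., "state quando").

```python
import re

# Present and imperfect indicative forms of "stare" (assumed to be sufficient
# for illustration; other tenses and moods are deliberately left out).
STARE_FORMS = (
    "sto|stai|sta|stiamo|state|stanno|"
    "stavo|stavi|stava|stavamo|stavate|stavano"
)

# A form of "stare" followed by a word ending in -ando or -endo (the gerund).
PROGRESSIVE = re.compile(
    rf"\b(?:{STARE_FORMS})\s+(\w+(?:ando|endo))\b", re.IGNORECASE
)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def progressive_subset(text):
    """Return (sentence, gerund) pairs for sentences with stare + gerund."""
    subset = []
    for sentence in split_sentences(text):
        match = PROGRESSIVE.search(sentence)
        if match:
            subset.append((sentence, match.group(1).lower()))
    return subset


# Invented sample text, not taken from the corpus.
sample = "I costi stanno esplodendo. Quando arriva il treno? Stava piovendo."
for sentence, gerund in progressive_subset(sample):
    print(gerund, "|", sentence)
```

The resulting subset would then be passed to the sentiment and emotion classifier; a treebank-based filter on verb tags would be more reliable than the suffix heuristic used here.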

FIGURE 1. Distribution of sentiment in the subset corpus.

Analysis and results
The relevant subset contains 628 sentences containing 377 verbs in the gerund/progressive form, of which 382 are from the first conjugation (verbs ending in -are) and 248 from the second conjugation (verbs ending in -ere; none from the third). The subset was then queried for sentiment using the FEEL-IT Python library. The results showed that the majority of the sentences in the subset (469 vs. 159) were classified as negative (Figure 1); Figure 2 reports the distribution of the identified corresponding emotion.
Following the quantitative analysis, the results were tested for significance; Table 2 reports the results of the chi-square test (also displayed in Figure 3).
The Chi-square results align with previous findings in other languages, showing that statistically significant correlations between negative sentiment and the use of gerund/progressive constructions can be observed. However, even though the correlation is significant, Cramer's phi indicates that the strength of the relationship is not high. This finding could be explained by several factors, including the diversity of the sources in the dataset, the different genres, and the fact that SA is applied to short sentences and therefore that very limited context is provided. Future investigations could for example perform logistic regression to identify other potential factors that are predictive of sentiment (e.g., type of sources, type of verb, and topics).
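For readers unfamiliar with the two statistics mentioned above, the following Python sketch shows how a chi-square test of independence and the phi coefficient can be computed for a 2 x 2 contingency table (verb form by sentiment polarity). It is an illustration only: the counts are hypothetical placeholders and not the figures behind Table 2, no continuity correction is applied, and the function name is invented. The p-value uses the closed form for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Returns (chi2, p_value, phi), where phi = sqrt(chi2 / n) measures the
    strength of the association on a 0-1 scale.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


# HYPOTHETICAL counts: rows = progressive / non-progressive sentences,
# columns = negative / positive polarity.
example_table = [[10, 20], [20, 10]]
chi2, p_value, phi = chi_square_2x2(example_table)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, phi={phi:.3f}")
```

The sketch also illustrates why a result can be significant while the association remains weak: with a large sample, even a small phi can yield a p-value below the conventional threshold.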

Qualitative analysis
This section presents the qualitative analysis of random excerpts; the aim is to assess to what extent the sentiment and emotion classification can be considered reliable (bold added).
(1) The truth is that now health costs are exploding for reasons that have nothing to do with service improvements and that not even politicians can explain.
The sentence in (1) was classified as negative and the corresponding emotion was classified as anger. Although the identified emotion may be correct to an extent, the attributed classification (i.e., negative) may not be aligned with the writer's original intentions. Indeed, one often unclear element of SA is whether the algorithm detects the attitude of the writer or the polarity expressed in the analyzed textual fragment (Puschmann and Powell, 2018; Viola, 2023). This is a known limitation of classification systems such as SA that are based on rigid categories (e.g., positive/negative, anger = negative). Taken individually, the issue may be negligible; in the aggregate, however, misclassifications like this one are amplified by the scale of the analyzed material, which can run to the order of millions of sentences. It is true that, thanks to the additional provision of the corresponding emotion, more nuanced systems like FEEL-IT partially overcome this limitation; analysts and researchers should however carefully assess to what extent they should rely on the SA's output and always consider it as an approximation of human judgement.
The sentence in (2) was classified as negative and the corresponding emotion was classified as anger. At the time when this material was collected (2011-2014), President Sali Berisha was Prime Minister of Albania. His government was the center of long controversies and protests by the socialist party, which accused him of abuse of power, corruption and human rights violations. Once again, the emotion seems to have been correctly classified, but because anger has been collapsed into negative sentiment polarity in the model, the sentence is classified as negative regardless of the context. Again, analysts using SA, particularly for empirical social science research, should be aware that there is always a degree of inconsistency between the way the categories of positive/negative have been defined in the training model and the writer's intention in the actual material to which the model is applied.

Discussion
The results showed statistically significant correlations between negative subjective attitudes such as anger and sadness and the use of the progressive, in line with previous accounts in English, German, French, and Dutch. A close inspection of random excerpts highlighted some of the uncertainties of using this technique in fields other than IR and for tasks other than product review analysis, such as for empirical social research. For example, a potential degree of discrepancy was found between the writer's intention, the way the classification was carried out for training the SA model, and the text to which the model was applied. The issue is related to the reduction of opinions and sentiment to two/three categories, which is conceptually problematic when applied to social data. At the same time, the provision of a more nuanced classification system, such as the prediction of the corresponding emotion, may help to alleviate this issue. It is therefore recommended that analysts prefer finer-grained classifiers over those providing solely a basic identification of sentiment, as the latter may introduce errors and biases in the final output.
It is argued here that these uncertainties add further complexity and ambiguity to the already existing limitations of the technique and that such complexities and limitations should be assessed carefully by researchers and practitioners before and during the analysis.

Future studies
Whereas the correlation between negative sentiment and the progressive was found to be statistically significant, the strength of the relationship was not high. Future research could perform regression analysis to explore other predictors of sentiment polarity in connection with the progressive. Moreover, this line of enquiry would benefit from further experiments using manually annotated Italian data for sentiment analysis and emotion identification/recognition tasks. Finally, larger datasets could be used in future research, including in other languages where this relationship has already been attested (i.e., English, German, Dutch, and French). This could either further validate the suitability of SA for this line of research or discard it.

Conclusion
This article provided a conceptual and methodological contribution to linguistics by investigating the applicability of SA for research in this field with a focus on functional grammar. The aim was to add to the scholarly conversation by testing the potential value of this computational method to navigate the complexities of large datasets of language material. In doing so, the study answered recent calls in the linguistics field for incorporating more advanced techniques than traditional Corpus Linguistics approaches. The analysis tested the applicability of SA against an established linguistic case, the subjective use of the progressive form, and took Italian as a case study. The language choice was motivated by the fact that while the subjective function of the progressive has been observed in Germanic languages and in French, there are no studies empirically investigating this function in Italian. Thus, the results of this study contributed fresh findings to this body of work.
The article also discussed the limitations and advantages of using SA for linguistics research and for empirical social research more widely and examined specifically the larger epistemological implications of applying this method to tasks outside of its original conception. As language repositories become ever larger and social media texts become more and more the preferred lens through which social scientists and linguists conduct their investigations, traditional quantitative approaches may not fully capture the complexities of digital communication material such as those brought about by the increasing volume of available data. Traditional methods alone, including for example word collocation analyses, may no longer be sufficient for identifying patterns and discontinuities that are not immediately evident. More sophisticated methods such as SA can therefore be of great assistance to linguists who are now increasingly confronted with the challenge of analyzing the complexities of the textual material that is produced and fairly easily available. However, although they undoubtedly provide powerful means to navigate large quantities of language material, these methods should not be adopted uncritically. Researchers and practitioners using SA should therefore use this technique as an exploratory method, particularly when it is applied to empirical social research, and always resize their epistemological expectations accordingly. At the very least, an acknowledgment of such complexities should be present when using this technique.

FIGURE 2. Distribution of the identified emotion in the subset corpus.

FIGURE 3. Visualization of Chi-square results. The red section represents the p-value.
TABLE 1. Results of emotion recognition models trained on FEEL-IT.
The model classifies text into four emotions (i.e., anger, joy, sadness, fear). The sentiment polarity is obtained by collapsing the four emotions into the two categories (i.e., joy → positive; fear, anger, sadness → negative). Table 1 displays the results of emotion recognition models trained on FEEL-IT and tested on two other datasets: MultiEmotions-It (ME; Sprugnoli, 2021) and a dataset of 662 tweets about COVID-19 (C-19) against the Most Frequent Class (MFC) for baseline results. As the table shows, the model is stable and accuracy is acceptable.
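The collapsing of emotions into polarity described above amounts to a fixed lookup, which the following minimal Python sketch illustrates. The mapping itself follows the text (joy → positive; fear, anger, sadness → negative); the variable and function names are hypothetical, and the sketch does not reproduce the internals of the FEEL-IT library.

```python
from collections import Counter

# Emotion-to-polarity mapping as stated in the text.
EMOTION_TO_POLARITY = {
    "joy": "positive",
    "fear": "negative",
    "anger": "negative",
    "sadness": "negative",
}


def collapse_to_polarity(emotions):
    """Map a list of emotion labels to their sentiment polarity."""
    return [EMOTION_TO_POLARITY[emotion] for emotion in emotions]


# Hypothetical emotion predictions for four sentences.
predicted = ["anger", "joy", "sadness", "fear"]
polarities = collapse_to_polarity(predicted)
print(Counter(polarities))
```

Because three of the four emotions map to the negative class, any sentence labeled anger, fear, or sadness is negative by construction, irrespective of whose attitude the emotion expresses; this is the rigidity discussed in the qualitative analysis.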

TABLE 2. Chi-square results for the correlation between gerund/progressive forms and sentiment polarity.
Translations by the author.
emotion, more nuanced systems like FEEL-IT partially overcome this limitation; analysts and researchers should however carefully assess to what extent they should rely on the SA's output and always consider it as an approximation of human judgement.