AUTHOR=Cheatham Susan, Kummervold Per E., Parisi Lorenza, Lanfranchi Barbara, Croci Ileana, Comunello Francesca, Rota Maria Cristina, Filia Antonietta, Tozzi Alberto Eugenio, Rizzo Caterina, Gesualdo Francesco
TITLE=Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model
JOURNAL=Frontiers in Public Health
VOLUME=10
YEAR=2022
URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2022.948880
DOI=10.3389/fpubh.2022.948880
ISSN=2296-2565
ABSTRACT=Social media are increasingly used to express opinions and attitudes towards vaccines. The vaccine stance of social media posts can be classified in near real time using machine learning. We describe the use of a Transformer-based machine learning model for analysing the vaccine stance of Italian tweets, and demonstrate the need to address changes over time in vaccine-related language through periodic model retraining. Vaccine-related tweets were collected through a platform developed for the European Joint Action on Vaccination. Two datasets were collected: the first between November 2019 and June 2020, the second from April to September 2021. The tweets were manually categorised by three independent annotators. After cleaning, the total dataset consisted of 1,736 tweets in three categories (promotional, neutral, discouraging). The manually classified tweets were used to train and test various machine learning models. The model that classified the data most similarly to humans was XLM-RoBERTa-large, a multilingual version of the Transformer-based model RoBERTa. The model hyper-parameters were tuned and the model was then run five times; the fine-tuned model with the best F-score on the validation dataset was selected. Fine-tuning and testing the model on the first dataset alone resulted in an accuracy of 72.8% (F-score 0.713). Applying this model to the second dataset resulted in a roughly 10-percentage-point drop in accuracy, to 62.1% (F-score 0.617), indicating that the model registered a difference in language between the datasets. On the combined datasets the accuracy was 70.1% (F-score 0.689). Retraining the model on data from both datasets increased the accuracy on the second dataset to 71.3% (F-score 0.713), an improvement of about 9 percentage points over training on the first dataset alone. The accuracy on the first dataset remained 72.8% (F-score 0.721). The accuracy on the combined datasets was then 72.4% (F-score 0.720), an improvement of about 2 percentage points. By fine-tuning a machine learning model on task-specific data, the accuracy achieved in categorising tweets approached that expected of a single human annotator. Regular retraining of machine learning models with recent data is advisable to maximise accuracy.
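
NOTE=The abstract above describes fine-tuning XLM-RoBERTa-large for three-class stance classification and selecting the run with the best validation F-score. The following is a minimal sketch of that kind of workflow, not the authors' code: it assumes the Hugging Face transformers/datasets stack, hypothetical CSV files (train.csv, val.csv) with "text" and integer "label" columns, illustrative hyper-parameter values rather than the tuned ones, and macro-averaged F1 (the abstract does not specify the averaging).

# Sketch: fine-tune xlm-roberta-large for 3-class vaccine-stance
# classification (promotional / neutral / discouraging), keeping the
# checkpoint with the best validation F-score, as the abstract describes.
# File names, hyper-parameters, and metric averaging are assumptions.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-large"
NUM_LABELS = 3  # promotional, neutral, discouraging

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS)

# Hypothetical CSVs with columns "text" (tweet) and "label" (0/1/2).
data = load_dataset("csv", data_files={"train": "train.csv",
                                       "validation": "val.csv"})

def tokenize(batch):
    # Tweets are short; 128 subword tokens is a generous cap (assumption).
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

def metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="macro")}

args = TrainingArguments(
    output_dir="stance-model",
    per_device_train_batch_size=8,
    learning_rate=1e-5,           # illustrative values, not the tuned ones
    num_train_epochs=4,
    eval_strategy="epoch",        # "evaluation_strategy" in transformers <4.41
    save_strategy="epoch",
    load_best_model_at_end=True,  # keep checkpoint with best validation metric
    metric_for_best_model="f1",   # abstract: best F-score on validation data
    seed=42,                      # the authors ran five times; one seed here
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["validation"],
                  data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
                  compute_metrics=metrics)
trainer.train()

The retraining step the abstract reports (improving second-dataset accuracy from 62.1% to 71.3%) would, under these assumptions, amount to rerunning the same fine-tuning with both collection periods concatenated into train.csv; selecting the best of several runs means repeating trainer.train() with different seed values and keeping the run with the highest validation F-score.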