AUTHOR=Shafiya Soofi , Wani Mudasir Ahmad , Jabin Suraiya , ELAffendi Mohammad , Jahiruddin TITLE=CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 8 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1623090 DOI=10.3389/frai.2025.1623090 ISSN=2624-8212 ABSTRACT=IntroductionThe unprecedented COVID-19 pandemic exposed critical weaknesses in global health management, particularly in resource allocation and demand forecasting. This study aims to enhance pandemic preparedness by leveraging real-time social media analysis to detect and monitor resource needs.MethodsUsing SnScrape, over 27.5 million tweets for the duration of November 2019 to March 2023 were collected using COVID-19-related hashtags. Tweets from April 2021, a peak pandemic period, were selected to create the CoViNAR dataset. BERTopic enabled context-aware filtering, resulting in a novel dataset of 14,000 annotated tweets categorized as “Need”, “Availability”, and “Not-relevant”. The CoViNAR dataset was used to train various machine learning classifiers, with experiments conducted using three context-aware word embedding techniques.ResultsThe best classifier, trained with DistilBERT embeddings, achieved an accuracy of 96.42%, 96.44% precision, 96.42% recall, and an F1-score of 96.43% on the Test dataset. Temporal analysis of classified tweets from the US, UK, and India between November 2019 and March 2023 revealed a strong correlation between “Need/Availability” tweet counts and COVID-19 case surges.DiscussionThe results demonstrate the effectiveness of the proposed approach in capturing real-time indicators of resource shortages and availability. The strong correlation with case surges underscores its potential as a proactive tool for public health authorities, enabling improved resource allocation and early crisis intervention during pandemics.