ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Natural Language Processing
Volume 8 - 2025 | doi: 10.3389/frai.2025.1623090
CoViNAR: A Context-Aware Social Media Dataset for Pandemic Severity Level Prediction and Analysis
Provisionally accepted
- 1 Jamia Millia Islamia, Delhi, India
- 2 Prince Sultan University, Riyadh, Saudi Arabia
The COVID-19 pandemic exposed critical weaknesses in global health management, particularly in resource allocation and demand forecasting. This work introduces an approach to enhancing pandemic preparedness through real-time social media analysis. Using SnScrape, we collected over 27.5 million tweets posted between November 2019 and March 2023 that carried COVID-19-related hashtags. Tweets from April 2021, a peak pandemic period, were selected to create the CoViNAR dataset. BERTopic enabled context-aware filtering, resulting in a novel dataset of 14,000 annotated tweets categorized as "Need", "Availability", or "Not-relevant" (NAR). The CoViNAR dataset was used to train various machine learning classifiers, and the best classifier achieved 96.42% accuracy, 96.44% precision, 96.42% recall, and a 96.43% F1-score on the test set. While training the NAR classifier, we experimented with three context-aware word embedding techniques, of which DistilBERT yielded the best performance. We demonstrated the utility of the NAR classifier through a temporal analysis of tweets from the US, UK, and India from November 2019 to March 2023. The strong correlation between NAR tweet counts and COVID-19 case surges highlights the potential of the proposed method as a proactive tool that health authorities could use for resource management during a pandemic.
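The correlation between NAR tweet counts and case surges mentioned above can be illustrated with a minimal sketch. It assumes the Pearson coefficient as the correlation measure, which the abstract does not specify, and the `pearson` helper and the weekly counts are hypothetical values invented for illustration, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly counts, invented purely for illustration (not from the paper).
need_tweets = [120, 180, 450, 900, 700, 300]
new_cases = [1000, 1600, 4200, 8800, 6500, 2900]

r = pearson(need_tweets, new_cases)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that tweet volume and case counts rise and fall together. It would not by itself establish that tweets lead cases in time, which would require a lagged analysis.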
Keywords: BERTopic, COVID-19, Natural Language Processing, Social Media, DistilBERT, SVM
Received: 05 May 2025; Accepted: 31 Jul 2025.
Copyright: © 2025 Shafiya, Wani, Jabin, ELAffendi and Jahiruddin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Soofi Shafiya, Jamia Millia Islamia, Delhi, India
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.