AUTHOR=Memon Atia Bano, Sootahar Dileep Kumar, Luhana Kirshan Kumar, Meyer Kyrill TITLE=A corpus-based real-time text classification and tagging approach for social data JOURNAL=Frontiers in Computer Science VOLUME=Volume 6 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2024.1294985 DOI=10.3389/fcomp.2024.1294985 ISSN=2624-9898 ABSTRACT=With the rapid accumulation of large amounts of user-generated content through social media, social data reuse and integration have recently gained increasing attention. This has made it almost obsolete for software applications to collect, store, and work with their own data on local servers. Although the Application Programming Interfaces provided by the leading social networking sites have made data acquisition and integration possible, the meaningful use of such unstructured, non-uniform, and incoherent data collections requires special procedures for data summarization, understanding, and visualization. One aspect that needs particular attention is the categorization and concept tagging of data (text snippets in the form of social media posts), in order to select the data most relevant and suitable for a particular audience and purpose. To this end, we propose a corpus-based approach for searching and successively categorizing and tagging social data with relevant concepts in real time. The proposed approach addresses semantic and morphological similarities, as well as the domain-specific vocabularies, of query strings and tagged concepts. We demonstrate the feasibility and application of the approach in a web-based tool that allows searching Facebook posts and presents the search results together with a concept map for further navigation, filtering, and refinement of search results.
The tool has been evaluated by performing multiple search queries and analyzing the resulting concept maps and annotated texts in terms of their precision. The approach is found to be effective in achieving its stated goal of classifying text snippets in real time.