ORIGINAL RESEARCH article

Front. Big Data

Sec. Data Analytics for Social Impact

This article is part of the Research Topic: Artificial Intelligence Risks, Opportunities and Threats for Small and Medium Enterprises

Towards Robust Social Media Sentiment for SMEs: A Comparative Study of Dictionary-Based and Machine Learning Approaches with Insights for Hybrid Methodologies

Provisionally accepted
Heru Susanto1,2,3*, Aida Sari Omar2, Alifya Kayla Shafa Susanto2, Desi Setiana2, Leu Fang-Yie3, Junaid M. Shaikh2, Asep Insani4, Uus Khusni1, Rachmat Hidayat1, Indra Akbari1, Iwan Basuki1
  • 1Indonesia Institute of Sciences (LIPI), Jakarta, Indonesia
  • 2University of Technology Brunei, Gadong, Brunei-Muara, Brunei
  • 3College of Science, Tunghai University, Taichung, Taiwan
  • 4National Research and Innovation Agency (BRIN), Bogor, Jakarta, Indonesia

The final, formatted version of the article will be published soon.

Small and Medium-sized Enterprises (SMEs) increasingly rely on social media to engage customers, promote products, and enhance workplace collaboration. Customer opinions expressed through comments and posts on platforms such as Facebook and Instagram offer valuable insights, yet their informal and context-specific nature, often characterized by slang, misspellings, and bilingual usage, poses challenges for automated sentiment analysis. This study addresses that challenge by comparing dictionary-based and machine learning approaches to sentiment classification of SMEs' social media content. Data were collected from a diverse set of SMEs across multiple industries, with a substantial volume of customer comments extracted and pre-processed through tokenization, normalization, stop-word removal, and stemming. A customized dictionary was developed to account for local language variations, while Naïve Bayes and Support Vector Machine (SVM) models were employed as supervised classifiers. The findings indicate that dictionary-based methods, while simple and interpretable, lose accuracy on informal and localized language, whereas machine learning approaches deliver higher overall performance but require extensive preprocessing and tuning. The study also highlights the potential of hybrid frameworks that combine the interpretability of dictionary-based models with the adaptability of machine learning classifiers. This research contributes both practically and theoretically by (i) demonstrating the limitations of applying generic sentiment analysis tools in localized SME contexts, (ii) proposing a hybrid sentiment analysis framework tailored to SMEs, and (iii) offering empirical evidence to support digital transformation strategies for SMEs in resource-constrained environments. Ultimately, accurate sentiment analysis can enable SMEs to refine business strategies, strengthen customer engagement, and achieve sustainable growth in the digital economy.
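
As a rough illustration of the pipeline the abstract describes, the following Python sketch chains the four preprocessing steps, a dictionary-based scorer, and a small Naïve Bayes classifier, then combines them with one possible hybrid rule. It is a minimal sketch under stated assumptions, not the authors' implementation: the slang map, stop-word list, lexicon, training comments, the crude suffix stemmer, and the rule "fall back to the classifier when the dictionary is neutral" are all hypothetical. The SVM is omitted to keep the example dependency-free.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical resources for illustration only (not the study's dictionary).
SLANG = {"gr8": "great", "luv": "love", "sedap": "delicious", "teruk": "terrible"}
STOP_WORDS = {"the", "a", "is", "was", "and", "so", "very", "this", "it"}
LEXICON = {"great": 1, "love": 1, "delicious": 1, "terrible": -1, "slow": -1, "bad": -1}

def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Stem the lexicon keys so they match the stemmed tokens.
STEMMED_LEXICON = {stem(word): score for word, score in LEXICON.items()}

def preprocess(text):
    """Tokenize, normalize slang, remove stop words, then stem."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    tokens = [SLANG.get(t, t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

def dictionary_label(tokens):
    """Sum lexicon polarities; a zero total is treated as neutral."""
    score = sum(STEMMED_LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def hybrid_label(text, model):
    """Assumed hybrid rule: trust the dictionary unless it is neutral."""
    tokens = preprocess(text)
    label = dictionary_label(tokens)
    return label if label != "neutral" else model.predict(tokens)

# Hypothetical labeled comments standing in for annotated SME data.
TRAIN = [
    ("fast delivery will order again", "positive"),
    ("friendly staff and fast service", "positive"),
    ("waited two hours never again", "negative"),
    ("rude staff never coming back", "negative"),
]
model = NaiveBayes().fit([preprocess(t) for t, _ in TRAIN], [y for _, y in TRAIN])

print(hybrid_label("The food was sedap, luv it!", model))  # dictionary decides
print(hybrid_label("waited hours, never again", model))    # classifier decides
```

In this sketch the dictionary gives an interpretable answer whenever a known polarity word appears, and the classifier handles comments the lexicon does not cover. A real system would need a proper stemmer, a much larger localized lexicon, and held-out evaluation.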

Keywords: sentiment analysis, social media, SMEs, dictionary-based methods, machine learning, hybrid framework

Received: 16 Mar 2025; Accepted: 28 Nov 2025.

Copyright: © 2025 Susanto, Omar, Shafa Susanto, Setiana, Fang-Yie, Shaikh, Insani, Khusni, Hidayat, Akbari and Basuki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Heru Susanto

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.