ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Natural Language Processing
This article is part of the Research Topic: Online Hate Speech: Linguistic Challenges in the Age of AI
Advancing Cyberbullying Detection in Low-resource Languages: A Transformer-stacking Framework for Bengali
Provisionally accepted
- 1 University of Chittagong, Chattogram, Bangladesh
- 2 Gopalganj Science and Technology University, Gopalganj District, Bangladesh
- 3 Kristiania University College, Oslo, Norway
- 4 University of Cambridge, Cambridge, United Kingdom
Cyberbullying on social networks has emerged as a pressing global issue, yet research in low-resource languages such as Bengali remains underdeveloped due to the scarcity of high-quality datasets, linguistic resources, and targeted methodologies. Many existing approaches overlook essential language-specific preprocessing, neglect advanced transformer-based models, and do not adequately address model validation, scalability, and adaptability. To address these limitations, this study introduces three Bengali-specific preprocessing strategies to enhance feature representation. It then proposes Transformer-stacking, a hybrid detection framework that combines three transformer models (XLM-R-base, multilingual BERT, and Bangla-Bert-Base) through a stacking strategy with a multi-layer perceptron as the meta-classifier. The framework is evaluated on a publicly available Bengali cyberbullying dataset of 44,001 samples under both binary (Sub-task A) and multiclass (Sub-task B) classification settings. Transformer-stacking achieves an F1-score of 93.61% and an accuracy of 93.62% on Sub-task A, and an F1-score and accuracy of 89.23% on Sub-task B, outperforming eight baseline transformer models, four transformer ensemble techniques, and recent state-of-the-art methods. These improvements are statistically validated using McNemar’s test. Furthermore, experiments on two external Bengali datasets, covering hate speech and abusive language, demonstrate the model's scalability and adaptability. Overall, Transformer-stacking offers an effective and generalizable solution for Bengali cyberbullying detection, establishing a new benchmark in this underexplored domain.
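The abstract specifies the mechanics only at a high level: three fine-tuned transformers are stacked through a multi-layer perceptron meta-classifier, and gains over baselines are checked with McNemar's test. The sketch below is a minimal illustration of both steps, not the authors' implementation; the feature shapes, hidden-layer size, and all input arrays are placeholder assumptions, with base-model probabilities simulated by random numbers purely so the example runs end to end.

```python
# Minimal sketch of transformer stacking with an MLP meta-classifier,
# followed by a McNemar's test between a baseline and the stacked model.
# All inputs are simulated placeholders; in the real pipeline each array
# would hold class probabilities from the fine-tuned XLM-R-base,
# multilingual BERT, and Bangla-Bert-Base models (e.g. out-of-fold
# predictions on the training split, plain predictions on the test split).
import numpy as np
from scipy.stats import chi2
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
n_train, n_test, n_classes, n_models = 2000, 500, 2, 3

# Stacked features: each base model contributes n_classes probabilities.
X_train = rng.random((n_train, n_models * n_classes))
X_test = rng.random((n_test, n_models * n_classes))
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)

# Meta-learner: an MLP trained on the concatenated base-model outputs.
meta = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
meta.fit(X_train, y_train)
stacked_pred = meta.predict(X_test)
print(f"accuracy={accuracy_score(y_test, stacked_pred):.4f}  "
      f"weighted F1={f1_score(y_test, stacked_pred, average='weighted'):.4f}")

# McNemar's test (with continuity correction) comparing the stacked model
# against a stand-in baseline on the same test items. Only the discordant
# pairs b (baseline right, stacked wrong) and c (baseline wrong, stacked
# right) enter the statistic.
baseline_pred = rng.integers(0, n_classes, n_test)  # placeholder baseline
b = int(np.sum((baseline_pred == y_test) & (stacked_pred != y_test)))
c = int(np.sum((baseline_pred != y_test) & (stacked_pred == y_test)))
stat = (abs(b - c) - 1) ** 2 / (b + c) if (b + c) > 0 else 0.0
p_value = chi2.sf(stat, df=1)
print(f"McNemar: b={b}, c={c}, chi2={stat:.3f}, p={p_value:.4f}")
```

On real model outputs, b and c count the test items on which exactly one of the two classifiers is correct, and a small p-value would support the abstract's claim that the stacked model's improvements over each baseline are statistically significant rather than noise.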
Keywords: additional preprocessing, Bengali, cyberbullying, low-resource language, transformer integration
Received: 05 Aug 2025; Accepted: 17 Nov 2025.
Copyright: © 2025 Hoque, Deb Nath, Chy, Ghose and Seddiqui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Debasish Ghose, debasish.ghose@kristiania.no
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
