ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Natural Language Processing

Volume 8 - 2025 | doi: 10.3389/frai.2025.1653728

This article is part of the Research Topic: Emerging Techniques in Arabic Natural Language Processing

Leveraging Pre-trained Embeddings in an Ensemble Machine Learning Approach for Arabic Sentiment Analysis

Provisionally accepted
Areej Jaber1*, Israa Bahati1, Paloma Martinez2
  • 1Palestine Technical University - Kadoorie, Tulkarm, Palestine
  • 2Universidad Carlos III de Madrid, Getafe, Spain

The final, formatted version of the article will be published soon.

Arabic sentiment analysis presents unique challenges due to the linguistic complexity of Arabic, including its wide range of dialects, orthographic ambiguity, and limited language resources. This study explores the application of ensemble machine learning methods to improve sentiment classification performance in Arabic. Several homogeneous ensemble techniques are implemented and evaluated on two datasets: the balanced ArTwitter dataset and the highly imbalanced Syria_Tweets dataset. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied. The models incorporate pre-trained word embeddings and unigram features. Experimental results demonstrate that individual classifiers using pre-trained embeddings perform well, but ensemble models yield superior performance. On the ArTwitter dataset, an ensemble of Naive Bayes, Support Vector Machine, and Decision Tree classifiers achieves an accuracy of 90.22% and an F1-score of 92.0%, while on the Syria_Tweets dataset, an ensemble combining Stochastic Gradient Descent, k-Nearest Neighbors, and Random Forest reaches an accuracy of 83.82% and an F1-score of 83.86%. These findings highlight the effectiveness of ensemble learning in enhancing the robustness and generalizability of Arabic sentiment analysis systems when supported by pre-trained embedding representations.
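
The two techniques named in the abstract, SMOTE and ensemble voting, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy two-dimensional vectors, and the use of hard majority voting are assumptions made for illustration only, and a real pipeline would more likely rely on library implementations operating on embedding vectors. The sketch shows the core SMOTE idea (a synthetic minority sample is placed on the segment between a minority sample and one of its nearest minority neighbours) and how three base classifiers' labels could be combined by majority vote.

```python
import random
from collections import Counter


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def majority_vote(predictions):
    """Hard voting: return the label predicted by most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Toy minority-class vectors standing in for tweet embeddings (assumed data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=4)

# Labels from three hypothetical base classifiers for one tweet.
label = majority_vote(["positive", "negative", "positive"])
```

Because every synthetic point is an interpolation between two existing minority samples, it stays inside the region those samples already span, which is why SMOTE is usually applied to the training split only.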

Keywords: ensemble learning, sentiment analysis, machine learning, Arabic language, SMOTE

Received: 25 Jun 2025; Accepted: 21 Aug 2025.

Copyright: © 2025 Jaber, Bahati and Martinez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Areej Jaber, Palestine Technical University - Kadoorie, Tulkarm, Palestine

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.