ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Natural Language Processing

Volume 8 - 2025 | doi: 10.3389/frai.2025.1661789

This article is part of the Research Topic: Emerging Techniques in Arabic Natural Language Processing

Cross-Dialectal Arabic Translation: Comparative Analysis on Large Language Models

Provisionally accepted
Ayah Beidas, Fatme Ghaddar, Kousar Mohi, Imtiaz Ahmad, Sa'Ed Abed*
  • Kuwait University, Kuwait City, Kuwait

The final, formatted version of the article will be published soon.

Exploring Arabic dialects in Natural Language Processing (NLP) is essential to understanding linguistic variation and meeting regional communication demands. Recent advances in Large Language Models (LLMs) have opened new possibilities for multilingual communication and text generation. This paper investigates the performance of GPT 3.5, GPT 4, and Bard (Gemini) on the QADI and MADAR datasets, which together cover dialects from over 15 countries; GPT 5 was evaluated exclusively on MADAR. Several metrics were used in the evaluation: cosine similarity, the universal sentence encoder, sentence-BERT, TER, ROUGE, and BLEU. On the QADI dataset, GPT 4 significantly outperformed the other models in translating MSA to DA, with ANOVA tests showing strong significance (p < 0.05) on most metrics; the exceptions were BLEU and TER, where no significant difference was found, indicating comparable translation performance among the models. Furthermore, GPT 4 achieved the highest semantic similarity (0.66), compared with 0.61 for GPT 3.5 and 0.63 for Bard (Gemini). GPT 4 was also the best at identifying overlapping sentences (i.e., those where the source and target are identical), with a combined BLEU and ROUGE-L average of 0.41. All LLMs scored TER values between 6% and 25%, indicating generally good translation quality. The GPT models, especially GPT 5, responded better than Bard (Gemini) to prompting and to translation into Levantine dialects. On the MADAR dataset, no significant differences were observed for sentence-BERT, ROUGE-L, and TER, while differences were identified for cosine similarity, BLEU, and the universal sentence encoder; GPT 5 was the top performer at identifying sentence overlaps as measured by BLEU and ROUGE-L (combined average 0.37). Two prompting techniques were used: zero-shot, applied to all dialects, and few-shot, applied only to the dialect with the weakest translation performance, Tunisian. Few-shot prompting did not yield a significant improvement: GPT 4 and Bard (Gemini) performed worse under this setup, while GPT 3.5 remained consistent.
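The overlap-oriented metrics named in the abstract (BLEU and ROUGE-L) can be illustrated with a minimal sketch. This is not the authors' evaluation pipeline (which presumably uses standard toolkits such as sacreBLEU); it is a simplified, self-contained illustration: a unigram-only BLEU with brevity penalty, and ROUGE-L F1 computed from the longest common subsequence of whitespace-separated tokens. All function names and the example sentences are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import exp

def bleu1(candidate: str, reference: str) -> float:
    """Simplified sentence-level BLEU: unigram precision x brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    # clipped unigram overlap between candidate and reference
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) >= len(ref) else exp(1 - len(ref) / len(cand))
    return bp * precision

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 from the longest common subsequence (LCS) of tokens."""
    cand, ref = candidate.split(), reference.split()
    # dynamic-programming table for LCS length
    dp = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i, c in enumerate(cand, 1):
        for j, r in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if c == r else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

hyp = "the cat sat on the mat"
ref = "the cat is on the mat"
print(round(bleu1(hyp, ref), 2))       # 5 of 6 unigrams match -> 0.83
print(round(rouge_l_f1(hyp, ref), 2))  # LCS of length 5 -> 0.83
```

A "combined BLEU and ROUGE-L average", as reported for sentence overlaps in the abstract, would then be the mean of the two scores; real evaluations use higher-order n-grams and standardized tokenization, which matter for morphologically rich Arabic text.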

Keywords: Language models, GPT 3.5, GPT 4, GPT 5, Bard (Gemini), Arabic language, Dialects

Received: 08 Jul 2025; Accepted: 29 Aug 2025.

Copyright: © 2025 Beidas, Ghaddar, Mohi, Ahmad and Abed. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Sa'Ed Abed, Kuwait University, Kuwait City, Kuwait

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.