
ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Digital Education

Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1683272

Optimizing Architectural-Feature Tradeoffs in Arabic Automatic Short Answer Grading: Comparative Analysis of Fine-Tuned AraBERTv2 Models

Provisionally accepted
  • 1Basrah University College of Science & Technology, Basrah, Iraq
  • 2University of Basrah, Basrah, Iraq

The final, formatted version of the article will be published soon.

Automated essay evaluation systems offer a contemporary response to the challenges that technological advances have brought to education: they can assess responses with high accuracy while reducing reliance on human graders, which makes them increasingly important as demand grows for fast and reliable evaluation. However, a critical concern remains about how precise these systems are and how well they generalize in settings where large datasets are not readily available. This research examines the generalizability of Automated Short Answer Grading (ASAG) systems under different training conditions, using both unannotated and annotated data. Through a comprehensive comparative methodology, the study evaluates fine-tuned AraBERTv2 models combined with three neural network architectures, namely a Multilayer Perceptron (MLP), a Convolutional Neural Network (CNN), and a Long Short-Term Memory (LSTM) network, each tested with two, three, and four features on the AS-ARSG dataset. The primary goal is to explore the models' generalizability when only incomplete data (unannotated or partially annotated) are available, and to develop a flexible framework that reduces dependence on human assessment while maintaining grading quality. The results show that the two-feature MLP model outperformed all other configurations, achieving the lowest error and the highest correlation (MAE = 1.31, Spearman's coefficient = 0.808). In contrast, performance degraded as the number of features increased, particularly for the LSTM models. Through this approach, the research contributes to the development of Arabic ASAG systems that can adapt to limited-data scenarios, improving their efficiency and practical applicability.
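The abstract reports grading quality with two metrics: mean absolute error (MAE) and Spearman's rank correlation coefficient. The following is a minimal sketch of how these metrics are conventionally computed, assuming numeric grades and the common average-rank convention for ties; the paper's own evaluation code is not described here, so the function names are illustrative rather than the authors', and the example grades are hypothetical.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |reference grade - predicted grade| over all answers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` over a run of equal values (a tie group).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root of the sum of squared deviations for each sequence.
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def spearman_rho(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(y_true), average_ranks(y_pred))


if __name__ == "__main__":
    # Hypothetical grades for illustration only; not data from the paper.
    reference = [4, 3, 5, 2]
    predicted = [3.5, 3, 4, 2.5]
    print("MAE:", mean_absolute_error(reference, predicted))
    print("Spearman:", spearman_rho(reference, predicted))
```

With these hypothetical grades, MAE is 0.5 while Spearman's coefficient is approximately 1.0: the predictions are off by half a point on average yet preserve the ordering of the reference grades. This is one reason an error metric and a rank correlation are often reported together.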

Keywords: Large Language Models (LLMs), AraBERT, Neural Network, Arabic Natural Language Processing, educational assessment, Automated Short Answer Grading (ASAG)

Received: 10 Aug 2025; Accepted: 30 Sep 2025.

Copyright: © 2025 Mahmood. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Salma Abdulbaki Mahmood, salma.mahmood@uobasrah.edu.iq

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.