ORIGINAL RESEARCH article
Front. Comput. Sci.
Sec. Digital Education
Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1683272
Optimizing Architectural-Feature Tradeoffs in Arabic Automatic Short Answer Grading: Comparative Analysis of Fine-Tuned AraBERTv2 Models
Provisionally accepted
- 1 Basrah University College of Science & Technology, Basrah, Iraq
- 2 University of Basrah, Basrah, Iraq
Automated essay evaluation systems are a contemporary response to the challenges that technological advances pose for education: they offer high assessment accuracy while reducing reliance on human resources, which makes them essential given the growing demand for fast and reliable evaluation. A critical concern remains, however, about how precise these systems are and how well they generalize when large datasets are not readily available. This research examines the generalizability of Automated Short Answer Grading (ASAG) systems under different training conditions, including unannotated and annotated data. Using a comprehensive comparative methodology, the study evaluates fine-tuned AraBERTv2 models integrated with three neural network architectures, Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), each tested with 2, 3, and 4 features on the AS-ARSG dataset. The primary goal is to explore the models' generalizability when only incomplete data (unannotated or partially annotated) is available, and to develop a flexible framework that reduces dependence on human assessment while maintaining grading quality. The results show that the two-feature MLP model outperformed all others, achieving the best performance with low error and high correlation (MAE = 1.31, Spearman's coefficient = 0.808). In contrast, performance degraded as the number of features increased, especially in the LSTM models. The research thereby contributes to the development of Arabic ASAG systems that can adapt to limited-data scenarios, improving their efficiency and practical applicability.
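The two metrics reported in the abstract, mean absolute error (MAE) and Spearman's rank correlation, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names are hypothetical, the grades are made-up values rather than AS-ARSG data, and tied grades are assumed to receive average ranks, which is the common convention.

```python
from statistics import mean


def mean_absolute_error(gold, predicted):
    """Average absolute difference between gold and predicted grades."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(g - p) for g, p in zip(gold, predicted))


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(gold, predicted):
    """Spearman's coefficient: Pearson correlation computed on average ranks."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rank_g, rank_p = average_ranks(gold), average_ranks(predicted)
    mean_g, mean_p = mean(rank_g), mean(rank_p)
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rank_g, rank_p))
    var_g = sum((a - mean_g) ** 2 for a in rank_g)
    var_p = sum((b - mean_p) ** 2 for b in rank_p)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_g * var_p) ** 0.5


# Made-up grades for illustration only; not taken from the AS-ARSG dataset.
gold_grades = [5.0, 3.0, 4.0, 1.0, 2.0]
model_grades = [4.5, 3.5, 4.0, 2.0, 1.0]
print("MAE:", mean_absolute_error(gold_grades, model_grades))
print("Spearman:", spearman(gold_grades, model_grades))
```

A lower MAE means predicted grades sit closer to the human grades on the grading scale, while a higher Spearman coefficient means the model orders the answers more like the human grader does. The two can diverge, which is presumably why the abstract reports both.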
Keywords: Large Language Models (LLMs), AraBERT, Neural Network, Arabic Natural Language Processing, educational assessment, Automated Short Answer Grading (ASAG)
Received: 10 Aug 2025; Accepted: 30 Sep 2025.
Copyright: © 2025 Mahmood. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Salma Abdulbaki Mahmood, salma.mahmood@uobasrah.edu.iq
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.