CORRECTION article

Front. Comput. Sci., 12 November 2025

Sec. Digital Education

Volume 7 - 2025 | https://doi.org/10.3389/fcomp.2025.1734114

Correction: Optimizing architectural-feature tradeoffs in Arabic automatic short answer grading: comparative analysis of fine-tuned AraBERTv2 models

  • Frontiers Production Office, Frontiers Media SA, Lausanne, Switzerland

There was a mistake in the article as published. Tables 1–7 and Figures 1–8 were published as supplementary material when they should have been added to the main article. The corrected figures and tables appear below.

Table 1. Distribution of answers by question type.

Table 2. Detailed distribution of randomly sampled responses across selected questions.

Table 3. Performance evaluation of AraBERTv2 with MLP model using different feature sets: training vs. testing results.

Table 4. Performance evaluation of AraBERTv2 with CNN model using different feature sets: training vs. testing results.

Table 5. Performance evaluation of AraBERTv2 with LSTM model using different feature sets: training vs. testing results.

Table 6. Performance comparison of AraBERTv2 fine-tuned models with MLP, CNN, and LSTM architectures using different feature sets.

Table 7. Comparative performance evaluation of Arabic Automated Short Answer Grading (ASAG) systems.

Flowchart illustrating the process of training and testing with the AR-ASAG dataset. The sequence includes dataset loading, preprocessing, and splitting into 80% training and 20% testing. The training subset undergoes feature selection and AraBERT training, leading to fine-tuned AraBERT models. These models are evaluated and compared, followed by visualization to determine the best AraBERT model. Arrows indicate the workflow and connections among the steps.

Figure 1. General workflow of the proposed automated Arabic short-answer grading model using AraBERTv2.
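
For readers who want to map the Figure 1 workflow onto code, here is a minimal sketch. It is a hypothetical outline, not the authors' implementation: the file name, column names, and split seed are illustrative, and the checkpoint identifier is the AraBERTv2 model commonly published on the Hugging Face Hub.

```python
# Minimal sketch of the Figure 1 workflow. File name, column names, and
# preprocessing are illustrative assumptions, not taken from the article.
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForSequenceClassification

df = pd.read_csv("ar_asag.csv")                      # hypothetical AR-ASAG export
df["answer"] = df["answer"].astype(str).str.strip()  # toy preprocessing step

# 80% training / 20% testing split, as described in the flowchart
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

checkpoint = "aubmindlab/bert-base-arabertv2"        # commonly used AraBERTv2 id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=1                         # regression head for grades
)

enc = tokenizer(list(train_df["answer"]), truncation=True,
                padding=True, return_tensors="pt")
# ...fine-tune (e.g., with transformers.Trainer), evaluate on test_df,
# then compare and visualize the fine-tuned variants.
```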

Bar charts compare AraBERTv2 with LSTM across training and testing phases. The top charts show MAE and RMSE, and Pearson and Spearman correlations for different features in training. The bottom charts depict the same metrics for testing, highlighting variations in error values and correlation coefficients across two, three, and four features.

Figure 2. The AraBERT_MLP training methodology.
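
The evaluation charts report four metrics throughout: MAE, RMSE, Pearson correlation, and Spearman correlation. The following self-contained sketch shows how such values are typically computed; the grade arrays are invented for illustration and are not results from the article.

```python
# Sketch of the four metrics the evaluation charts report; the grade
# arrays are invented for illustration, not results from the article.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([4.0, 2.5, 5.0, 1.0, 3.0])  # gold grades (illustrative)
y_pred = np.array([3.5, 2.0, 4.5, 1.5, 3.5])  # model grades (illustrative)

mae = mean_absolute_error(y_true, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
pearson, _ = pearsonr(y_true, y_pred)
spearman, _ = spearmanr(y_true, y_pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  "
      f"Pearson={pearson:.3f}  Spearman={spearman:.3f}")
```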

Bar charts showing AraBERTv2 with CNN performance during training and testing. Training error bars compare MAE and RMSE for 2, 3, and 4 features, with RMSE generally higher. Correlation values for Pearson and Spearman increase with more features. Testing error values increase with more features, while correlation values decrease slightly as features increase.

Figure 3. The AraBERT_CNN training methodology.

Flowchart depicting the AraBERT with MLP training stage. Feature selection leads to three models: 2-features (red), 3-features (green), and 4-features (purple). Each model connects to AraBERT with CNN, then fine-tuning AraBERT stages, ending with evaluation and comparison.

Figure 4. The AraBERT_LSTM training methodology.
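
The three methodology flowcharts share one branching pattern: a separate model variant per feature-set size (2, 3, or 4 features), each fine-tuned and then compared. The sketch below expresses that loop schematically; fine_tune, evaluate, and the feature names are placeholder stubs, not the authors' code.

```python
# Schematic of the branching in the methodology flowcharts: one variant per
# feature-set size, then a comparison. Stubs and feature names are
# placeholders, not the authors' implementation.
def fine_tune(features):
    """Stub: would fine-tune AraBERTv2 plus an MLP/CNN/LSTM head."""
    return {"features": features}

def evaluate(model):
    """Stub: would return MAE, RMSE, Pearson, and Spearman on the test split."""
    return {"n_features": len(model["features"])}

feature_sets = {
    2: ["feat_a", "feat_b"],
    3: ["feat_a", "feat_b", "feat_c"],
    4: ["feat_a", "feat_b", "feat_c", "feat_d"],
}
results = {k: evaluate(fine_tune(v)) for k, v in feature_sets.items()}
print(results)  # evaluation and comparison stage
```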

Bar charts comparing AraBERTv2 with MLP performance in training and testing phases. In training, 4-feature shows minimal MAE and RMSE, with high Pearson and Spearman correlations. In testing, 2-feature has lower error values than 3-feature and 4-feature, though 4-feature performs slightly better in correlation values.

Figure 5. Performance evaluation of AraBERTv2 with MLP model using different feature sets: training vs. testing results.

Diagram illustrating a machine learning process, titled “AraBERT with MLP Training Stage.” It starts with “Feature Selection,” leads to three models: “2-features model,” “3-features model,” and “4-features model.” Each model goes to “AraBERT with CNN,” followed by “Fine tuning AraBERT,” and ends with “Evaluation and comparison.” Arrows indicate the flow direction.

Figure 6. Performance evaluation of AraBERTv2 with CNN model using different feature sets: training vs. testing results.

Scatter plot titled “Model Performance: Error vs Spearman Correlation” showing different models' performance using colored markers: AraBERTv2 with MLP, CNN, and LSTM. The x-axis represents MAE (mean absolute error), where lower is better, and the y-axis represents Spearman’s rank correlation, where higher is better. The plot uses different shapes to indicate feature numbers. Most points cluster between 1.5 to 2.0 MAE and 0.65 to 0.80 correlation, with one outlier beyond 3.5 MAE and below 0.45 correlation.

Figure 7. Performance evaluation of AraBERTv2 with LSTM model using different feature sets: training vs. testing results.
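
A Figure 7-style scatter (MAE on the x-axis, lower is better; Spearman correlation on the y-axis, higher is better; one marker shape per architecture) can be drawn with matplotlib as below. All coordinates are invented for illustration; only the rough cluster ranges and the single outlier echo the alt-text description.

```python
# Sketch of a Figure 7-style scatter (MAE vs. Spearman). All coordinates are
# invented; only the rough cluster and the outlier echo the alt text.
import matplotlib.pyplot as plt

points = {
    "MLP":  [(1.6, 0.78), (1.8, 0.72)],
    "CNN":  [(1.7, 0.75), (2.0, 0.68)],
    "LSTM": [(1.9, 0.65), (3.6, 0.42)],  # includes the single outlier
}
markers = {"MLP": "o", "CNN": "s", "LSTM": "^"}
for head, vals in points.items():
    xs, ys = zip(*vals)
    plt.scatter(xs, ys, marker=markers[head], label=f"AraBERTv2 + {head}")
plt.xlabel("MAE (lower is better)")
plt.ylabel("Spearman correlation (higher is better)")
plt.title("Model Performance: Error vs Spearman Correlation")
plt.legend()
plt.show()
```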

Diagram of a machine learning workflow featuring AraBERT with MLP. It starts with feature selection, separating into three models: a 2-feature model in red, a 3-feature model in green, and a 4-feature model in blue. These feed into the AraBERT with MLP stage, which then advances to fine-tuning AraBERT in individual boxes. An evaluation and comparison stage follows, indicated by arrows.

Figure 8. Fine-tuned models' performance: MAE vs. Spearman correlation.

All in-text Supplementary Table and Supplementary Figure citations have been changed to Table and Figure citations.

The original version of this article has been updated.

Generative AI statement

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Keywords: large language models (LLMs), AraBERT, neural network, Arabic natural language processing, educational assessment, Automated Short Answer Grading (ASAG)

Citation: Frontiers Production Office (2025) Correction: Optimizing architectural-feature tradeoffs in Arabic automatic short answer grading: comparative analysis of fine-tuned AraBERTv2 models. Front. Comput. Sci. 7:1734114. doi: 10.3389/fcomp.2025.1734114

Received: 28 October 2025; Accepted: 28 October 2025;
Published: 12 November 2025.

Approved by:

Frontiers Editorial Office, Frontiers Media SA, Switzerland

Copyright © 2025 Frontiers Production Office. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Frontiers Production Office, production.office@frontiersin.org

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.