AUTHOR=Rimal Yagyanath, Sharma Navneet TITLE=Ensemble machine learning prediction accuracy: local vs. global precision and recall for multiclass grade performance of engineering students JOURNAL=Frontiers in Education VOLUME=10 YEAR=2025 URL=https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1571133 DOI=10.3389/feduc.2025.1571133 ISSN=2504-284X ABSTRACT=This study examines the prediction accuracy of ensemble machine learning models by comparing local and global precision, recall, and accuracy for multiclass grading of engineering students. It also investigates how various machine learning models perform in predicting these students' multiclass grading outcomes. The primary goal is to address challenges in multiclass data preparation and to identify the best-performing models using both micro and macro accuracy metrics derived from baseline comparisons. The results provide a comparative analysis of prediction accuracy across algorithms, emphasizing the importance of multiple receiver operating characteristic curves, their areas under the curve, and a one-vs-rest classification approach when target features are represented as letter grades. The algorithms examined are decision trees, K-nearest neighbors, random forests, support vector machines, XGBoost, gradient boosting, and bagging. Gradient boosting achieves the highest global accuracy for macro predictions at 67%, followed by bagging at 65%, random forests at 64%, K-nearest neighbors and XGBoost at 60% each, support vector machines at 59%, and decision trees at 55%. For micro prediction accuracy at the individual student level, support vector machines, random forests, and XGBoost align most closely with true student grades, with baseline accuracies of 19%, 22%, and 33%, respectively.
Notably, these models predict the C grade with 97% precision, whereas the A grade proves harder to predict, at only 66%. These findings are corroborated by precision-recall error plots. Grid-search tuning of the random forest achieved a score of 79% at its optimal settings, although training accuracy reached 99%. The results have implications for both students and educational institutions, helping to identify areas for improvement and to recognize high achievers, ultimately contributing to better academic outcomes for engineering students.
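The macro vs. micro distinction the abstract relies on (per-class averaging vs. pooling every prediction) can be sketched in pure Python; the grade labels and counts below are illustrative only, not the study's data.

```python
from collections import Counter

def precision_recall(y_true, y_pred):
    """Per-class precision/recall plus macro and micro summaries
    for multiclass labels such as letter grades."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1            # correct prediction for class t
        else:
            fp[p] += 1            # p was predicted but wrong
            fn[t] += 1            # t was missed
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class[c] = (prec, rec)
    # Macro: unweighted mean over classes, so rare grades count equally.
    macro_p = sum(p for p, _ in per_class.values()) / len(labels)
    macro_r = sum(r for _, r in per_class.values()) / len(labels)
    # Micro: pool all decisions, so each student counts equally.
    micro = sum(tp.values()) / len(y_true)
    return per_class, (macro_p, macro_r), micro
```

Macro averaging lets a rarely awarded grade influence the score as much as a common one, while micro pooling weights each student's prediction equally, which is why the two views can rank the same models differently.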
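The one-vs-rest evaluation the abstract mentions treats each letter grade in turn as the positive class and all other grades as negative. A minimal sketch of a per-class AUC via the rank-sum (Mann-Whitney) formulation; the labels and scores here are made up for illustration:

```python
def ovr_auc(y_true, scores, positive):
    """One-vs-rest AUC for one class: the probability that a randomly
    chosen positive example is scored above a randomly chosen negative,
    counting ties as half."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        return None               # AUC undefined without both classes
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Computing this once per grade yields one ROC/AUC per class, matching the "multiple receiver operating characteristic curves" the abstract describes for letter-grade targets.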