AUTHOR=Ableitinger Christoph, Dorner Christian

TITLE=Challenges in using ChatGPT to code student's mistakes

JOURNAL=Frontiers in Education

VOLUME=Volume 10 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1632548

DOI=10.3389/feduc.2025.1632548

ISSN=2504-284X

ABSTRACT=Rapid advances in artificial intelligence (AI) have sparked interest in its application within mathematics education, particularly in automating the coding and grading of student solutions. This study investigates the potential of ChatGPT, specifically the GPT-4 Turbo model, to assess student solutions to procedural mathematics tasks, focusing on its ability to identify correctness and categorize errors into two domains: “knowledge of the procedure” and “arithmetic/algebraic skills.” The research is motivated by the need to reduce the time-intensive nature of coding and grading and to explore AI's reliability in this context. The study employed a two-phase approach using a dataset of handwritten student solutions to a system of linear equations: first, ChatGPT was trained using student solutions that one of the authors had rewritten to ensure a consistent handwriting style; its performance was then tested with additional solutions in the same handwriting. The findings reveal significant challenges, including frequent errors in handwriting recognition, misinterpretation of mathematical symbols, and inconsistencies in the categorization of mistakes. Despite iterative feedback and prompt adjustments, ChatGPT's performance remained inconsistent, with only partial success in accurately coding solutions. The study concludes that while ChatGPT shows promise as a coding aid, its current limitations, particularly in recognizing handwritten inputs and maintaining consistency, highlight the need for improvement. These findings contribute to the growing discourse on AI's role in education, emphasizing the importance of improving AI tools for practical classroom and research applications.