AUTHOR=Meissner Roy, Pögelt Alexander, Ihsberner Katja, Grüttmüller Martin, Tornack Silvana, Thor Andreas, Pengel Norbert, Wollersheim Heinz-Werner, Hardt Wolfram
TITLE=LLM-generated competence-based e-assessment items for higher education mathematics: methodology and evaluation
JOURNAL=Frontiers in Education
VOLUME=9
YEAR=2024
URL=https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2024.1427502
DOI=10.3389/feduc.2024.1427502
ISSN=2504-284X
ABSTRACT=In this article, we explore the transformative impact of advanced, parameter-rich Large Language Models (LLMs) on the production of instructional materials in higher education, with a focus on the automated generation of both formative and summative assessments for learners in the field of mathematics. We introduce a novel LLM-driven process and application, called ItemForge, tailored specifically to the automatic generation of e-assessment items in mathematics. The approach is thoroughly aligned with the levels and hierarchy of cognitive learning objectives developed by Anderson & Krathwohl, and incorporates specific mathematical concepts from the courses under consideration. The quality of the generated free-text items and their corresponding answers (sample solutions), as well as their appropriateness to the designated cognitive level and subject matter, was evaluated in a small-scale study in which three mathematical experts reviewed a total of 240 generated items, providing a comprehensive analysis of their effectiveness and relevance. Our findings demonstrate that the tool is proficient in producing high-quality items that align with the chosen concepts and targeted cognitive levels, indicating its potential suitability for educational purposes. However, we observed that the provided answers (sample solutions) occasionally exhibited inaccuracies or were not entirely complete, signalling a necessity for additional refinement of the tool's processes.
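
The abstract's core technical step is prompting an LLM with a course concept and a target level of the Anderson & Krathwohl taxonomy to obtain a free-text item plus a sample solution. The sketch below illustrates what such a prompt-construction step might look like; the function name build_item_prompt, the template wording, and the level labels as used here are illustrative assumptions, not the authors' actual ItemForge implementation.

```python
# Hypothetical sketch of competence-based item prompting, loosely following
# the process the abstract describes. Names and template wording are
# assumptions, not the ItemForge pipeline itself.

# Anderson & Krathwohl (2001) cognitive process dimension, lowest to highest.
COGNITIVE_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

def build_item_prompt(concept: str, level: str) -> str:
    """Compose an LLM prompt asking for one free-text item plus a sample solution."""
    if level not in COGNITIVE_LEVELS:
        raise ValueError(f"unknown cognitive level: {level}")
    return (
        "You are an assessment designer for a university mathematics course.\n"
        f"Write ONE free-text e-assessment item on the concept '{concept}' that\n"
        f"targets the '{level}' level of the Anderson & Krathwohl taxonomy.\n"
        "Then provide a complete sample solution.\n"
        "Format:\nITEM: <question>\nSOLUTION: <worked answer>"
    )

if __name__ == "__main__":
    print(build_item_prompt("eigenvalues of a 2x2 matrix", "apply"))
```

In a full pipeline the returned prompt would be sent to an LLM and the response parsed into item and solution fields; the abstract's finding that sample solutions were occasionally inaccurate or incomplete suggests an additional verification step after generation would be needed.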