ORIGINAL RESEARCH article

Front. Educ.

Sec. Digital Education

A comparison of lessons planned by different publicly available Large Language Models in the context of physical education: an expert survey

Provisionally accepted
Benedikt Meixner1,2*, Clara Tristram3, Maritta Schranner4, Alessandra Kenner3, Esther Serwe-Pandrick5, Billy Sperlich2, Peter Düking5*
  • 1Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  • 2Julius-Maximilians-Universität Würzburg, Würzburg, Germany
  • 3Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  • 4Janusz-Korczak Schule, Bayreuth, Germany
  • 5Technische Universität Braunschweig, Braunschweig, Germany

The final, formatted version of the article will be published soon.

Large language models (LLMs) have the potential to assist teachers, particularly in lesson planning, yet the quality of lessons generated by different LLMs remains unexplored. We investigated the quality of lesson plans produced by different LLMs, using the basketball layup as an example and surveying experts in the field. A prompt was submitted to three LLMs (GPT-4o, Claude Sonnet, and Google Gemini). Twenty-eight predefined quality criteria were employed to evaluate the lessons. Teaching experts rated the plans on 5-point Likert scales and provided additional comments. A Friedman test was conducted to identify differences in quality among the lesson plans. The most frequent median rating across all lesson plans was "acceptable" (3 on the 1-5 Likert scale), accounting for 64 of 84 total ratings. For most criteria (26 of 28), no group differences were observed between the lesson plans generated by Claude, Gemini, and GPT-4o. Free-text comments from raters highlighted that certain requirements, such as the allotted time or the number of students, were sometimes disregarded by specific LLMs despite being included in the prompt. LLMs are capable of generating basketball layup lessons of acceptable quality; however, these require review and refinement by experienced teachers. The LLMs investigated here displayed no differences for most evaluated criteria. While LLMs can provide valuable starting points, teachers need to acknowledge their limitations and tailor the lessons accordingly.

Key Points: Various publicly available LLMs can provide acceptable starting points for lesson planning in physical education, but their output needs careful refinement by experienced teachers.
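The Friedman test mentioned in the abstract is a non-parametric test for comparing three or more related samples, fitting here because the same criteria were rated for each LLM's plan. The following is an illustrative sketch only, not the authors' analysis code; the Likert ratings below are hypothetical placeholders invented for demonstration.

```python
# Illustrative sketch of a Friedman test across three lesson plans.
# The ratings below are HYPOTHETICAL placeholders, not the study's data:
# each list holds 1-5 Likert ratings for the same set of criteria,
# one list per LLM-generated lesson plan.
from scipy.stats import friedmanchisquare

claude = [3, 3, 4, 2, 3, 3, 4]
gemini = [3, 2, 3, 3, 3, 4, 3]
gpt4o  = [4, 3, 3, 3, 2, 3, 3]

# Non-parametric repeated-measures comparison of the three related
# samples; a small p-value would indicate at least one plan differs.
stat, p = friedmanchisquare(claude, gemini, gpt4o)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

With ratings this similar, the test would typically fail to reject the null hypothesis, mirroring the study's finding of no group differences for most criteria.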

Keywords: AI, Basketball, higher education, Large Language Model, Lesson plan, Teaching

Received: 29 Dec 2025; Accepted: 16 Feb 2026.

Copyright: © 2026 Meixner, Tristram, Schranner, Kenner, Serwe-Pandrick, Sperlich and Düking. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Benedikt Meixner
Peter Düking

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.