ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

Volume 8 - 2025 | doi: 10.3389/frai.2025.1582096

The Utility of Generative Artificial Intelligence Chatbot (ChatGPT) in Generating Teaching and Learning Material for Anesthesiology Residents

Provisionally accepted
Zhaosheng Jin*, Ramon Abola, Vincent Bargnes, Alexandra Tsivitis, Sadiq Raman, Jonathon Schwartz, Sergio Daniel Bergese and Joy Schabel
  • Department of Anesthesiology, Stony Brook Medicine, Stony Brook, United States

The final, formatted version of the article will be published soon.

The popularization of large language model chatbots such as ChatGPT has led to their growing utility in various biomedical fields. Chatbots have been shown to provide reasonably accurate responses to medical exam-style questions. On the other hand, chatbots have known limitations that may hinder their utility in medical education. We conducted a pragmatically designed study to evaluate the accuracy and completeness of ChatGPT-generated responses to various styles of prompts based on entry-level anesthesiology topics. Ninety-five unique prompts were constructed using topics from the Anesthesia Knowledge Test 1 (AKT-1), a standardized exam taken by US anesthesiology residents after one month of specialty training. A combination of focused and open-ended prompts was used to evaluate the chatbot's ability to present and organize information. We also included prompts for journal references and lecture outlines, as well as biased (medically inaccurate) prompts. The responses were independently scored on a 3-point Likert scale by two board-certified anesthesiologists with extensive experience in medical education. Fifty-two (55%) responses were rated as completely accurate by both evaluators. For prompts eliciting longer responses, most outputs were also deemed complete. Notably, the chatbot frequently generated inaccurate responses when asked for specific literature references and when the input prompt contained deliberate errors (biased prompts). Another recurring observation was the conflation of adjacent concepts (e.g., attributing a specific characteristic to the wrong drug within the same pharmacological class). Some of the inaccuracies could potentially result in significant harm if applied to clinical situations. While chatbots such as ChatGPT can generate medically accurate responses in most cases, their reliability is not yet sufficient for medical and clinical education. Content generated by ChatGPT and other chatbots will require validation prior to use.
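For readers who wish to replicate the general workflow (prompt submission followed by independent dual-rater scoring), a minimal Python sketch is given below. The article does not report the authors' tooling, so the OpenAI Python client, the model name, and the Likert score encoding (3 = completely accurate) are assumptions for illustration only, not the study's actual pipeline.

```python
# Illustrative sketch only: the study does not specify its tooling, so the
# OpenAI client, model name, and score encoding below are all assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_chatbot(prompt: str) -> str:
    """Submit one study prompt and return the chatbot's text response."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # hypothetical choice; the paper says only "ChatGPT"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def both_completely_accurate(scores_a: list[int], scores_b: list[int]) -> float:
    """Fraction of responses scored 3 (completely accurate, on an assumed
    1-3 Likert encoding) by both independent evaluators."""
    agree = sum(1 for a, b in zip(scores_a, scores_b) if a == 3 and b == 3)
    return agree / len(scores_a)
```

Under this encoding, 52 of 95 responses scored 3 by both evaluators would reproduce the 52/95 ≈ 55% figure reported in the abstract.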

Keywords: artificial intelligence, graduate medical education, large language model, anesthesiology residency, generative AI

Received: 23 Feb 2025; Accepted: 28 Apr 2025.

Copyright: © 2025 Jin, Abola, Bargnes, Tsivitis, Raman, Schwartz, Bergese and Schabel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Zhaosheng Jin, Department of Anesthesiology, Stony Brook Medicine, Stony Brook, United States

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.