ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 8 - 2025 | doi: 10.3389/frai.2025.1678320
This article is part of the Research Topic: Advancing Gastrointestinal Disease Diagnosis with Interpretable AI and Edge Computing for Enhanced Patient Care.
When AI Speaks Like a Specialist: ChatGPT-4 in the Management of Inflammatory Bowel Disease
Provisionally accepted
- 1 Policlinico Tor Vergata, Rome, Italy
- 2 University of Miami Miller School of Medicine, Miami, United States
- 3 Università degli Studi di Roma Tor Vergata, Rome, Italy
- 4 Azienda Ospedaliera San Camillo Forlanini, Rome, Italy
- 5 Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
- 6 Fondazione Policlinico Universitario Agostino Gemelli IRCCS CEMAD, Rome, Italy
- 7 Hôpital Édouard Herriot, Lyon, France
- 8 Unidade Local de Saúde do Algarve, Faro, Portugal
Abstract

Background: Artificial intelligence (AI) is gaining traction in healthcare, particularly for patient education. Inflammatory bowel diseases (IBD) require continuous patient engagement, yet the quality of online information accessed by patients is inconsistent. ChatGPT, a generative AI model, has shown promise in medical scenarios, but its role in IBD communication needs further evaluation. The objective of this study was to assess the quality of ChatGPT-4's responses to common patient questions about IBD, compared with those provided by experienced IBD specialists.

Methods: Twenty-five frequently asked questions were collected during routine IBD outpatient visits and categorized into five themes: pregnancy/breastfeeding, diet, vaccinations, lifestyle, and medical therapy/surgery. Each question was answered by ChatGPT-4 and by two expert gastroenterologists. Responses were anonymized and evaluated by 12 physicians (6 IBD experts and 6 non-experts) using a 5-point Likert scale across four dimensions: accuracy, reliability, comprehensibility, and actionability. Evaluators also attempted to identify whether responses were AI- or human-generated.

Results: ChatGPT-4 responses received significantly higher overall scores than those from human experts (mean 4.28 vs. 4.05; p < 0.001). The best-rated scenarios were medical therapy and surgery; the diet scenario consistently received lower scores. Only 33% of AI-generated responses were correctly identified as such, indicating strong similarity to human-written answers. Both expert and non-expert evaluators rated AI responses highly, although IBD specialists gave higher ratings overall.

Conclusion: ChatGPT-4 generated high-quality, clear, and actionable responses to IBD-related patient questions, often outperforming human experts. Its outputs were frequently indistinguishable from those written by physicians, suggesting potential as a supportive tool for patient education. Nonetheless, further studies are needed to assess real-world application and ensure appropriate use in personalized clinical care.
Keywords: IBD, artificial intelligence (AI), ulcerative colitis, Crohn's disease, inflammation
Received: 11 Aug 2025; Accepted: 24 Sep 2025.
Copyright: © 2025 De Cristofaro, Zorzi, Abreu, Colella, Del Vecchio Blanco, Fiorino, Lolli, Noor, Lopetuso, Pioche, Grimaldi, Paoluzi, Roseira, Sena, Troncone, Calabrese, Monteleone and Marafini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Irene Marafini, irene.marafini@gmail.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.