
SYSTEMATIC REVIEW article

Front. Digit. Health

Sec. Health Informatics

Volume 7 - 2025 | doi: 10.3389/fdgth.2025.1659134

This article is part of the Research Topic: Generative AI and Large Language Models in Medicine: Applications, Challenges, and Opportunities.

Large Language Models in Real-World Clinical Workflows: A Systematic Review of Applications and Implementation

Provisionally accepted
Yaara Artsi1,2*, Vera Sorin3, Benjamin S. Glicksberg4,5,6, Panagiotis Korfiatis3, Girish N. Nadkarni4,5,6, Eyal Klang4,5,6
  • 1Bar-Ilan University Press, Ramat Gan, Israel
  • 2Azrieli Faculty of Medicine, Bar-Ilan University, Zefat, Israel
  • 3Department of Radiology, Mayo Clinic, Rochester, MN, United States
  • 4The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
  • 5The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Medical Center, New York, NY, United States
  • 6The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States

The final, formatted version of the article will be published soon.

ABSTRACT

Background: Large language models (LLMs) offer promise for enhancing clinical care by automating documentation, supporting decision-making, and improving communication. However, their integration into real-world healthcare workflows remains limited and undercharacterized. This systematic review aims to evaluate the literature on real-world implementation of LLMs in clinical workflows, including their use cases, clinical settings, observed outcomes, and challenges.

Methods: We searched MEDLINE, Scopus, Web of Science, and Google Scholar for studies published between January 2015 and April 2025 that assessed LLMs in real-world clinical applications. Inclusion criteria were peer-reviewed, full-text studies in English reporting empirical implementation of LLMs in clinical settings. Study quality and risk of bias were assessed using the PROBAST tool.

Results: Four studies published between 2024 and 2025 met the inclusion criteria. All used generative pre-trained transformers (GPTs). Reported applications included outpatient communication, mental health support, inbox message drafting, and clinical data extraction. LLM deployment was associated with improvements in operational efficiency and user satisfaction, as well as reduced workload. However, challenges included performance variability across data types, limited generalizability, regulatory delays, and a lack of post-deployment monitoring.

Conclusions: Early evidence suggests that LLMs can enhance clinical workflows, but real-world adoption remains constrained by systemic, technical, and regulatory barriers. To support safe and scalable use, future efforts should prioritize standardized evaluation metrics, multi-site validation, human oversight, and implementation frameworks tailored to clinical settings.

Keywords: Large language models, Real-world application, Clinical implementation, Artificial intelligence, Healthcare workflows

Received: 03 Jul 2025; Accepted: 11 Sep 2025.

Copyright: © 2025 Artsi, Sorin, Glicksberg, Korfiatis, Nadkarni and Klang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yaara Artsi, Bar-Ilan University Press, Ramat Gan, Israel

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.