AUTHOR=Artsi Yaara, Sorin Vera, Glicksberg Benjamin S., Korfiatis Panagiotis, Nadkarni Girish N., Klang Eyal

TITLE=Large language models in real-world clinical workflows: a systematic review of applications and implementation

JOURNAL=Frontiers in Digital Health

VOLUME=Volume 7 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2025.1659134

DOI=10.3389/fdgth.2025.1659134

ISSN=2673-253X

ABSTRACT=Background: Large language models (LLMs) offer promise for enhancing clinical care by automating documentation, supporting decision-making, and improving communication. However, their integration into real-world healthcare workflows remains limited and undercharacterized. This systematic review aims to evaluate the literature on real-world implementation of LLMs in clinical workflows, including their use cases, clinical settings, observed outcomes, and challenges. Methods: We searched MEDLINE, Scopus, Web of Science, and Google Scholar for studies published between January 2015 and April 2025 that assessed LLMs in real-world clinical applications. Inclusion criteria were peer-reviewed, full-text studies in English reporting empirical implementation of LLMs in clinical settings. Study quality and risk of bias were assessed using the PROBAST tool. Results: Four studies published between 2024 and 2025 met inclusion criteria. All used generative pre-trained transformers (GPTs). Reported applications included outpatient communication, mental health support, inbox message drafting, and clinical data extraction. LLM deployment was associated with improvements in operational efficiency, user satisfaction, and reduced workload. However, challenges included performance variability across data types, limitations in generalizability, regulatory delays, and lack of post-deployment monitoring. Conclusions: Early evidence suggests that LLMs can enhance clinical workflows, but real-world adoption remains constrained by systemic, technical, and regulatory barriers. To support safe and scalable use, future efforts should prioritize standardized evaluation metrics, multi-site validation, human oversight, and implementation frameworks tailored to clinical settings. Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/recorddashboard, PROSPERO CRD420251030069.