AUTHOR=Palm Viktoria, Leutz-Schmidt Patricia, Mathy René Michael, Schwaiger Benedikt Jakob, Kauczor Hans-Ulrich, Jang Hyungseok, Sedaghat Sam

TITLE=Utilization of large language models in decision-making for sustainability in radiology

JOURNAL=Frontiers in Medicine

VOLUME=Volume 12 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1632925

DOI=10.3389/fmed.2025.1632925

ISSN=2296-858X

ABSTRACT=Introduction: Radiology has a significant environmental impact, but guidance on how to effectively implement sustainable practices in this field is limited. This study investigated the performance of large language models (LLMs) in providing sustainability advice for radiology. Methods: Four state-of-the-art LLMs, namely ChatGPT-4.0 (CGT), Claude 3.5 Sonnet (CS), Gemini Advanced (GA), and Meta Llama 3.1 405b (ML), were evaluated based on their answers to 30 standardized questions covering sustainability topics such as energy consumption, waste management, digitalization, best practices, and carbon footprint. Three experienced readers rated the responses for quality (OQS), understandability (US), and implementability (IS) using a 4-point scale. A mean quality score (MQS) was derived from these three attributes. Results: The overall intraclass correlation was good (ICC = 0.702). Across the 30 questions on sustainability in radiology, all four LLMs showed good to very good performance, with the highest ratings being achieved in understandability (CGT/GA/ML 3.91 ± 0.29; CS 3.99 ± 0.11), underlining the excellent language skills of these models. CS emerged as the top performer across most topics, with an MQS of 3.95 ± 0.22, frequently achieving the highest scores. ML showed the second highest performance with an MQS of 3.84 ± 0.37, followed by CGT with an MQS of 3.78 ± 0.42 and GA with an MQS of 3.73 ± 0.44. Accordingly, CGT and GA showed comparable results, while GA consistently received lower mean scores than the other LLMs. None of the LLMs provided answers that were rated insufficient. Conclusion: Our findings highlight the potential of LLMs such as Claude 3.5 Sonnet, ChatGPT-4.0, Meta Llama 3.1, and Gemini Advanced to advance sustainable practices in radiology. Because the models differ in performance, thoughtful model selection can further enhance their positive impact.