AUTHOR=Genc Ozgur, Tabakci Omer Naci
TITLE=Comparison of artificial intelligence models and physicians in patient education for varicocele embolization: a double-blind randomized controlled trial
JOURNAL=Frontiers in Radiology
VOLUME=Volume 5 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/radiology/articles/10.3389/fradi.2025.1682725
DOI=10.3389/fradi.2025.1682725
ISSN=2673-8740
ABSTRACT=Background: Large language models (LLMs) appear capable of performing a variety of tasks, including answering questions, but few studies have evaluated them in direct comparison with clinicians. This study compares the performance of artificial intelligence (AI) models and clinical specialists in informing patients about varicocele embolization, and aims to establish an evidence base for future hybrid informational systems that integrate both AI and clinical expertise.
Methods: In this prospective, double-blind, randomized controlled trial, 25 frequently asked questions about varicocele embolization (collected via Google Search trends, patient forums, and clinical experience) were answered by three AI models (ChatGPT-4o, Gemini Pro, and Microsoft Copilot) and one interventional radiologist. Responses were randomized and evaluated by two independent interventional radiologists using a validated 5-point Likert scale for academic accuracy and empathy.
Results: Gemini achieved the highest mean scores for both academic accuracy (4.09 ± 0.50, 95% CI: 3.95–4.23) and expert-rated empathetic communication (3.54 ± 0.59, 95% CI: 3.38–3.70), followed by Copilot (academic: 4.07 ± 0.46, 95% CI: 3.94–4.20; empathy: 3.48 ± 0.53, 95% CI: 3.33–3.63), ChatGPT (academic: 3.83 ± 0.58, 95% CI: 3.67–3.99; empathy: 2.92 ± 0.78, 95% CI: 2.70–3.14), and the comparator physician (academic: 3.75 ± 0.41, 95% CI: 3.64–3.86; empathy: 3.12 ± 0.82, 95% CI: 2.89–3.35). ANOVA revealed statistically significant differences across groups for both academic accuracy (F = 6.181, p < 0.001, η² = 0.086) and empathy (F = 9.106, p < 0.001, η² = 0.122). Effect sizes were medium for academic accuracy and large for empathy.
Conclusions: AI models, particularly Gemini, received higher ratings from expert evaluators than the comparator physician in patient education regarding varicocele embolization, excelling in both academic accuracy and empathetic communication style. These preliminary findings suggest that AI models hold significant potential to complement patient education systems in interventional radiology practice and provide compelling evidence for the development of hybrid patient education models.
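To illustrate the kind of analysis reported in the Results, the following is a minimal sketch (not the authors' code, and using entirely hypothetical rating data) of a one-way ANOVA across the four responder groups with an eta-squared effect size, assuming Python with NumPy and SciPy.

    # Minimal sketch with hypothetical data: one-way ANOVA across the four
    # responder groups and an eta-squared effect size, as reported in the abstract.
    import numpy as np
    from scipy import stats

    # Hypothetical 5-point Likert ratings (academic accuracy) per group; the real
    # study used 25 questions rated by two independent interventional radiologists.
    ratings = {
        "Gemini":    np.array([4, 4, 5, 4, 4, 5, 3, 4, 4, 4]),
        "Copilot":   np.array([4, 4, 4, 5, 4, 4, 4, 3, 4, 5]),
        "ChatGPT":   np.array([4, 3, 4, 4, 3, 4, 5, 3, 4, 4]),
        "Physician": np.array([4, 4, 3, 4, 4, 3, 4, 4, 4, 3]),
    }

    groups = list(ratings.values())
    f_stat, p_value = stats.f_oneway(*groups)  # one-way ANOVA

    # Eta-squared = between-group sum of squares / total sum of squares
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_scores - grand_mean) ** 2).sum()
    eta_squared = ss_between / ss_total

    print(f"F = {f_stat:.3f}, p = {p_value:.4f}, eta^2 = {eta_squared:.3f}")

Post hoc pairwise comparisons and inter-rater agreement, which such a study would also require, are not shown in this sketch.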