AUTHOR=Yin Yong, Zeng Mei, Wang Hansong, Yang Haibo, Zhou Caijing, Jiang Feng, Wu Shufan, Huang Tingyue, Yuan Shuahua, Lin Jilei, Tang Mingyu, Chen Jiande, Dong Bin, Yuan Jiajun, Xie Dan
TITLE=A clinician-based comparative study of large language models in answering medical questions: the case of asthma
JOURNAL=Frontiers in Pediatrics
VOLUME=13
YEAR=2025
URL=https://www.frontiersin.org/journals/pediatrics/articles/10.3389/fped.2025.1461026
DOI=10.3389/fped.2025.1461026
ISSN=2296-2360
ABSTRACT=Objective: This study aims to evaluate and compare the performance of four major large language models (GPT-3.5, GPT-4.0, YouChat, and Perplexity) in answering 32 common asthma-related questions. Materials and methods: Seventy-five clinicians from various tertiary hospitals participated in this study. Each clinician evaluated the responses generated by the four large language models (LLMs) to 32 common clinical questions related to pediatric asthma. Based on predefined criteria, participants subjectively assessed the accuracy, correctness, completeness, and practicality of the LLMs' answers and assigned precise scores to quantify each model's performance on pediatric asthma-related questions. Results: GPT-4.0 performed the best across all dimensions, while YouChat performed the worst in all dimensions. Both GPT-3.5 and GPT-4.0 outperformed the other two models, but there was no significant difference in performance between GPT-3.5 and GPT-4.0 or between YouChat and Perplexity. Conclusion: GPT and other large language models can answer medical questions with a reasonable degree of completeness and accuracy. However, clinicians should critically assess information obtained online, distinguishing accurate from inaccurate content, rather than blindly accepting the outputs of these models. With advances in key technologies, LLMs may one day become a safe option for physicians seeking information.