ORIGINAL RESEARCH article
Front. Digit. Health
Sec. Connected Health
Comparative performance of ChatGPT-5 and DeepSeek on the Chinese Ultrasound Medicine Senior Professional Title Examination
The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
Abstract
Background: Large language models (LLMs) have shown growing potential for medical education and assessment, but evidence on their performance in specialty certification examinations in China, particularly in ultrasound medicine, remains limited.

Objective: To compare the performance of ChatGPT-5 and DeepSeek on the Chinese Ultrasound Medicine Senior Professional Title Examination, overall and by item type.

Methods: Between August and September 2025, we randomly selected 100 multiple-choice questions from the official Chinese Ultrasound Medicine Senior Professional Title Examination bank (60 image-based interpretation items and 40 text-based items). We evaluated ChatGPT-5 and DeepSeek using identical prompts through their public web interfaces. The primary outcome was overall accuracy; secondary outcomes were accuracy by item type and subspecialty. Between-model differences were assessed using two-proportion z-tests (α = 0.05) in Python 3.12.

Results: Overall accuracy was higher for ChatGPT-5 than for DeepSeek (74.0% [74/100] vs. 60.0% [60/100]; p = 0.035). Accuracy on image-based items was also higher for ChatGPT-5 (61.7% vs. 40.0%; p = 0.018). Performance on text-based items was similar for both models (92.5% vs. 90.0%). Subspecialty patterns varied across domains; however, no between-model differences reached statistical significance.

Conclusions: ChatGPT-5 outperformed DeepSeek on image-based items (61.7% vs. 40.0%), while both models performed similarly on text-based knowledge items (92.5% vs. 90.0%). Overall, both LLMs showed strong performance on Chinese ultrasound senior-title examination questions, with complementary strengths across content areas. They may be useful as supplementary educational tools, but further advances in multimodal reasoning are needed to support more reliable image interpretation.
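The Methods state that between-model differences were assessed with two-proportion z-tests in Python 3.12. The authors' actual script is not provided; the following is a minimal standard-library sketch of a pooled two-sided two-proportion z-test, checked against the comparisons reported in the Results (the item counts 74/100, 60/100, 37/60, and 24/60 for the image-based comparison are inferred from the reported percentages and denominators, not taken from a published script).

```python
import math


def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided pooled two-proportion z-test.

    x1, x2: numbers of correct answers; n1, n2: numbers of items.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 == p2
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Overall accuracy: ChatGPT-5 74/100 vs. DeepSeek 60/100 (reported p = 0.035)
z_overall, p_overall = two_proportion_ztest(74, 100, 60, 100)

# Image-based items: 61.7% vs. 40.0% of 60 items, i.e. 37/60 vs. 24/60
# (reported p = 0.018)
z_image, p_image = two_proportion_ztest(37, 60, 24, 60)
```

Both computed p-values round to the figures reported in the abstract, which suggests the pooled (rather than unpooled) variance form was used.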
Keywords
artificial intelligence, ChatGPT-5, DeepSeek, medical education, professional title examination, ultrasound medicine
Received
08 January 2026
Accepted
17 February 2026
Copyright
© 2026 Hong and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Daorong Hong; Chunyan Huang
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.