AUTHOR=Zhan Jianfeng, Zhang Jian, Zhu Shaoqi, Ni Lin, Zhang Chen, Hu Jia
TITLE=Diagnostic performance of ultrasound characteristics-based artificial intelligence models for thyroid nodules: a systematic review and meta-analysis
JOURNAL=Frontiers in Oncology
VOLUME=Volume 15 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1614603
DOI=10.3389/fonc.2025.1614603
ISSN=2234-943X
ABSTRACT=Background: Artificial intelligence (AI) diagnostic models based on ultrasound features are increasingly being integrated into the evaluation of thyroid nodules. However, the diagnostic performance of different AI-assisted diagnostic methods varies considerably.
Objective: This study aims to systematically evaluate the performance of ultrasound-based AI diagnostic models in differentiating benign from malignant thyroid nodules and to identify the most effective diagnostic model.
Methods: We conducted a comprehensive literature search of PubMed, Web of Science, and the Cochrane Library using subject-specific keywords to identify studies on AI-assisted thyroid nodule diagnosis. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Meta-analysis was performed using Meta-Disc 1.4, Review Manager 5.4, R 4.4.2, and Stata 17.0. Pooled sensitivity, specificity, diagnostic odds ratio (DOR), and area under the summary receiver operating characteristic curve (SROC-AUC) were calculated with 95% confidence intervals (CI). Subgroup analyses and clinical applicability assessments were also conducted.
Results: Twenty-eight studies involving 134,028 patients, 158,161 thyroid nodules, and 529,479 ultrasound images were included.
Overall, AI-assisted diagnosis demonstrated high diagnostic performance: pooled sensitivity = 0.89 (95% CI: 0.87–0.91), specificity = 0.84 (0.80–0.88), positive likelihood ratio (PLR) = 5.60 (4.40–7.20), negative likelihood ratio (NLR) = 0.13 (0.10–0.16), DOR = 43.94 (30.11–64.14), and SROC-AUC = 0.93 (0.91–0.95). The threshold effect analysis (Spearman correlation = -0.18, P > 0.05) indicated no significant threshold effect. Diagnostic accuracy was higher in subgroups of studies conducted in Asian countries, with prospective and multicenter designs, with external validation sets, without cross-validation, using deep learning, and including postoperative patients. Improved performance was also observed in cohorts with smaller nodule diameters (<20 mm), higher malignancy rates, older patient age (≥50 years), and higher proportions of female patients, although heterogeneity remained significant. Univariate and multivariate meta-regression analyses identified AI type and nodule malignancy rate as significant sources of heterogeneity. Notably, the EDLC-TN model showed the highest diagnostic accuracy.
Conclusion: AI-assisted diagnostic techniques show significant potential in thyroid nodule evaluation, with the EDLC-TN model showing particularly high clinical utility. Optimal diagnostic performance was observed for nodules <20 mm in diameter and in patients aged ≥50 years.
Systematic review registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD42024581421, identifier CRD42024581421.