AUTHOR=Zhou Weijun, Li Lijuan, Hao Xiaowen, Wu Lanying, Liu Lifu, Zheng Binyu, Xia Yangzheng, Liu Yong TITLE=Predicting central lymph node metastasis in papillary thyroid microcarcinoma: a breakthrough with interpretable machine learning JOURNAL=Frontiers in Endocrinology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2025.1537386 DOI=10.3389/fendo.2025.1537386 ISSN=1664-2392 ABSTRACT=Objective: To develop and validate an interpretable machine learning (ML) model for the preoperative prediction of central lymph node metastasis (CLNM) in papillary thyroid microcarcinoma (PTMC). Methods: We retrospectively analyzed 710 PTMC patients who underwent thyroidectomy between December 2016 and December 2023. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression and the support vector machine-recursive feature elimination (SVM-RFE) algorithm, in conjunction with multivariate logistic regression. Eight ML algorithms were developed to predict CLNM: decision tree, random forest (RF), k-nearest neighbors, support vector machine, extreme gradient boosting, naive Bayes, logistic regression, and light gradient boosting machine. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), decision curve analysis (DCA), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1-score. Additionally, the Shapley Additive Explanations (SHAP) algorithm was used to interpret the predictions of the optimal ML model. Results: CLNM was present in 32.95% of the patients (234/710). Tumor diameter, multifocality, lymph nodes identified via ultrasound (US-LN), and extrathyroidal extension (ETE) were identified as independent predictors of CLNM. 
The RF model achieved the highest performance in the validation set with an AUC of 0.893 (95% CI: 0.846-0.940), accuracy of 0.832, sensitivity of 0.764, specificity of 0.866, PPV of 0.743, NPV of 0.879, and F1-score of 0.753. Furthermore, the DCA demonstrated that the RF model exhibited a superior clinical net benefit. Conclusion: Our model predicted the risk of CLNM in PTMC patients with high accuracy preoperatively.
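
The threshold-based metrics named in the abstract (sensitivity, specificity, accuracy, PPV, NPV, and F1-score) all derive from the four cells of a confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' code. The function names and the example counts are hypothetical and are not taken from the study; only the PPV and sensitivity values in the final check come from the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positive/negative counts. This sketch assumes
    every denominator is non-zero (both classes and both predictions occur).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


def f1_from_ppv_and_sensitivity(ppv, sensitivity):
    """F1-score as the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the study's data.
    metrics = classification_metrics(tp=40, fp=15, tn=85, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")

    # Consistency check using the PPV (0.743) and sensitivity (0.764)
    # reported in the abstract for the RF model.
    print(round(f1_from_ppv_and_sensitivity(0.743, 0.764), 3))  # 0.753
```

The last line shows that the reported F1-score of 0.753 agrees with the harmonic mean of the reported PPV and sensitivity. AUC and DCA are not covered here because they require the model's predicted probabilities rather than a single confusion matrix.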