AUTHOR=Feng Xiaowei , Hong Tao , Liu Wencai , Xu Chan , Li Wanying , Yang Bing , Song Yang , Li Ting , Li Wenle , Zhou Hui , Yin Chengliang TITLE=Development and validation of a machine learning model to predict the risk of lymph node metastasis in renal carcinoma JOURNAL=Frontiers in Endocrinology VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2022.1054358 DOI=10.3389/fendo.2022.1054358 ISSN=1664-2392 ABSTRACT=Background: Lymph node metastasis (LNM) is associated with the prognosis of patients with kidney cancer. This study aimed to provide reliable machine learning-based (ML-based) models to predict the probability of LNM in kidney cancer. Methods: Patients diagnosed with kidney cancer between 2010 and 2017 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database and used as the training cohort, and patients from a medical center comprised the validation group. Variables were filtered using the least absolute shrinkage and selection operator (LASSO) and univariate and multivariate logistic regression analysis. Statistically significant risk factors were used to develop the predictive models. We used 10-fold cross-validation for internal validation, and the results were externally validated. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. Correlation heat maps were used to examine correlations between features, and permutation analysis was used to assess the importance of predictors. The probability density function (PDF) and clinical utility curve (CUC) were used to determine the critical value for clinical utility. Results: The training cohort of this study included 39,016 patients, and the validation cohort included 771 patients. In the two cohorts, 2,544 (6.5%) and 66 (8.1%) patients had LNM, respectively. Pathological grade, liver metastasis, M stage, primary site, T stage, and tumor size were independent predictive factors of LNM. 
The XGB model significantly outperformed the other machine learning models in both the training and validation cohorts, with AUC values of 0.916 and 0.915, respectively. Based on the PDF and CUC, we suggest 54.6% as a threshold probability for guiding the diagnosis of LNM, which could distinguish about 89% of LNM patients. Conclusions: The machine learning-based predictive tool can accurately estimate the probability of LNM in patients with kidney cancer and has promising prospects for clinical application.
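To illustrate the two evaluation ideas named in the abstract, the AUC and a threshold probability, the following is a minimal Python sketch rather than the authors' implementation. The labels and predicted probabilities are invented toy values, the helper names `roc_auc` and `sensitivity_at` are hypothetical, and the sketch assumes that a patient is flagged as LNM when the predicted probability is at or above the threshold.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at(labels: Sequence[int], scores: Sequence[float],
                   threshold: float) -> float:
    """Fraction of true positive cases whose score is >= threshold."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos:
        raise ValueError("need at least one positive case")
    return sum(s >= threshold for s in pos) / len(pos)


# Toy data (hypothetical, not from the study): 1 = LNM, 0 = no LNM.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.91, 0.72, 0.60, 0.40, 0.58, 0.35, 0.20, 0.15, 0.10, 0.05]

auc = roc_auc(labels, probs)
# 0.546 mirrors the 54.6% threshold suggested in the abstract.
sens = sensitivity_at(labels, probs, threshold=0.546)
print(f"AUC = {auc:.3f}, sensitivity at 0.546 = {sens:.2f}")
```

The pairwise definition of AUC is quadratic in the number of cases and is used here only for clarity; for cohorts of the size reported above, a rank-based routine from a statistics library would be the more practical choice.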