ORIGINAL RESEARCH article
Front. Oncol.
Sec. Thoracic Oncology
Volume 15 - 2025 | doi: 10.3389/fonc.2025.1588147
This article is part of the Research Topic: Advancing Diagnostic Excellence in Early Lung Cancer Detection
Predictive Model of Malignancy Probability in Pulmonary Nodules based on Multicenter Data
Provisionally accepted
Department of Respiratory and Critical Care Medicine, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
Objectives: To identify the factors associated with malignancy in patients presenting with pulmonary nodules, develop a predictive model, and evaluate its diagnostic performance.

Methods: This study analyzed the clinical and imaging data of 830 patients with pulmonary nodules from the Affiliated Hospital of North Sichuan Medical College. Least Absolute Shrinkage and Selection Operator (LASSO) regression and multivariate logistic regression analysis were used to identify predictors. Multiple machine learning classification models were compared, and the best-performing model was selected. A Shapley Additive Explanations (SHAP) framework was developed for personalized risk assessment. Finally, external testing was performed using data from 330 patients with pulmonary nodules at Guang'an People's Hospital.

Results: The predictive factors for malignant pulmonary nodules were age, gender, nodule diameter, spiculation, lobulation, calcification, vacuole, vascular convergence sign, air bronchogram sign, pleural traction, and nodule density. The Gradient Boosting Decision Tree (GBDT) classification model demonstrated optimal performance, with an area under the curve (AUC) of 0.873 (95% confidence interval [CI]: 0.840-0.906) on the internal test set and 0.726 (95% CI: 0.668-0.784) on the external test set. Both the calibration curve and clinical decision curve analysis (DCA) indicated excellent model calibration and substantial clinical benefit.

Conclusions: We developed a GBDT model that provides a basis for differentiating malignant pulmonary nodules, which may assist in the diagnosis and treatment of patients with pulmonary nodules.
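The AUC values above are reported with 95% confidence intervals, but the abstract does not say how those intervals were computed. The following is a minimal sketch of one common approach, a percentile bootstrap around the rank-based (Mann-Whitney) AUC, using only the Python standard library. The function names `auc` and `bootstrap_auc_ci`, the toy labels and scores, and the choice of bootstrap method are illustrative assumptions, not the authors' procedure or data.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: probability that a random positive (malignant, 1)
    case scores higher than a random negative (benign, 0) case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in).
    Resamples cases with replacement and skips resamples that happen to
    contain only one class, for which the AUC is undefined."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round((alpha / 2) * (n_boot - 1)))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi


# Hypothetical toy data: 1 = malignant, 0 = benign; scores are model probabilities.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90, 0.20, 0.70]

point = auc(labels, scores)  # 12 of 16 positive/negative pairs are ordered correctly: 0.75
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With only eight cases the interval is very wide; on test sets of several hundred patients, as in this study, it narrows considerably. The pairwise loop is quadratic in the number of cases, which is acceptable at this scale but would typically be replaced by a rank-sum computation for larger cohorts.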
Keywords: pulmonary nodules, malignancy, machine learning, prediction model, external test
Received: 05 Mar 2025; Accepted: 12 May 2025.
Copyright: © 2025 Huang, Chen, He and Jiang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Li Jiang, Department of Respiratory and Critical Care Medicine, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.