ORIGINAL RESEARCH article
Front. Oncol.
Sec. Thoracic Oncology
Volume 15 - 2025 | doi: 10.3389/fonc.2025.1682633
This article is part of the Research Topic "Advancing Diagnostic Excellence in Early Lung Cancer Detection".
Development and Validation of Machine Learning Models for Predicting STAS in Stage I Lung Adenocarcinoma with Part-Solid and Solid Nodules: A Two-Center Study
Provisionally accepted
- 1 Dalian Medical University, Dalian, China
- 2 Wuxi People's Hospital, Wuxi, China
- 3 Xuzhou Medical University, Xuzhou, China
- 4 Northern Jiangsu People's Hospital, Yangzhou, China
- 5 The Affiliated Suqian Hospital of Xuzhou Medical University, Suqian, China
Background: This study aimed to preoperatively predict spread through air spaces (STAS) in stage I lung adenocarcinoma presenting as part-solid and solid nodules by combining clinical features with machine learning models, thereby guiding surgical decision-making and improving patient counseling.
Methods: A total of 473 patients were retrospectively enrolled: 353 from our center (training set) and 120 from an external validation cohort. Predictive features were selected using the maximum relevance minimum redundancy (mRMR) and least absolute shrinkage and selection operator (LASSO) algorithms. Seven machine learning models were developed: logistic regression, random forest, support vector machine (SVM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), light gradient boosting machine (LightGBM), and category boosting (CatBoost). The models were evaluated using receiver operating characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA). Feature importance was assessed using SHapley Additive exPlanations (SHAP). A web-based nomogram was constructed for clinical application.
Results: STAS was present in 44.76% of the training set and 50.83% of the validation cohort. Seven predictors were selected to construct the models. The XGBoost model performed best, with an area under the ROC curve (AUC) of 0.889 (95% CI, 0.852–0.926) in the training set and 0.856 (95% CI, 0.789–0.928) in the validation cohort. Calibration curves showed good agreement between predicted and observed outcomes in both the training and validation sets, and DCA indicated good clinical utility. SHAP analysis identified CEA, vascular convergence, proGRP, age, AFP, smoking history, and CTR as the most important predictors of STAS.
Conclusion: The XGBoost model provides robust preoperative prediction of STAS and may help clinicians optimize surgical strategies for patients with stage I lung adenocarcinoma.
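The abstract reports model discrimination as AUC and clinical usefulness via decision curve analysis. As a rough illustration only (this is not the authors' code, and the toy labels and probabilities below are hypothetical, not study data), the sketch computes AUC as the probability that a randomly chosen STAS-positive case receives a higher predicted risk than a randomly chosen STAS-negative case. It also computes the standard DCA net benefit at a threshold probability pt, NB = TP/n − (FP/n) · pt/(1 − pt), alongside the "treat all" reference curve. The function names are illustrative assumptions.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney interpretation: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """DCA net benefit of intervening in everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference curve: net benefit if every patient is treated as STAS-positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = STAS present, 0 = STAS absent.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    risks = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    print(f"AUC = {roc_auc(labels, risks):.4f}")
    print(f"model NB @ 0.5 = {net_benefit(labels, risks, 0.5):.4f}")
    print(f"treat-all NB @ 0.5 = {treat_all_net_benefit(labels, 0.5):.4f}")
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none (zero) lines. A model shows clinical utility over the threshold range where its curve lies above both reference lines.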
Keywords: spread through air spaces, lung adenocarcinoma, machine learning, solid and part-solid component, surgical strategy
Received: 09 Aug 2025; Accepted: 17 Oct 2025.
Copyright: © 2025 Ren, Liu, Kai, Xu, Wang, Wu, You, Hu, Wang and Yusheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Xiaolin Wang, 18051063909@yzu.edu.cn
Shu Yusheng, 18051061999@yzu.edu.cn
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.