
ORIGINAL RESEARCH article

Front. Oncol.

Sec. Thoracic Oncology

Volume 15 - 2025 | doi: 10.3389/fonc.2025.1682633

This article is part of the Research Topic Advancing Diagnostic Excellence in Early Lung Cancer Detection.

Development and Validation of Machine Learning Models for Predicting STAS in Stage I Lung Adenocarcinoma with Part-Solid and Solid Nodules: A Two-Center Study

Provisionally accepted
Qinglin Ren1, Lin Liu2, Chu Kai3, Xinrong Xu4, Huijun Wang1, Jun Wu4, Jinzhi You5, Junxi Hu4, Xiaolin Wang4*, Shu Yusheng4*
  • 1Dalian Medical University, Dalian, China
  • 2Wuxi People's Hospital, Wuxi, China
  • 3Xuzhou Medical University, Xuzhou, China
  • 4Northern Jiangsu People's Hospital, Yangzhou, China
  • 5The Affiliated Suqian Hospital of Xuzhou Medical University, Suqian, China

The final, formatted version of the article will be published soon.

Background: This study aimed to predict spread through air spaces (STAS) preoperatively in stage I lung adenocarcinoma presenting as part-solid or solid nodules, using clinical features and machine learning models, in order to guide surgical decision-making and support patient counseling.

Methods: A total of 473 patients were retrospectively enrolled: 353 from our center (training set) and 120 from an external validation cohort. Predictive features were selected using the maximum relevance minimum redundancy (mRMR) and least absolute shrinkage and selection operator (LASSO) algorithms. Seven machine learning models were developed: logistic regression, random forest, support vector machine (SVM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), light gradient boosting machine (LightGBM), and category boosting (CatBoost). The models were evaluated using receiver operating characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA). Feature importance was assessed using Shapley Additive Explanations (SHAP). A web-based nomogram was constructed for clinical application.

Results: STAS was present in 44.76% of the training set and 50.83% of the validation cohort. Seven predictors were selected to construct the models. The XGBoost model performed best, with an area under the ROC curve (AUC) of 0.889 (95% CI, 0.852–0.926) in the training set and 0.856 (95% CI, 0.789–0.928) in the validation cohort. Calibration curves showed good agreement between predicted and observed outcomes in both the training and validation sets, and DCA indicated favorable clinical utility. SHAP analysis identified the most important predictors of STAS as carcinoembryonic antigen (CEA), vascular convergence, pro-gastrin-releasing peptide (proGRP), age, alpha-fetoprotein (AFP), smoking history, and consolidation-to-tumor ratio (CTR).

Conclusion: The XGBoost model provides robust preoperative prediction of STAS and may help clinicians optimize surgical strategies for patients with stage I lung adenocarcinoma.
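The mRMR feature-selection step can be illustrated with a small greedy sketch. It assumes discrete (for example, binned) features and uses the common "difference" form of mRMR: at each step it picks the feature with the highest relevance I(f; y) minus its mean mutual information with the features already chosen. The abstract does not state which mRMR variant, discretization, or number of retained features was used, and the subsequent LASSO step is not shown; the function names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) )
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features, target, k):
    """Greedy mRMR (difference form, an assumed variant).

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels (e.g. STAS 0/1)
    k:        number of features to keep
    """
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted so ties are broken deterministically
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]  # first pick: pure relevance
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a duplicated feature, the copy is penalized for redundancy, so a less relevant but non-redundant feature is preferred as the second pick.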
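The AUCs above are reported with 95% confidence intervals, but the abstract does not say how those intervals were obtained (DeLong's method and the bootstrap are both common). As an illustration only, the sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen STAS-positive case scores higher than a randomly chosen negative case, with ties counted as one half) and attaches a percentile bootstrap interval. The function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(y_true, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) interval.

    Resamples patients with replacement; resamples containing only one class
    are skipped because the AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # both classes present (labels are 0/1)
            stats.append(auc(yb, [scores[i] for i in idx]))
    if not stats:
        raise ValueError("no bootstrap resample contained both classes")
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return auc(y_true, scores), lo, hi
```

The pairwise loop is quadratic in cohort size and is meant to show the definition; for real analyses a vectorized routine such as scikit-learn's `roc_auc_score` is the usual choice.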
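Decision curve analysis rests on the net benefit of acting on a model's predictions at a chosen threshold probability p_t, conventionally NB = TP/n − FP/n × p_t / (1 − p_t), where false positives are weighted by the odds of the threshold. The sketch below applies that standard formula to predicted STAS probabilities; the function names `net_benefit` and `treat_all_net_benefit` are hypothetical, and the threshold range plotted in the study is not given in the abstract.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * pt / (1 - pt)
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy that treats every patient as STAS-positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A decision curve is drawn by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all curve and the treat-none strategy, whose net benefit is 0 at every threshold.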

Keywords: spread through air spaces, lung adenocarcinoma, machine learning, solid and part-solid components, surgical strategy

Received: 09 Aug 2025; Accepted: 17 Oct 2025.

Copyright: © 2025 Ren, Liu, Kai, Xu, Wang, Wu, You, Hu, Wang and Yusheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Xiaolin Wang, 18051063909@yzu.edu.cn
Shu Yusheng, 18051061999@yzu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.