AUTHOR=Liu Zhangcheng , Zhou Wenjun , Dong Pan , Liu Jingyan , Luo Li , Luo Yu , Su Shuai , Sankoh Santigie Junior , Wang Yong , Liu Linhai , Zhang Yang , Qiu Shilin , Jiang Lincen , Han Kun , Zhang Jindong , He Jiang , Wang Delin TITLE=Interpretable machine learning models based on multi-dimensional fusion data for predicting positive surgical margins in robot-assisted radical prostatectomy: a retrospective study JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1661695 DOI=10.3389/fonc.2025.1661695 ISSN=2234-943X ABSTRACT=Objective: This study aimed to develop and validate interpretable machine learning (ML) models based on multi-dimensional fusion data for predicting positive surgical margins (PSM) in robot-assisted radical prostatectomy (RARP). Methods: Patients who underwent RARP at our institution between January 2016 and July 2025 were enrolled. Demographic, clinical, biopsy pathology data, and MRI-derived anatomical features (measured using ITK-SNAP on axial, sagittal, and coronal planes) were collected. Feature selection was performed using intraobserver and interobserver correlation coefficients (ICCs), low-variance filtering, univariable logistic regression, Spearman’s correlation analysis, the least absolute shrinkage and selection operator (LASSO) algorithm, and the Boruta algorithm. Six ML models were constructed, with performance evaluated using area under the curve (AUC), calibration curves, and decision curve analyses (DCA) to identify the optimal model. Five-fold and ten-fold cross-validation were used to assess the optimal model’s generalizability, and its interpretability was evaluated via Shapley Additive exPlanations (SHAP) analysis. Results: A total of 347 patients were included, comprising a training set (n=193, January 2016–December 2024), validation set (n=84, January 2016–December 2024), and test set (n=70, January 2025–July 2025).
From 164 initial features, 7 key features were retained through a four-step screening. The Random Forest (RF) model outperformed the other models, achieving AUCs of 0.99 (95% CI: 0.97–1.00) in the training set, 0.88 (95% CI: 0.80–0.95) in the validation set, and 0.97 (95% CI: 0.94–1.00) in the test set. Calibration curve and decision curve analyses confirmed its strong clinical utility. Five-fold cross-validation for the RF model showed fold-specific AUCs of 0.82–0.92, with a mean AUC of 0.87 (95% CI: 0.84–0.90). Ten-fold cross-validation showed fold-specific AUCs of 0.80–0.99, with a mean AUC of 0.88 (95% CI: 0.83–0.93). SHAP analysis revealed that five novel spatial anatomical features (such as Sagittal plane-posterior spatial anatomical structure index and Coronal plane-Left anatomical structure interval) were negatively associated with PSM risk, while the number of positive biopsy cores and clinical tumor stage were positively associated with it. Conclusions: Multi-dimensional fusion data combined with ML models improve PSM prediction accuracy in RARP. The RF model, with excellent performance and interpretability, shows promise for preoperative PSM risk stratification, facilitates optimized clinical decision-making, and supports personalized treatment discussions during preoperative planning, but requires prospective and external validation before clinical implementation.
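NOTE=The abstract reports an AUC per cross-validation fold together with the range and mean across folds. As a minimal sketch of how such figures can be computed, assuming binary PSM labels (1 = positive margin) and model risk scores, the Python below uses the rank-based (Mann-Whitney) definition of AUC. The function names roc_auc and summarize_folds, and all input values, are hypothetical illustrations; they are not the authors' code or data, and the abstract does not state how its confidence intervals were derived, so none is computed here.

```python
from statistics import mean
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Compare every positive score with every negative score.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lowest, highest, mean) AUC across cross-validation folds."""
    if not fold_aucs:
        raise ValueError("no fold AUCs supplied")
    return min(fold_aucs), max(fold_aucs), mean(fold_aucs)


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    print("toy AUC:", roc_auc(toy_labels, toy_scores))

    # Hypothetical per-fold AUCs, not the values from the study.
    low, high, avg = summarize_folds([0.80, 0.90, 1.00])
    print(f"fold AUC range {low:.2f}-{high:.2f}, mean {avg:.2f}")
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a sort-based ranking would scale better for larger data sets.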