Integration of MRI radiomics features and clinical data for predicting neurological recovery after thoracic spinal stenosis surgery: a machine learning model

Zheng, Bin; Zhu, Zhenqi; Yu, Panfeng; Liang, Yan; Liu, Haiying

doi:10.3389/fmed.2025.1633633

ORIGINAL RESEARCH article

Front. Med., 22 October 2025

Sec. Nuclear Medicine

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1633633

This article is part of the Research Topic: Recent developments in artificial intelligence and radiomics

Bin Zheng^†

Zhenqi Zhu^†

Panfeng Yu

Yan Liang

Haiying Liu^*

Spine Surgery, Peking University People’s Hospital, Beijing, China

Background: Thoracic spinal stenosis (TSS) is a rare yet debilitating condition, often requiring surgical decompression. Prognostic assessments traditionally rely on single clinical or imaging features, limiting prediction accuracy. This study explores whether radiomics-based models enhance outcome prediction in TSS.

Methods: We retrospectively enrolled 106 surgically treated TSS patients (2012–2022), collecting clinical data and T2 axial MRI scans. Radiomics features were extracted from the most stenotic level, followed by rigorous feature selection (ICC > 0.9, U-test, Spearman, mRMR, and LASSO). Six machine learning classifiers were trained using radiomics and/or clinical data. Model performance was evaluated using AUC on an independent test set.

Results: Radiomics models outperformed clinical models (SVM AUC: 0.824 vs. 0.731). The combined radiomics–clinical model achieved the highest test-set AUC of 0.867, offering improved sensitivity and specificity.

Conclusion: In this preliminary exploratory study, integrating MRI radiomics with clinical data appeared to improve prediction of neurological recovery in TSS. These findings suggest that radiomics may enable objective, high-dimensional assessment of spinal cord pathology and potentially support individualized surgical decision-making, although further validation in larger, multicenter prospective cohorts is required.

Introduction

Thoracic spinal stenosis (TSS) is a relatively rare cause of spinal cord compression, often resulting from ossification of the ligamentum flavum or the posterior longitudinal ligament in the thoracic spine (1). It frequently leads to thoracic spinal cord dysfunction, and severe cases require surgical decompression (1). Many clinical studies focus on predicting outcomes for TSS or thoracic spinal cord lesions, but most rely on single-factor assessments with limited predictive dimensions (2). With the widespread use of MRI, research attention has shifted to T2 intramedullary high signal intensity (ISI) and its quantitative assessment. Using the signal intensity ratio (SIR), Kim et al. show that a lower SIR correlates with a higher postoperative JOA recovery rate and that the preoperative JOA score itself is a positive prognostic indicator (3). Hitchon et al. report that patients with increased signal intensity on T2-weighted MRI have lower Frankel and JOA scores than those without (4).

These studies establish the foundation for combined imaging–clinical assessments. However, feature dimensionality remains low, largely relying on visually measurable morphological parameters (such as spinal canal diameter or the number of compressed segments) or a single grayscale index. This approach fails to capture the potential textural heterogeneity in the lesion. Additionally, generalizability remains limited—most analyses still use traditional single-factor or multivariate regression methods, lacking comprehensive machine learning frameworks. Nevertheless, previous cutting-edge research in spinal disorders demonstrates radiomics’ potential (5–9). For example, one multicenter study combines MRI radiomics and deep learning features to predict postoperative upper-limb muscle strength recovery in spinal cord injury patients, reporting an AUC near 0.89 on the test set (10).

Researchers therefore need high-throughput radiomic features and multi-algorithm machine learning to comprehensively quantify the most stenotic level of the thoracic spinal canal, increasing the accuracy and clinical utility of outcome predictions on a larger scale. This study addresses that gap by using radiomic features from T2 axial MRI to build an integrated prediction model. Given the single-center and retrospective design, our work should be regarded as a preliminary exploratory analysis, providing early evidence to support individualized risk stratification and surgical decision-making in thoracic spinal stenosis.

Methods

Study population

From January 2012 to April 2022, 106 patients (49 men and 57 women) undergo surgical treatment at our hospital. Inclusion criteria are: (a) a clinical diagnosis of thoracic spinal canal stenosis with surgery led by a senior orthopedic surgeon; (b) availability of preoperative MRI; (c) high-quality images without motion artifacts; (d) preoperative and long-term (≥3 years) follow-up modified Japanese Orthopedic Association (mJOA) scores; and (e) standardized posterior thoracic laminectomy with instrumentation performed by a single senior spine surgeon. Exclusion criteria are (a) a history of thoracic surgery and (b) a history of other diseases (spinal cord tumor, multiple sclerosis, spinal cord sclerosis, spinal cord injury, or motor neuron disease). We collect clinical data on age, sex, and duration of symptoms, and we assess neurological impairment using the JOA score. Patients are divided into a poor-outcome group (postoperative JOA < 16) and a good-outcome group (postoperative JOA ≥ 16), because a postoperative JOA score under 16 still indicates severe residual deficits (11, 12).

Extraction of MRI parameters

T2-weighted intramedullary high signal intensity (ISI) usually reflects intramedullary abnormalities from spinal cord compression. We evaluate it both qualitatively and quantitatively. ISI severity is classified into three levels: 0 for no signal change, 1 for mild and fuzzy high signal, and 2 for obvious and easily discernible bright signal (13, 14). Because of the limited sample size, we dichotomize ISI as present or absent in this study.

We use the spinal cord compression ratio to quantify how flattened the spinal cord appears in the compressed segment. The standard definition is: spinal cord compression ratio = (minimum sagittal [anterior–posterior] diameter of the spinal cord at the compressed segment) / (maximum transverse [left–right] diameter). A smaller ratio indicates a more flattened cord.
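As a minimal sketch of this definition, the ratio can be computed from two diameter measurements. The function name and the example measurements are hypothetical and are not taken from the study data.

```python
def compression_ratio(sagittal_mm: float, transverse_mm: float) -> float:
    """Minimum sagittal (AP) cord diameter divided by maximum transverse diameter."""
    if sagittal_mm <= 0 or transverse_mm <= 0:
        raise ValueError("diameters must be positive")
    return sagittal_mm / transverse_mm


# Hypothetical measurements in millimetres, for illustration only.
mild = compression_ratio(6.0, 12.0)    # rounder cord, larger ratio
severe = compression_ratio(2.0, 13.0)  # flatter cord, smaller ratio
```

Both diameters must be measured in the same unit, so the ratio itself is dimensionless.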

Image preprocessing

All patients undergo MRI on a 3 T scanner in the head-first supine position. We apply a standardized MRI preprocessing pipeline to reduce inter-image variability: (1) resample images to a consistent resolution; (2) pre-crop the images around the spinal cord centerline to maintain uniform dimensions; and (3) normalize image intensities so that identical tissue types have consistent intensities. The Spinal Cord Toolbox (SCT, version 4.0.0)¹ is used for these steps.
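The cropping and normalization steps can be illustrated with a small, library-free sketch. It does not use or reproduce the SCT interface. The helper names, the square patch shape, and the choice of z-score normalization are assumptions made only for illustration, and resampling is omitted.

```python
from statistics import mean, pstdev


def crop_around(image, row, col, half):
    """Return a square patch centred on (row, col), clipped at the image borders."""
    r0, c0 = max(0, row - half), max(0, col - half)
    return [line[c0:col + half + 1] for line in image[r0:row + half + 1]]


def zscore(values):
    """Rescale intensities to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical 5x5 "slice" whose pixel value is 5*row + col.
image = [[5 * r + c for c in range(5)] for r in range(5)]
patch = crop_around(image, row=2, col=2, half=1)
flat = zscore([v for line in patch for v in line])
```

Cropping before normalizing means the intensity statistics are computed on the region around the cord rather than on the whole field of view.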

ROI segmentation

On axial T2 images, we identify the most severely stenotic level and select that slice and its adjacent slices as the region of interest (ROI). Because the intramedullary lesion (ISI) area is often small with unclear boundaries, we choose the entire compressed spinal cord cross-section as the ROI. Under the supervision of a senior spine surgeon, two independent spine surgeons verify the level. We use the intraclass correlation coefficient (ICC) to assess intra- and inter-observer reliability. Initially, one investigator delineates the ROI. A second investigator with over 10 years of neurosurgical experience then independently re-delineates 30 randomly selected cases, with both investigators blinded to each other’s results. Using these 30 cases, we calculate the ICC to measure consistency. We retain only radiomic features with an ICC above 0.9 in both datasets for further analysis.
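The reproducibility filter can be sketched as follows. The article does not state which ICC form is used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here, and the feature names and values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per case and one column per reader.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def reproducible(feature_ratings, threshold=0.9):
    """Keep the names of features whose ICC exceeds the threshold."""
    return [name for name, r in feature_ratings.items() if icc_2_1(r) > threshold]


# Hypothetical values of two features measured by two readers on four cases.
demo = {
    "stable_feature": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]],
    "unstable_feature": [[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]],
}
kept = reproducible(demo)
```

In this toy input only the feature on which the two readers nearly agree passes the 0.9 cut-off.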

Radiomics feature extraction

We carry out feature extraction using the PyRadiomics package.² We expand the set of derived images by applying filters such as the Laplacian of Gaussian and wavelets. All radiomics features fall into seven categories: shape-based features, first-order features, gray-level dependence matrix (GLDM) features, gray-level size zone matrix (GLSZM) features, neighboring gray-tone difference matrix (NGTDM) features, gray-level run-length matrix (GLRLM) features, and gray-level co-occurrence matrix (GLCM) features. The radiomics features are provided in Supplementary material 1.

Feature selection

We use a rigorous approach to identify the features most pertinent to postoperative neurological outcome. First, a Mann–Whitney U-test (p < 0.05) identifies features that differ significantly between the good-outcome and poor-outcome groups. We also exclude features with an ICC under 0.9 at this stage. This strategy trims the number of features while preserving predictive power.
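A minimal, library-free sketch of the univariate U-test filter is shown below. It uses the normal approximation without tie or continuity correction, which is a simplification and an assumption; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used. The group values are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie or continuity correction
    z = (u1 - mean_u) / sd_u
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical values of one feature in two outcome groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = mann_whitney_u(group_a, group_b)
keep_feature = p_value < 0.05
```

With groups this small an exact test would be preferable; the approximation is used here only to keep the example short.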

To address multicollinearity, we perform Spearman correlation analysis, examining inter-feature correlations. We label feature pairs with a correlation coefficient ≥ 0.9 or ≤ −0.9 as strongly correlated and retain only the feature with superior diagnostic performance. Next, we use the minimum redundancy maximum relevance (mRMR) method to select the top 20 most important features. Finally, we apply the least absolute shrinkage and selection operator (LASSO) logistic regression to refine the feature set, imposing a penalty coefficient during variable selection and arriving at a more robust subset.
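The correlation filter described above can be sketched as a greedy pass over features ordered by a per-feature relevance score. The article does not specify how "superior diagnostic performance" is measured, so the `relevance` dictionary (for example, a univariate AUC) is an assumption, as are the feature names and values. The mRMR and LASSO steps are not shown.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def drop_correlated(features, relevance, threshold=0.9):
    """Visit features from most to least relevant; keep one only if it is not
    strongly correlated (|rho| >= threshold) with a feature already kept."""
    kept = []
    for name in sorted(features, key=lambda n: relevance[n], reverse=True):
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values and relevance scores.
features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],  # monotone in "a", so rho(a, b) = 1
    "c": [3, 1, 4, 2, 5],
}
relevance = {"a": 0.70, "b": 0.80, "c": 0.60}
selected = drop_correlated(features, relevance)
```

Here "a" and "b" are perfectly rank-correlated, so only the more relevant of the two survives, while "c" is retained.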

Model construction

We use multiple machine learning algorithms (Random Forest, Bayesian, Neural Network, Decision Tree, Generalized Linear Model, and Support Vector Machine) on the selected radiomics features, employing SMOTE to balance the classes. To evaluate each classifier’s performance, we generate receiver operating characteristic (ROC) curves and calculate the area under the curve (AUC).
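The idea behind SMOTE can be sketched in a few lines: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration, not the implementation used in the study; the function name, the value of `k`, and the toy data are assumptions. Oversampling is typically applied to the training set only, so that the test set keeps its original class distribution.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        mate = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Hypothetical two-feature minority class (corners of the unit square).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6)
```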

Statistical analysis

We perform all statistical analyses using Python-based libraries, including NumPy, pandas, and SciPy. We use the AUC to measure predictive model performance and DeLong’s test to assess whether AUCs differ significantly between models. Analysis code and scripts are provided in Supplementary materials 2–4.
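For orientation, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are hypothetical, and DeLong's test is not shown.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting tied scores as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted scores.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this example three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75.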

Results

The study workflow is shown in Figure 1. We include 106 patients with thoracic spinal canal stenosis in this study, dividing them into a good-outcome group (63 patients) and a poor-outcome group (43 patients). Their mean ages are 48.44 ± 10.39 years (good) and 46.14 ± 10.89 years (poor), with no statistically significant difference (p = 0.137) (Table 1). The sex ratio is similar in both groups, and the average follow-up durations are 45.17 ± 10.10 months and 43.32 ± 11.90 months, respectively, with no significant difference. The poor-outcome group has a significantly longer symptom duration than the good-outcome group (18.47 ± 6.70 vs. 13.84 ± 3.76 months, p < 0.001), suggesting that persistent symptoms may correlate with worse outcomes. On MRI, the poor-outcome group shows a higher incidence of intramedullary high signal intensity (ISI) (p = 0.04). Regarding preoperative neurological function, the poor-outcome group’s baseline JOA score is lower than that of the good-outcome group (6.88 ± 1.12 vs. 8.13 ± 1.67, p < 0.001). The poor-outcome group also has a lower spinal cord compression ratio (0.1526 ± 0.0639 vs. 0.1951 ± 0.0621, p < 0.001), indicating more severe cord compression. At final follow-up, JOA scores differ significantly as well (11.38 ± 2.23 vs. 16.56 ± 0.50, p < 0.001).

Figure 1

A flowchart visualizing four stages of a data analysis process: 1. ROI segmentation with MRI and 3D renderings. 2. Feature extraction shown with a grid and bar graph. 3. Feature selection using violin plots, line charts, and bar charts. 4. Model construction illustrated by ROC curves and a nomogram.

Figure 1. Workflow of radiomics analysis and model construction. Step 1: Region of interest (ROI) segmentation of the most stenotic thoracic spinal cord level on axial T2-weighted MRI. Step 2: Radiomics feature extraction from the segmented ROI. Step 3: Feature selection through reproducibility testing, statistical filtering, and LASSO regression. Step 4: Model construction using machine learning algorithms, followed by performance evaluation with ROC curves and nomogram visualization.

Table 1

Table 1. Patients’ demographics.

We randomly assign all patients to a training set (85 cases) or a test set (21 cases) at about a 4:1 ratio. Baseline demographics and clinical characteristics do not significantly differ between sets (Table 2). Both sets show comparable clinical distributions and prognoses, meeting model development and validation needs.
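A reproducible random split of this size can be sketched as follows. The seed, the use of integer patient identifiers, and the absence of stratification are assumptions; the article does not describe how the random assignment is implemented.

```python
import random


def split_patients(patient_ids, n_test, seed=42):
    """Shuffle the identifiers with a fixed seed and hold out n_test of them."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_test:], ids[:n_test]


# 106 hypothetical patient identifiers, 21 held out for testing.
train_ids, test_ids = split_patients(range(106), n_test=21)
```

Fixing the seed makes the assignment repeatable, which matters when feature selection and model fitting are rerun on the same training set.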

Table 2

Table 2. Patients’ demographics in train set and test set.

Clinical model construction

Baseline JOA score, intramedullary high signal intensity, symptom duration, and spinal cord compression ratio are used to construct the clinical model. Cross-validation results are shown in Figure 2, and Table 3 summarizes the performance of the different machine learning algorithms. The best clinical model achieves an AUC of 0.731 on the test set (SVM), with an accuracy of 66.7%, a sensitivity of 62.5%, and a specificity of 69.2%. Figure 3 shows the ROC curves and AUCs for each machine learning algorithm in the clinical model for the test set.
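Accuracy, sensitivity, and specificity follow directly from the confusion-matrix counts. The counts in the sketch below are hypothetical: the article does not report a confusion matrix, and these values were chosen only because they reproduce the reported percentages for a 21-case test set.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall of positives), and specificity."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only (assumption): 8 positive and 13 negative test cases.
metrics = classification_metrics(tp=5, fn=3, tn=9, fp=4)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With so few test cases, each misclassified patient shifts these percentages by several points, which is worth keeping in mind when comparing models.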

Figure 2

Box plot comparing the AUC percentages of different machine learning models: SVM, KNN, Random Forest, Extra Trees, XGBoost, and LightGBM. The plot shows variations in AUC performances, with Random Forest having the highest range and KNN the lowest.

Figure 2. Cross-validation performance of the clinical model.

Table 3

Table 3. Comparison of machine learning performance of clinical model.

Figure 3

ROC curve comparing model performance for SVM, KNN, RandomForest, ExtraTrees, XGBoost, and LightGBM. Sensitivity is plotted against 1-Specificity. SVM and LightGBM have the highest AUC of 0.731, while RandomForest has the lowest at 0.615.

Figure 3. Receiver operating characteristic (ROC) curves of different machine learning classifiers based on clinical features in the independent test set. Models included Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, Extra Trees, XGBoost, and LightGBM.

LASSO regression results

Radiomics feature selection and radiomics signature building

We extract a total of 1,198 radiomics features from thoracic stenosis MRI data, including 234 first-order features, 286 GLCM features, 182 GLDM features, 208 GLRLM features, 208 GLSZM features, 65 NGTDM features, and 14 shape features (shown in Figure 4). After validation, radiomics feature reproducibility is satisfactory. We use the LASSO algorithm in the training set to find the optimal regularization weight (λ = 0.0126), selecting 10 radiomics features that predict outcomes in thoracic spinal canal stenosis, shown in Figure 5. Figure 6 shows the coefficient distribution for these features. The radiomics score uses this formula:

Figure 4

Pie chart A shows the distribution of various metrics: 'glrlm' and 'glszm' at 17.4% each, 'gldm' at 15.2%, 'glcm' at 23.9%, 'firstorder' at 19.5%, 'ngtdm' at 5.4%, and 'shape' at 1.2%. Violin plot B displays data distribution for seven groups labeled 'firstorder', 'glcm', 'gldm', 'glrlm', 'glszm', 'ngtdm', and 'shape', showing density and distribution as dots with range on the y-axis marked as p-value.

Figure 4. Radiomics feature extraction results. (A) Distribution of extracted features across seven categories (first-order, GLCM, GLDM, GLRLM, GLSZM, NGTDM, and shape). (B) Violin plots of p-values for different feature categories following univariate filtering, showing significant feature diversity across groups.

Figure 5

Panel A shows a plot with lambda on the x-axis and mean squared error (MSE) on the y-axis. Red dots represent data points, with vertical blue lines indicating error bars. A vertical dashed line marks a specific lambda value. Panel B displays a plot with lambda on the x-axis and coefficients on the y-axis, containing multiple colored lines representing different coefficients. The same lambda value is highlighted by a vertical dashed line.

Figure 5. Radiomics feature selection using least absolute shrinkage and selection operator (LASSO) regression. (A) Ten-fold cross-validation plot used to determine the optimal λ. (B) Coefficient profiles of radiomics features, with 10 non-zero features retained for model construction.

Figure 6

Bar chart depicting feature coefficients. The x-axis represents coefficients from −0.15 to +0.15; the y-axis lists the feature names.

Figure 6. Weights of selected radiomics features after LASSO regression. Bar plots show the relative contributions of the 10 retained features to the final radiomics signature.

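$$
\begin{aligned}
\text{label} ={}& 0.3090601683388831 \\
&+ 0.127998 \times \texttt{original\_firstorder\_Kurtosis} \\
&- 0.077429 \times \texttt{original\_firstorder\_Minimum} \\
&+ 0.114108 \times \texttt{original\_firstorder\_RootMeanSquared} \\
&- 0.047303 \times \texttt{original\_glcm\_Imc2} \\
&- 0.063017 \times \texttt{original\_glcm\_SumEntropy} \\
&+ 0.064179 \times \texttt{original\_glrlm\_LongRunLowGrayLevelEmphasis} \\
&- 0.059840 \times \texttt{original\_glrlm\_RunEntropy} \\
&- 0.031554 \times \texttt{original\_glrlm\_ShortRunEmphasis} \\
&- 0.016628 \times \texttt{original\_glszm\_GrayLevelVariance} \\
&+ 0.008884 \times \texttt{original\_glszm\_ZoneVariance} \\
&+ 0.053577 \times \texttt{original\_ngtdm\_Busyness} \\
&- 0.029125 \times \texttt{original\_shape\_Sphericity}
\end{aligned}
$$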
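The signature above is a linear combination, so it can be evaluated with a short function. The intercept and coefficients are copied from the formula; the function name is hypothetical, and the sketch assumes that feature values have already been scaled in the same way as during model fitting, which the article does not detail.

```python
INTERCEPT = 0.3090601683388831

COEFFICIENTS = {
    "original_firstorder_Kurtosis": 0.127998,
    "original_firstorder_Minimum": -0.077429,
    "original_firstorder_RootMeanSquared": 0.114108,
    "original_glcm_Imc2": -0.047303,
    "original_glcm_SumEntropy": -0.063017,
    "original_glrlm_LongRunLowGrayLevelEmphasis": 0.064179,
    "original_glrlm_RunEntropy": -0.059840,
    "original_glrlm_ShortRunEmphasis": -0.031554,
    "original_glszm_GrayLevelVariance": -0.016628,
    "original_glszm_ZoneVariance": 0.008884,
    "original_ngtdm_Busyness": 0.053577,
    "original_shape_Sphericity": -0.029125,
}


def rad_score(features):
    """Intercept plus the coefficient-weighted sum of the listed features.

    Raises KeyError if any required feature is missing from `features`.
    """
    return INTERCEPT + sum(c * features[name] for name, c in COEFFICIENTS.items())


# Hypothetical input: every (scaled) feature equal to zero.
baseline = rad_score({name: 0.0 for name in COEFFICIENTS})
```

Requiring every feature to be present avoids silently treating a missing value as zero.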

Model construction

We build models using various machine learning algorithms based on radiomics features alone, then compare their performance (Table 4). The cross validation result is shown in Figure 7. Among radiomics-based models, SVM demonstrates the best test-set performance, with an AUC of 0.824, an accuracy of 78.1%, a sensitivity of 73.3%, and a specificity of 82.4%. Figure 8 shows the radiomics model’s ROC curves and AUCs in the training set (A) and test set (B).

Table 4

Table 4. Comparison of machine learning performance of radiomics model.

Figure 7

Box plot comparing AUC percentages for different models: SVM, KNN, RandomForest, ExtraTrees, XGBoost, and LightGBM. SVM shows a median around 0.7 with an outlier near 0.8. LightGBM has the widest range and the highest median value.

Figure 7. Cross-validation performance of radiomics-based models. ROC curves demonstrate performance of different machine learning classifiers in the training set using radiomics features only.

Figure 8

Two ROC curves compare model performance. Chart A shows six models with XGBoost performing best (AUC: 0.997). Chart B also features six models, with SVM having the highest AUC at 0.824. Both axes show sensitivity versus 1-specificity.

Figure 8. Performance of radiomics-based models. (A) ROC curves in the training cohort. (B) ROC curves in the independent test cohort. The SVM-based radiomics model achieved the best predictive performance (test-set AUC = 0.824).

Radiomics-clinical model

Figure 9 compares the ROC curves of the clinical model, the radiomics model, and the combined model in both the training set (A) and test set (B). Each model uses its best-performing algorithm, and the combined model merges them into a nomogram. Table 5 summarizes the clinical, radiomics, and combined models. In the training set, the radiomics model yields the highest AUC and significantly exceeds the clinical model, while the combined model reaches an AUC of 0.872. In the test set, the combined model’s ROC curve shows an AUC of 0.867, representing the best performance overall. Figure 10 shows the calibration curves of the three models in the training set (A) and test set (B), and Figure 11 shows their decision curve analysis (DCA) curves in the training set (A) and test set (B).

Figure 9

Two ROC curve graphs labeled A and B display the performance of three models: Clinic Signature (pink), Rad Signature (blue dotted), and Nomogram (cyan). Panel A shows Clinic Signature with an AUC of 0.764, Rad Signature with 0.912, and Nomogram with 0.872. Panel B shows Clinic Signature with an AUC of 0.731, Rad Signature with 0.852, and Nomogram with 0.867. Axes are sensitivity versus 1-specificity, with a diagonal line indicating random chance.

Figure 9. Comparison of clinical, radiomics, and combined (nomogram) models. (A) ROC curves in the training set. (B) ROC curves in the test set. The combined model integrating radiomics and clinical features achieved the best overall predictive performance (test-set AUC = 0.867).

Table 5

Table 5. Comparative performance of clinical, radiomics, and combined models.

Figure 10

Two calibration plots labeled A and B show the fraction of positives against mean predicted probability. Both plots include three lines representing Clinic Signature (blue), Rad Signature (orange), and Nomogram (green) compared to a dotted line for perfect calibration. The plots display variations in prediction accuracy across models, with axes marked from zero to one.

Figure 10. Calibration curves of the clinical model, radiomics model, and combined nomogram model. (A) Training set. (B) Test set. The combined model demonstrated the best agreement between predicted and observed outcomes.

Figure 11

Two graphs labeled A and B showing decision curve analysis (DCA) for models. Each graph plots net benefit against threshold probability. Both feature lines for Clinic Signature (blue), Rad Signature (orange), Nomogram (green), Treat all (solid black), and Treat none (dotted black). Lines illustrate performance across various threshold probabilities, with fluctuations in net benefit for different signatures.

Figure 11. Decision curve analysis (DCA) of the clinical model, radiomics model, and combined nomogram model. (A) Training set. (B) Test set. The nomogram consistently provides a higher net clinical benefit across a wider range of threshold probabilities compared with models using clinical or radiomics features alone.
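For readers unfamiliar with DCA, the net benefit at a threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t), where a case counts as predicted positive when its predicted probability is at least p_t. The sketch below evaluates this quantity for a model and for the "treat all" reference strategy; the labels and probabilities are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating cases whose predicted probability >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every case."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and predicted probabilities.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.1]
model_nb = net_benefit(labels, probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(labels, threshold=0.5)
```

The "treat none" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its curve lies above both reference lines.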

Figure 12 presents the nomogram based on radiomics plus clinical features, integrating both elements to predict individual surgical outcomes. The above ROC analysis supports its effectiveness (Table 6).

Figure 12

A series of aligned scales depicting different metrics.

Figure 12. Nomogram integrating radiomics signature and clinical signature for predicting postoperative neurological recovery in thoracic spinal stenosis. The total points are calculated by summing the scores for clinical and radiomics predictors, which correspond to the estimated probability of poor outcome.

Table 6

Table 6. Delong test of three models.

Discussion

This study constructs multiple machine learning models using T2 axial MRI radiomics features and clinical variables to predict neurological recovery after surgery in thoracic spinal stenosis. The combined model (radiomics + clinical) provides the best predictive power, achieving an AUC of 0.867 in the test set, surpassing models that include only clinical or only imaging data. This finding indicates that incorporating high-dimensional MRI quantitative features with patient clinical information greatly enhances the ability to discriminate between good and poor postoperative neurological recovery, outperforming traditional empirical assessments.

In recent years, multiple studies in fields such as cervical spine pathologies or spinal cord injuries verify that radiomics holds promise for outcome prediction, treatment evaluation, and individualized decision-making (10, 15–17). Consistent with those findings, our study shows that radiomics-based modeling outperforms models relying solely on clinical factors, and that merging radiomics and clinical data further boosts the predictive capacity for postoperative neurological outcomes.

Clinically, TSS prognosis usually depends on surgeon experience and a few specific factors, such as preoperative symptom severity or MRI findings, but these single-factor predictions have limited accuracy (2). For instance, T2 intramedullary high signal is often regarded as a marker of severe cord damage and an indicator of poor outcome, but its predictive power varies across studies. Kozaki and Yukawa report that higher signal intensity is associated with worse outcomes (18, 19). However, traditional qualitative MRI indicators fail to capture the full complexity of lesion properties, leading to suboptimal preoperative risk stratification.

Our findings reveal that clinical factors alone (e.g., symptom duration, preoperative JOA score) offer limited predictive accuracy for TSS outcomes, whereas high-throughput radiomic features from MRI markedly enhance model discrimination. Radiomics extracts a multitude of objective texture, shape, and grayscale distribution features from standard MRI, capturing finer lesion details and spinal cord heterogeneity that are imperceptible to the naked eye. These high-dimensional, quantitative variables characterize intramedullary changes and cord deformation more comprehensively than single metrics like T2 high signal presence or maximum compression ratio. We observe that the best radiomics-based model (SVM) raises the test-set AUC to 0.824, compared with 0.731 for the clinical model, and that incorporating clinical data further boosts performance to 0.867, surpassing any single-factor approach. This result implies that a multimodal model can detect the complex combination of factors influencing outcomes, thereby providing better predictive power than existing methods.

Radiomics also shows clinical potential by offering objective spinal cord injury assessment and individualized estimates of surgical benefit. Subtle imaging differences often reflect various pathological processes, such as intramedullary degeneration, inflammatory edema, microhemorrhages, or local blood supply changes, which single imaging signs or subjective observations frequently miss. By modeling high-dimensional radiomics, clinicians can better quantify the interplay among these pathological factors, identify high-risk patients preoperatively, and optimize surgical timing and approach.

Moreover, radiomics easily integrates with artificial intelligence algorithms, allowing for the creation of comprehensive decision-support systems that merge imaging, clinical characteristics, and surgical parameters. Compared to traditional regression models, machine learning (e.g., random forests, SVMs, and neural networks) excels at handling complex, nonlinear data, enabling more precise, individualized prognosis predictions for TSS patients. This is especially valuable for a patient population prone to wide variability in postoperative functional recovery and in need of timely interventions.

Compared with existing prediction methods, our combined radiomics-based model offers multiple advantages and strong clinical feasibility. First, radiomics analysis objectively extracts numerous MRI features, reducing subjective bias and capturing subtle imaging details relevant to spinal canal morphology, spinal cord compression, and signal heterogeneity. Second, machine learning algorithms incorporate this multidimensional information and uncover nonlinear relationships between imaging biomarkers and clinical data, improving predictive accuracy. Our findings confirm that a multi-factor model outperforms any single-factor approach, highlighting the potential of statistical learning in complex clinical prediction tasks.

Nonetheless, this study has certain limitations. (1) Study design: this is a single-center retrospective study with a relatively small sample size, which may limit model robustness and generalizability. Larger datasets from multiple centers and regions would strengthen external validation, and prospective, multicenter designs would also help control confounders and further validate clinical applicability. (2) ROI segmentation: we manually delineate the lesion region, which introduces observer subjectivity. Although manual delineation ensures some accuracy, operator variability still exists. Future studies may adopt semi-automated or fully automated computer-assisted segmentation tools to reduce manual bias.

Conclusion

This preliminary study suggests that integrating T2 axial MRI radiomics with clinical variables via machine learning may enhance the prediction of postoperative neurological recovery in thoracic spinal stenosis. While promising, these findings remain exploratory and require external validation in larger, prospective, multicenter studies.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Peking University People’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

BZ: Conceptualization, Writing – original draft, Writing – review & editing. ZZ: Methodology, Writing – original draft, Writing – review & editing. PY: Methodology, Writing – original draft, Writing – review & editing. YL: Resources, Visualization, Writing – original draft, Writing – review & editing. HL: Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This study is funded by Horizontal Project of Peking University People’s Hospital (Grant number 2022-Z-09), Major Health Special Project of the Ministry of Finance of China (Grant number 2127000432), Major Health Special Project of the Ministry of Finance of China (Grant number 2127000349) and the fund of Peking University People’s Hospital (2023HQ05).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2025.1633633/full#supplementary-material.

Footnotes

1. https://github.com/neuropoly/spinalcordtoolbox

2. https://github.com/Radiomics/pyradiomics

References

1. Chen, G , Fan, T , Yang, X , Sun, C , Fan, D , and Chen, Z . The prevalence and clinical characteristics of thoracic spinal stenosis: a systematic review. Eur Spine J. (2020) 29:2164–72. doi: 10.1007/s00586-020-06520-6

PubMed Abstract | Crossref Full Text | Google Scholar

2. Zhang, J , Wang, L , Li, J , Yang, P , and Shen, Y . Predictors of surgical outcome in thoracic ossification of the ligamentum flavum: focusing on the quantitative signal intensity. Sci Rep. (2016) 6:23019. doi: 10.1038/srep23019

PubMed Abstract | Crossref Full Text | Google Scholar

3. Kim, TH , Ha, Y , Shin, JJ , Cho, YE , Lee, JH , and Cho, WH . Signal intensity ratio on magnetic resonance imaging as a prognostic factor in patients with cervical compressive myelopathy. Medicine. (2016) 95:e4649. doi: 10.1097/MD.0000000000004649

PubMed Abstract | Crossref Full Text | Google Scholar

4. Hitchon, PW, Abode-Iyamah, K, Dahdaleh, NS, Grossbach, AJ, el Tecle, NE, Noeller, J, et al. Risk factors and outcomes in thoracic stenosis with myelopathy: a single center experience. Clin Neurol Neurosurg. (2016) 147:84–9. doi: 10.1016/j.clineuro.2016.05.029

5. Alsoof, D, McDonald, CL, Durand, WM, Diebo, BG, Kuris, EO, and Daniels, AH. Radiomics in spine surgery. Int J Spine Surg. (2023) 17:S57–S64. doi: 10.14444/8501

6. Cheng, L, Cai, F, Xu, M, Liu, P, Liao, J, and Zong, S. A diagnostic approach integrated multimodal radiomics with machine learning models based on lumbar spine CT and X-ray for osteoporosis. J Bone Miner Metab. (2023) 41:877–89. doi: 10.1007/s00774-023-01469-0

7. Gitto, S, Bologna, M, Corino, VDA, Emili, I, Albano, D, Messina, C, et al. Diffusion-weighted MRI radiomics of spine bone tumors: feature stability and machine learning-based classification performance. Radiol Med. (2022) 127:518–25. doi: 10.1007/s11547-022-01468-7

8. Li, S, Yu, X, Shi, R, Zhu, B, Zhang, R, Kang, B, et al. MRI-based radiomics nomogram for differentiation of solitary metastasis and solitary primary tumor in the spine. BMC Med Imaging. (2023) 23:29. doi: 10.1186/s12880-023-00978-8

9. Saravi, B, Zink, A, Ülkümen, S, Couillard-Despres, S, Wollborn, J, Lang, G, et al. Clinical and radiomics feature-based outcome analysis in lumbar disc herniation surgery. BMC Musculoskelet Disord. (2023) 24:791. doi: 10.1186/s12891-023-06911-y

10. Lin, F, Wang, K, Lai, M, Wu, Y, Chen, C, Wang, Y, et al. Multicenter study on predicting postoperative upper limb muscle strength improvement in cervical spinal cord injury patients using radiomics and deep learning. Sci Rep. (2025) 15:5805. doi: 10.1038/s41598-024-72539-0

11. Tetreault, LA, Kopjar, B, Vaccaro, A, Yoon, ST, Arnold, PM, Massicotte, EM, et al. A clinical prediction model to determine outcomes in patients with cervical spondylotic myelopathy undergoing surgical treatment: data from the prospective, multi-center AOSpine North America study. J Bone Joint Surg Am. (2013) 95:1659–66. doi: 10.2106/JBJS.L.01323

12. Nouri, A, Tetreault, L, Côté, P, Zamorano, JJ, Dalzell, K, and Fehlings, MG. Does magnetic resonance imaging improve the predictive performance of a validated clinical prediction rule developed to evaluate surgical outcome in patients with degenerative cervical myelopathy? Spine. (2015) 40:1092–100. doi: 10.1097/BRS.0000000000000919

13. Machino, M, Imagama, S, Ando, K, Kobayashi, K, Ito, K, Tsushima, M, et al. Image diagnostic classification of magnetic resonance T2 increased signal intensity in cervical spondylotic myelopathy: clinical evaluation using quantitative and objective assessment. Spine. (2018) 43:420–6. doi: 10.1097/BRS.0000000000002328

14. Wei, L, Wei, Y, Tian, Y, Cao, P, and Yuan, W. Does three-grade classification of T2-weighted increased signal intensity reflect the severity of myelopathy and surgical outcomes in patients with cervical compressive myelopathy? A systematic review and meta-analysis. Neurosurg Rev. (2020) 43:967–76. doi: 10.1007/s10143-019-01106-3

15. Zhang, MZ, Ou-Yang, HQ, Jiang, L, Wang, CJ, Liu, JF, Jin, D, et al. Optimal machine learning methods for radiomic prediction models: clinical application for preoperative T(2)*-weighted images of cervical spondylotic myelopathy. JOR Spine. (2021) 4:e1178. doi: 10.1002/jsp2.1178

16. Zhang, MZ, Ou-Yang, HQ, Liu, JF, Jin, D, Wang, C-J, Ni, M, et al. Predicting postoperative recovery in cervical spondylotic myelopathy: construction and interpretation of T(2)(*)-weighted radiomic-based extra trees models. Eur Radiol. (2022) 32:3565–75. doi: 10.1007/s00330-021-08383-x

17. Zhang, Z, Li, N, Ding, Y, and Cheng, H. An integrative nomogram based on MRI radiomics and clinical characteristics for prognosis prediction in cervical spinal cord injury. Eur Spine J. (2025) 34:1164–76. doi: 10.1007/s00586-024-08609-8

18. Kozaki, T, Yukawa, Y, Hashizume, H, Iwasaki, H, Tsutsui, S, Takami, M, et al. Clinical and radiographic characteristics of increased signal intensity of the spinal cord at the vertebral body level in patients with cervical myelopathy. J Orthop Sci. (2023) 28:1240–5. doi: 10.1016/j.jos.2022.10.010

19. Yukawa, Y, Kato, F, Yoshihara, H, Yanase, M, and Ito, K. MR T2 image classification in cervical compression myelopathy: predictor of surgical outcomes. Spine. (2007) 32:1675–8. doi: 10.1097/BRS.0b013e318074d62e

Keywords: thoracic spinal stenosis, MRI radiomics, machine learning, neurological recovery, predictive

Citation: Zheng B, Zhu Z, Yu P, Liang Y and Liu H (2025) Integration of MRI radiomics features and clinical data for predicting neurological recovery after thoracic spinal stenosis surgery: a machine learning model. Front. Med. 12:1633633. doi: 10.3389/fmed.2025.1633633

Received: 22 May 2025; Accepted: 22 September 2025; Published: 22 October 2025.

Edited by:

Luca Urso, University of Ferrara, Italy

Reviewed by:

Yao-Wen Liang, National Yang Ming Chiao Tung University (Yangming Campus), Taiwan
Marc Ghanem, Stanford University, United States

Copyright © 2025 Zheng, Zhu, Yu, Liang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Haiying Liu, liuhaiying1964@163.com

^†These authors have contributed equally to this work
